Table of Contents

  1. Chapter 1 The Basics

    1. The Importance of Language Annotation

    2. A Brief History of Corpus Linguistics

    3. Language Data and Machine Learning

    4. The Annotation Development Cycle

    5. Summary

  2. Chapter 2 Defining Your Goal and Dataset

    1. Defining Your Goal

    2. Background Research

    3. Assembling Your Dataset

    4. The Size of Your Corpus

    5. Summary

  3. Chapter 3 Corpus Analytics

    1. Basic Probability for Corpus Analytics

    2. Counting Occurrences

    3. Language Models

    4. Summary

  4. Chapter 4 Building Your Model and Specification

    1. Some Example Models and Specs

    2. Adopting (or Not Adopting) Existing Models

    3. Different Kinds of Standards

    4. Summary

  5. Chapter 5 Applying and Adopting Annotation Standards

    1. Metadata Annotation: Document Classification

    2. Text Extent Annotation: Named Entities

    3. Linked Extent Annotation: Semantic Roles

    4. ISO Standards and You

    5. Summary

  6. Chapter 6 Annotation and Adjudication

    1. The Infrastructure of an Annotation Project

    2. Specification Versus Guidelines

    3. Be Prepared to Revise

    4. Preparing Your Data for Annotation

    5. Writing the Annotation Guidelines

    6. Annotators

    7. Choosing an Annotation Environment

    8. Evaluating the Annotations

    9. Creating the Gold Standard (Adjudication)

    10. Summary

  7. Chapter 7 Training: Machine Learning

    1. What Is Learning?

    2. Defining Our Learning Task

    3. Classifier Algorithms

    4. Sequence Induction Algorithms

    5. Clustering and Unsupervised Learning

    6. Semi-Supervised Learning

    7. Matching Annotation to Algorithms

    8. Summary

  8. Chapter 8 Testing and Evaluation

    1. Testing Your Algorithm

    2. Evaluating Your Algorithm

    3. Problems That Can Affect Evaluation

    4. Final Testing Scores

    5. Summary

  9. Chapter 9 Revising and Reporting

    1. Revising Your Project

    2. Reporting About Your Work

    3. Summary

  10. Chapter 10 Annotation: TimeML

    1. The Goal of TimeML

    2. Related Research

    3. Building the Corpus

    4. Model: Preliminary Specifications

    5. Annotation: First Attempts

    6. Model: The TimeML Specification Used in TimeBank

    7. Annotation: The Creation of TimeBank

    8. TimeML Becomes ISO-TimeML

    9. Modeling the Future: Directions for TimeML

    10. Summary

  11. Chapter 11 Automatic Annotation: Generating TimeML

    1. The TARSQI Components

    2. Improvements to the TTK

    3. TimeML Challenges: TempEval-2

    4. Future of the TTK

    5. Summary

  12. Chapter 12 Afterword: The Future of Annotation

    1. Crowdsourcing Annotation

    2. Handling Big Data

    3. NLP Online and in the Cloud

    4. And Finally...

  1. Appendix List of Available Corpora and Specifications

    1. Corpora

    2. Specifications, Guidelines, and Other Resources

    3. Representation Standards

  2. Appendix List of Software Resources

    1. Annotation and Adjudication Software

    2. Machine Learning Resources

  3. Appendix MAE User Guide

    1. Installing and Running MAE

    2. Loading Tasks and Files

    3. Saving Files

    4. Defining Your Own Task

    5. Frequently Asked Questions

  4. Appendix MAI User Guide

    1. Installing and Running MAI

    2. Loading Tasks and Files

    3. Adjudicating

    4. Saving Files

  5. Appendix Bibliography

    1. References for Using Amazon’s Mechanical Turk/Crowdsourcing

  6. Colophon