A short introduction to tagtog, text annotation made easy

By Jorge Campos

The challenges of Machine Learning (ML) start with collecting training data. First, labeled datasets are scarce. Second, language is complex and keeps changing, especially in domains such as the humanities, healthcare, or finance, so datasets need continuous input and verification from subject-matter experts (SMEs). In natural language processing (NLP), this knowledge is captured as text annotations.

tagtog is a collaborative text annotation platform to find, create, and maintain NLP datasets efficiently. It is available in the cloud and on-premises.

Collaborations between data analytics/AI professionals and SMEs often fail. This is partly due to the lack of accessible tools that would allow SMEs to take part in Named Entity Recognition (NER) or text classification tasks. To bridge this gap, tagtog was designed as a collaborative annotation platform with an easy-to-use interface.

Three entities annotated; two of the annotations overlap.

Creating training data on tagtog is as simple as highlighting text. In addition, you can add relations between entities, attach attributes to entities, or classify the whole document. Annotations can be made both manually and automatically.
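To make the idea of highlighting concrete, here is a minimal sketch of how entity annotations are commonly represented: as labeled character spans over the text. This is an illustration only, not tagtog's internal format; the `Entity` class, the `annotate` helper, and the example sentence are all made up for this post. It also shows how two annotations can overlap.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    """Hypothetical span record; not tagtog's actual data model."""
    label: str  # entity type, e.g. "Disease"
    start: int  # offset of the first character (inclusive)
    end: int    # offset just past the last character (exclusive)


def annotate(text: str, phrase: str, label: str) -> Entity:
    """Simulate highlighting the first occurrence of `phrase` in `text`."""
    start = text.index(phrase)  # raises ValueError if the phrase is absent
    return Entity(label, start, start + len(phrase))


def overlaps(a: Entity, b: Entity) -> bool:
    """Two half-open spans overlap when each one starts before the other ends."""
    return a.start < b.end and b.start < a.end


def overlapping_pairs(entities):
    """Return every pair of entities that share at least one character."""
    return [(a, b) for a, b in combinations(entities, 2) if overlaps(a, b)]


text = "Chronic kidney disease was treated with aspirin."
entities = [
    annotate(text, "Chronic kidney disease", "Disease"),
    annotate(text, "kidney", "Organ"),
    annotate(text, "aspirin", "Drug"),
]

for a, b in overlapping_pairs(entities):
    print(f"{text[a.start:a.end]!r} overlaps {text[b.start:b.end]!r}")
```

Storing offsets rather than the highlighted strings keeps each annotation unambiguous when the same phrase appears more than once in a document.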

Automatic annotations reduce the effort required to produce labeled datasets. There are two methods available:

- Dictionaries: import or create collections of terms and extend them during the annotation tasks.

- ML: tagtog learns continuously from your annotations to generate precise predictions out of the box. If preferred, you can plug an external ML model into the platform. SMEs review the ML predictions, creating a continuous learning loop that trains the model and keeps it up to date.
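The dictionary method can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not tagtog's implementation: the `dictionary_annotate` function, the labels, and the terms are all hypothetical. It looks up each term as a whole word, ignoring case, and returns the matching spans.

```python
import re


def dictionary_annotate(text, dictionary):
    """Return sorted (start, end, label, matched_text) tuples for every hit.

    `dictionary` maps a label to a collection of terms (an assumed layout).
    """
    hits = []
    for label, terms in dictionary.items():
        for term in terms:
            # \b keeps a short term such as "flu" from matching inside "influence".
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            for match in pattern.finditer(text):
                hits.append((match.start(), match.end(), label, match.group()))
    return sorted(hits)


dictionary = {"Drug": {"aspirin", "ibuprofen"}, "Symptom": {"headache"}}
text = "Aspirin relieved the headache; ibuprofen did not."

for start, end, label, matched in dictionary_annotate(text, dictionary):
    print(start, end, label, matched)

# Extending the dictionary during annotation: the next pass picks up the new term.
dictionary["Symptom"].add("fever")
```

The word-boundary check assumes terms start and end with a letter or digit; terms that begin or end with punctuation would need a different boundary rule.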

To quickly bootstrap annotation projects, tagtog supports several file formats natively. This native support enriches the annotation experience and eliminates unnecessary parsing steps: users can annotate directly over PDFs and import PubMed articles, HTML, CSV, source code, or even Markdown files. For tighter integration, an API is available to import annotations and files, export annotations and metrics, and search.

Malicious code flagged on tagtog

To track annotation projects and data quality, tagtog measures the progress of the project members along with their agreement with other annotators (Inter-Annotator Agreement). You can easily spot biases, unbalanced classes, or oversampled data by checking the distribution of your annotations.

Inter-annotator agreement matrix. It contains the agreement scores between pairs of users. For example, Vega and Joao agree in 87% of the cases.
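To give an intuition for what a pairwise agreement score means, here is a minimal sketch that computes raw percentage agreement between every pair of annotators. It is a simplification and not necessarily the metric tagtog uses; the `pairwise_agreement` function, the annotator names, and the toy labels are invented for illustration.

```python
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Map each annotator pair to the fraction of shared items labeled identically.

    `labels_by_annotator` maps an annotator name to {item_id: label}.
    Pairs with no items in common are left out.
    """
    scores = {}
    for a, b in combinations(sorted(labels_by_annotator), 2):
        labels_a, labels_b = labels_by_annotator[a], labels_by_annotator[b]
        shared = labels_a.keys() & labels_b.keys()
        if not shared:
            continue
        same = sum(labels_a[item] == labels_b[item] for item in shared)
        scores[(a, b)] = same / len(shared)
    return scores


# Toy data, made up for this example.
labels = {
    "ann_a": {"doc1": "pos", "doc2": "neg", "doc3": "pos", "doc4": "neg"},
    "ann_b": {"doc1": "pos", "doc2": "neg", "doc3": "neg", "doc4": "neg"},
    "ann_c": {"doc1": "pos", "doc2": "pos", "doc3": "neg"},
}

for (a, b), score in pairwise_agreement(labels).items():
    print(f"{a} vs {b}: {score:.0%}")
```

Raw percentage agreement does not correct for agreement that happens by chance; measures such as Cohen's kappa do, at the cost of being harder to read at a glance.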

I hope this helped. Please let me know if you have any questions or feedback. You can find more tutorials about this text annotation tool here or by following our blog.

Documentation: http://docs.tagtog.net

At 🍃tagtog.net we aim to democratize text analytics.

👏 👏 👏 if you liked the post and want to share it with others!
