In short: OCR to PDF & upload to tagtog

By Dr. Juan Miguel Cejuela

Your problem: you have a bunch of scanned images or PDFs but cannot make use of them because you cannot even select the text.

Your solution: you just need to OCR your scans (saving the results as text-embedded PDFs) and then upload them to tagtog. From there on, you can annotate (by just highlighting the text) and export your valuable data as machine-readable JSON. This is possible thanks to the Native PDF annotation built into tagtog.

Annotated (Native) PDFs look like this on tagtog:

Any text-embedded PDF can be annotated in tagtog and it’s as easy as selecting text.

All your text selections can…

By Jorge Campos

When you finish this article, you will understand:

  • What a webhook is
  • How to connect your model to tagtog (or other services) using webhooks
  • How to test webhooks locally
  • How webhooks make training your model more accessible

First of all, what is a webhook?

A webhook is a notification mechanism that allows your system to receive events from a different service in real time. Event notifications are sent via HTTP POST requests to an endpoint you define.
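To make the mechanism concrete, here is a minimal sketch of the receiving side: an HTTP endpoint that accepts POST requests with a JSON body and stores each event. It uses only Python's standard library. The port, the path, and the `action` field used in the example payload are illustrative assumptions; this does not reproduce the payload format of tagtog or any other service.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Events received so far (kept in memory for demonstration only).
received_events = []


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            event = json.loads(body)
        except json.JSONDecodeError:
            # Reject payloads that are not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        received_events.append(event)
        # Acknowledge quickly; heavy work (e.g. running a model) belongs elsewhere.
        self.send_response(200)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


def start_server(port=0):
    """Start the endpoint in a background thread; port=0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_server(8000)` would expose the endpoint on `http://127.0.0.1:8000/`. Returning 200 immediately and deferring the real processing is a common design choice, since senders typically expect a fast acknowledgement.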

Think of it as an SMS notification. You make a change to your bank account details (event), and you receive an SMS asking…

By Jorge Campos

Taking the opportunity of the spaCy v3 release, we wanted to write a short piece about how to integrate tagtog and spaCy.

tagtog is a multi-user text annotation tool designed to build high-quality data efficiently. spaCy is an open-source library for advanced Natural Language Processing (NLP) in Python.

This example uses spaCy to automatically generate NER (Named-Entity Recognition) annotations and display these annotations directly in tagtog.

First, we create a project in tagtog and define a few entity types in the project settings. Bear in mind that these types should map to those used by your model. We will use…
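Because the project's entity types must map to the model's labels, it helps to make that mapping explicit. The sketch below converts entity spans into simple annotation dicts and skips any label that has no counterpart in the project. The entity type ids (`e_1`, `e_2`, `e_3`) and the output dict layout are hypothetical placeholders, not tagtog's export format.

```python
# Hypothetical mapping from model labels to entity type ids defined
# in the project settings. Labels missing here are ignored.
LABEL_TO_ENTITY_TYPE = {"ORG": "e_1", "PERSON": "e_2", "GPE": "e_3"}


def map_entities(entities, mapping):
    """Convert (start_char, end_char, text, label) tuples into annotation
    dicts, dropping labels that have no entity type in the project."""
    annotations = []
    for start, end, text, label in entities:
        entity_type = mapping.get(label)
        if entity_type is None:
            continue  # the model predicts a type the project does not define
        annotations.append(
            {"type": entity_type, "start": start, "end": end, "text": text}
        )
    return annotations
```

With spaCy, the input tuples could be built from a parsed `doc` as `[(ent.start_char, ent.end_char, ent.text, ent.label_) for ent in doc.ents]`.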

Photo by Halacious on Unsplash

By Jorge Campos

At tagtog, we are constantly innovating. Lately, we have been playing with the potential of Markdown to make the annotation process more enjoyable and clear.

Now we go one step further. For each Markdown code block, you can now select a predefined layout. Say hello to tagtog blocks! 👋

Let's see some examples!

By Jorge Campos

When annotators label the same data, adjudication is the process of resolving inconsistencies among the different versions and promoting a final version to the gold standard (master in tagtog).

In a multi-user annotation environment, especially in NLP, adjudication is usually a manual and lengthy process: it involves comparing each version and resolving conflicts one by one.

tagtog now supports automatic adjudication to accelerate this process. Today we show when and how to use it.
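One widely used automatic strategy is majority vote: an annotation is promoted to the final version when more than half of the annotators made it. The sketch below illustrates that idea on annotations represented as `(start, end, label)` tuples; this representation is an assumption, and the code is not tagtog's implementation.

```python
from collections import Counter


def majority_vote(versions, min_votes=None):
    """Merge annotator versions by majority vote.

    versions: list of iterables of annotations, one per annotator.
    An annotation is kept when at least `min_votes` annotators made it
    (by default, a strict majority of the annotators)."""
    if min_votes is None:
        min_votes = len(versions) // 2 + 1
    # Count each annotation at most once per annotator.
    counts = Counter(ann for version in versions for ann in set(version))
    return {ann for ann, n in counts.items() if n >= min_votes}
```

Lowering `min_votes` makes the merge more permissive; with `min_votes=1` it becomes the union of all versions.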

To learn more about the different adjudication methods, you can read this post: The adjudication process…

Label images and text in their original context

By Juan Miguel Cejuela

A recent tagtog update added support for annotating Markdown files, so images, nested lists, and code blocks are now fully supported. This opens many new possibilities for annotation. Let’s focus on three.

Image 1: Annotating News with Images & Markdown on

1. Annotate in Context 👁 (+ it just looks better 💅)

Annotating text in its original context is not the same as annotating whatever unformatted text your parser strips out. First of all, text in images and even the visual style convey meaning. Are we capturing these nuances in NLP systems right now?

What would you prefer to…

Tips and suggestions for anyone managing a team of annotators

By Uxío García Andrade

In our latest update, we added a new 🆕 feature to our search engine that lets users filter documents by whether a given user (or any user) has confirmed them. Although this may not look like a big deal, the feature can be used for many different things. In this post I will discuss some of its applications 🚀.
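The logic behind such a filter is simple to state: keep a document if a specific user confirmed it, or, when no user is given, if anyone did. The sketch below expresses that logic on a hypothetical in-memory `Document` model; it is not tagtog's search syntax or API, which is done through the web application and its query language.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical minimal model: an id plus the users who confirmed it.
    doc_id: str
    confirmed_by: set = field(default_factory=set)


def filter_confirmed(docs, user=None, confirmed=True):
    """Return documents whose confirmation status matches `confirmed`.

    user=None means "confirmed by any user"; otherwise only `user` counts."""
    def is_confirmed(doc):
        if user is None:
            return bool(doc.confirmed_by)
        return user in doc.confirmed_by

    return [doc for doc in docs if is_confirmed(doc) == confirmed]
```

For a team lead, `filter_confirmed(docs, "alice", confirmed=False)` would list the documents a given annotator still has to review.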

Your annotators are working hard. Help them a little.

First, let’s see how to use the query in the tagtog web application. You just need to navigate to a…

Training your machine learning models in a collaborative platform has never been easier.

By Uxío García Andrade

tagtog is a collaborative text annotation platform to find, create, and maintain NLP datasets efficiently. You can find a quick introduction to tagtog here. But tagtog is more than a text annotation platform: it has many other features, and in this article I will explain how to take advantage of one of them, training your artificial intelligence models.

The first step to start an AI project is to decide what problem we want to solve, and then choose a model that could perform adequately. Also, bear in mind that the performance of the model will depend…

By Jorge Campos

The challenges of Machine Learning (ML) start with collecting training data. First, labeled datasets are scarce. Second, the increasing complexity and changing nature of linguistic nuances, in fields such as the humanities, healthcare, or finance, require constant input and verification from subject-matter experts (SMEs). In the context of natural language processing (NLP), this knowledge comes in the form of text annotations.

tagtog is a collaborative text annotation platform to find, create, and maintain NLP datasets efficiently. It is available in the cloud and on-premises.

Collaborations between data analytics/AI professionals and SMEs often fail. This is partially due to the lack…

Photo by Franki Chamaki

By Jorge Campos

Some weeks ago we rolled out 🍃 a feature to track the quality of your datasets using the Inter-Annotator Agreement (IAA).

If you have labeled data and different people (or ML systems) have collaborated to label the same subsets of data (e.g. 4 subject-matter experts separately annotate the same subset of legal contracts), you can compare these annotations to get an idea of their quality. If all your annotators make the same annotations independently (high IAA), it means your guidelines are clear and your annotations are most likely correct.

Note that a high IAA doesn’t strictly…
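For span annotations such as named entities, one common way to quantify agreement is pairwise F1: treat one annotator's set as the reference and the other's as predictions, which reduces to 2·|A∩B| / (|A| + |B|), then average over all annotator pairs. The sketch below assumes exact-match `(start, end, label)` tuples; it illustrates the general idea and is not necessarily how tagtog computes its IAA.

```python
from itertools import combinations


def f1_agreement(a, b):
    """Pairwise F1 between two annotation sets (exact match).

    Symmetric: precision and recall swap roles when a and b swap."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # both annotators left the document empty: full agreement
    return 2 * len(a & b) / (len(a) + len(b))


def mean_pairwise_iaa(annotations_by_user):
    """Average F1 agreement over every pair of annotators."""
    if len(annotations_by_user) < 2:
        raise ValueError("IAA needs at least two annotators")
    pairs = list(combinations(annotations_by_user.values(), 2))
    return sum(f1_agreement(a, b) for a, b in pairs) / len(pairs)
```

Exact matching is strict: a span that differs by a single character counts as a disagreement, so relaxed (overlap-based) matching is sometimes preferred.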


