Automatic adjudication based on inter-annotator agreement

🍃tagtog
Oct 22, 2019 · 5 min read

By Jorge Campos

When several annotators label the same data, adjudication is the process of resolving inconsistencies among the different versions and promoting a final version to the gold standard (master).

In a multi-user annotation environment, adjudication is usually a manual and lengthy process, especially in NLP: it involves comparing each version and resolving conflicts one by one.

tagtog now supports automatic adjudication to accelerate this process. Today we show when and how to use it.

If you want to know more about the different adjudication methods, you can read this post: The adjudication process in collaborative annotation

Requirements

There are two major constraints you need to be aware of before automating this process:

Overlap in the annotated data

There must be overlap in the annotated data: a portion of the data items must be annotated by more than one annotator, so that annotations are comparable.

In tagtog, you choose the degree of overlap when you distribute the dataset among users; for instance, you can require that three different users annotate each document.
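
tagtog handles this distribution for you. Purely to illustrate the idea, here is a minimal Python sketch of overlap-based assignment; the function and names are hypothetical, not tagtog's implementation:

```python
from itertools import cycle

def distribute(documents, annotators, per_doc=3):
    """Assign each document to `per_doc` annotators, round-robin, so that
    every document is annotated by several people and versions overlap."""
    pool = cycle(annotators)
    return {doc: [next(pool) for _ in range(per_doc)] for doc in documents}

members = ["jorge", "member01", "member02", "member03"]
print(distribute(["doc-1", "doc-2"], members))
# {'doc-1': ['jorge', 'member01', 'member02'],
#  'doc-2': ['member03', 'jorge', 'member01']}
```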

Clear and well-structured guidelines

If there is room for ambiguity, the judgment of the annotators might differ in identical scenarios, lowering the IAA.

Create and maintain your guidelines at tagtog. You can add pictures and format your rules using Markdown.

It all boils down to the complexity of your schema and domain. Before starting the production work, it usually takes several annotation sessions to refine the guidelines.

We calculate the IAA automatically while users annotate. After each session, you can monitor progress and decide when to start with the bulk of your work. This usually means reaching high IAA metrics first.

Automatic adjudication

In an environment that meets the conditions described above, it makes sense to move towards automation. Interacting with annotators is still recommended, and the manual resolution of inconsistencies is always more accurate; however, that process is hardly scalable.

We would like to introduce automatic adjudication using IAA. It chooses the annotations from the best available annotator for each annotation task.

It follows a merging strategy that, for each annotation task, picks the annotations from the user with the highest average IAA across all the documents.
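
In code terms the strategy is simple. Below is an illustrative Python sketch (all names are hypothetical, and tagtog's internal implementation may differ), assuming we already know each annotator's average IAA per annotation task:

```python
def adjudicate(doc_versions, avg_iaa):
    """Merge one document's versions into a master version.

    doc_versions: {annotator: {task: [annotations]}} for this document
    avg_iaa:      {task: {annotator: average IAA across all documents}}
    """
    master = {}
    for task, scores in avg_iaa.items():
        # Rank annotators by average IAA for this task, best first.
        ranked = sorted(scores, key=scores.get, reverse=True)
        # "Best available": fall back to the next-best annotator if the
        # top one did not annotate this particular document.
        best = next((a for a in ranked if a in doc_versions), None)
        master[task] = doc_versions[best].get(task, []) if best else []
    return master
```

Note the fallback: if the top annotator did not annotate a given document, the next-best annotator who did is used instead.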

A sample project

Let's create a new project and edit the settings.

Entities

Create two entity types (Settings > Entities):

  • SoftSkill: annotate soft skills in job offers.
  • TechnicalSkill: annotate technical skills in job offers.

Guidelines

In certain cases, the annotation tasks we created might cause confusion across the team. Let's try to be more specific in the guidelines (Settings > Guidelines) of the project.

Fig. 1. Annotation guidelines. In the guidelines we give examples of what to annotate and what not to annotate, to remove ambiguous scenarios.

Members

We add some collaborators to help us tag.

Fig. 2. As we want their help to annotate, we add them with the supercurator role. There is one admin and three supercurators (annotators). All of them (4) will annotate.

Work distribution

In this example, 3 different annotators will annotate each document. Let's set up the distribution settings.

Fig. 3. We assign 3 annotators per document and confirm that the owner of the project (the person who created the project, in this case the admin) will also participate in the annotation work.

Now we are ready to import some text.

Documents

For this sample project, I have added the requirements of 6 job offers. As tagtog supports Markdown, I have imported formatted text, so the content is more engaging to annotate.
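
Besides drag and drop in the web interface, tagtog also exposes an HTTP API for imports. The endpoint and parameters in the sketch below are assumptions modeled on the documents API (v1) as documented around the time of writing, so verify them against the current API docs:

```python
import requests

# Assumed endpoint/parameters; check tagtog's current API documentation.
API = "https://www.tagtog.net/-api/documents/v1"

with open("job-offer-01.md", "rb") as f:
    resp = requests.post(
        API,
        params={"project": "job-offers", "owner": "jorge", "output": "null"},
        files={"files": f},
        auth=("jorge", "MY_PASSWORD"),  # placeholder credentials
    )
resp.raise_for_status()
```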

Each time a document is imported, it is assigned automatically to 3 members.

When users enter the project, they are automatically redirected to the TODO filter. This filter only shows the documents in their queue.

Fig. 4. Version from jorge. This user has annotated his version of the document. Once a user completes the annotations for a given document, the user confirms the document.
Fig. 5. Version from member03. Three different members annotate their version for this document. As you can see, there are substantial differences between this version and the version in Fig. 4.

Once all users have annotated and confirmed their versions of the documents assigned to them, we have enough information to calculate the agreement among annotators (IAA). The platform crunches the numbers for us.

You can always check the IAA in the Metrics section of your project.

Fig. 6. IAA metrics for two annotation tasks: entity type technicalSkill and entity type softSkill.

The matrices in Fig. 6 show the agreement between each possible pair of annotators for each annotation task. For example, member03 and jorge agree in 84.41% of the cases for the annotation task technicalSkill, whereas they agree in 74.27% of the cases for the annotation task softSkill.
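
tagtog computes these scores for you. As a rough illustration of what a pairwise agreement score for entities can look like, here is a sketch based on exact-span-match F1; this is an assumption for illustration, not necessarily tagtog's exact formula:

```python
def pairwise_agreement(ann_a, ann_b):
    """F1-style agreement between two annotators for one annotation task.

    ann_a, ann_b: sets of (document_id, start_offset, end_offset) tuples;
    an exact span match counts as agreement.
    """
    a, b = set(ann_a), set(ann_b)
    if not a and not b:
        return 1.0  # neither annotated anything: trivially in agreement
    matches = len(a & b)
    return 2 * matches / (len(a) + len(b))
```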

These numbers reveal (see the sketch after the list for how the averages are derived):

  • jorge is the annotator with the highest IAA average for the annotation task technicalSkill
  • member03 is the annotator with the highest IAA average for the annotation task softSkill
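
From the pairwise matrix, an annotator's average IAA for a task is simply the mean of their agreement with every other annotator. A sketch, where only the 84.41% figure comes from Fig. 6 and the rest are invented for illustration:

```python
def average_iaa(pairwise, annotator):
    """Mean agreement of `annotator` with every other annotator for one task.

    pairwise: {frozenset({a, b}): score} for every pair of annotators.
    """
    scores = [s for pair, s in pairwise.items() if annotator in pair]
    return sum(scores) / len(scores)

technical_skill = {
    frozenset({"jorge", "member03"}): 0.8441,  # value from Fig. 6
    frozenset({"jorge", "member01"}): 0.81,
    frozenset({"jorge", "member02"}): 0.79,
    frozenset({"member01", "member03"}): 0.77,
    frozenset({"member02", "member03"}): 0.75,
    frozenset({"member01", "member02"}): 0.72,
}
print(round(average_iaa(technical_skill, "jorge"), 4))     # 0.8147
print(round(average_iaa(technical_skill, "member03"), 4))  # 0.788
```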

Automatic adjudication

If the quality metrics meet the requirements, an admin can start the adjudication process. tagtog does the dirty work.

Fig. 7. Automatic adjudication. The admin clicks on Merge Annotations to adjudicate automatically.

The adjudication process doesn't simply promote one user's version. It integrates the best annotations for each annotation task, and the resulting annotations go to the master version.

Fig. 8. Final version incorporated into master.

What has just happened? tagtog integrated the technicalSkill annotations from jorge and the softSkill annotations from member03, the top annotators for their respective annotation tasks.
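
Plugging illustrative numbers into the adjudicate sketch from earlier makes this concrete; the averages and annotations below are hypothetical, chosen only to be consistent with Fig. 6's conclusions:

```python
# Hypothetical averages, consistent with Fig. 6's conclusions.
avg_iaa = {
    "technicalSkill": {"jorge": 0.81, "member01": 0.77, "member02": 0.76, "member03": 0.79},
    "softSkill":      {"jorge": 0.72, "member01": 0.70, "member02": 0.71, "member03": 0.75},
}
# Hypothetical versions of one document (entity text only, for brevity).
doc_versions = {
    "jorge":    {"technicalSkill": ["Python", "SQL"], "softSkill": ["teamwork"]},
    "member03": {"technicalSkill": ["Python"], "softSkill": ["teamwork", "communication"]},
}
master = adjudicate(doc_versions, avg_iaa)
# master["technicalSkill"] -> ["Python", "SQL"]               (from jorge)
# master["softSkill"]      -> ["teamwork", "communication"]   (from member03)
```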

Now we can either leave this version as final or assign it for review.

👏👏👏 if you like the post and want to share it with others! 💚🧡
