Annotating Images & Markdown

Label images and text in their original context

By Juan Miguel Cejuela

A recent update of tagtog added support for annotating Markdown files, so images, nested lists, and code blocks are now fully supported. This opens up many new possibilities for annotation. Let’s focus on three.

1. Annotate in Context 👁 (+ it just looks better 💅)

Annotating the original text in its original context is not the same as annotating whatever unformatted text your parser strips out. First of all, text in images and even the visual style convey meaning. Are we capturing these nuances in NLP systems right now?

What would you prefer to annotate? This?

NLP has different sources of bias: 1. The selection of the training data. 2. The biases of the annotators. 3. The inductive bias of the model. 4. How the task is designed overall. @eurnlp #eurnlp

Or this?

How about this? Note that some of the text’s meaning is lost without its complementing image (Image 3).

For clarification, tagtog doesn't yet show tweets in their original formatting. We will soon have a specific presentation mode for tweets, and you can follow our updates on Twitter @tagtog_net🐦. However, nicely formatted news, like this post's header (Image 1), is now perfectly possible.

2. Label Images 🖼

Several image annotation tasks are concerned only with assigning concrete labels to the images. Often the labels are binary values, enumerations, or free strings. This is possible on tagtog with document labels, which in this case refer to the images.
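To make the three kinds of label concrete, here is a minimal sketch in Python. It is not tagtog's data format or API: the `LabelSpec` class, the `validate` function, and the example label names are all hypothetical and only illustrate how binary, enumeration, and free-string labels differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class LabelSpec:
    """Hypothetical description of one document-level label (not tagtog's schema)."""
    name: str
    kind: str  # one of "boolean", "enum", "string"
    choices: Tuple[str, ...] = ()  # allowed values, used only when kind == "enum"


def validate(spec: LabelSpec, value: object) -> bool:
    """Return True if `value` is acceptable for the given label spec."""
    if spec.kind == "boolean":
        return isinstance(value, bool)
    if spec.kind == "enum":
        return isinstance(value, str) and value in spec.choices
    if spec.kind == "string":
        return isinstance(value, str)
    raise ValueError(f"unknown label kind: {spec.kind}")


# Illustrative labels for an image document (names are made up).
contains_text = LabelSpec("contains_text", "boolean")
scene = LabelSpec("scene", "enum", choices=("indoor", "outdoor"))
comment = LabelSpec("comment", "string")
```

With these definitions, `validate(scene, "indoor")` is accepted while `validate(scene, "underwater")` is rejected, because the value is not among the declared choices.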

Moreover, several other NLP tasks are concerned with matching text to images. Take for instance the NLVR dataset on tagtog (Image 4), from the original NLVR dataset. In this dataset, the task is to determine whether the caption text (e.g. "There is a box with a blue circle, a black circle and a black square") correctly describes the presented image: true or false. These types of annotations are now supported on tagtog.
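As a rough sketch of what such a caption-and-image annotation boils down to, the Python below pairs an image with a caption and a true/false label, and scores a list of predictions against it. The class name, field names, and file names are assumptions made for illustration; they are not the NLVR file format or anything tagtog exports.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CaptionImagePair:
    """One hypothetical annotated example: does the caption describe the image?"""
    image_path: str
    caption: str
    caption_is_true: bool


def accuracy(gold: List[CaptionImagePair], predicted: List[bool]) -> float:
    """Fraction of examples where the predicted true/false matches the annotation."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(g.caption_is_true == p for g, p in zip(gold, predicted))
    return correct / len(gold)


# Made-up examples; the image paths are placeholders.
examples = [
    CaptionImagePair(
        "image_a.png",
        "There is a box with a blue circle, a black circle and a black square",
        True,
    ),
    CaptionImagePair("image_b.png", "There is a tower with three yellow blocks", False),
]
```

The annotation itself is just the boolean; the caption and the image are the context the annotator needs to see together in order to decide it.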

3. Annotate Markdown Documentation 🅜📝

With more and more documentation being written in Markdown (from READMEs, to comments, to even entire theses and books), it makes sense to label and make use of this vast amount of data.

CommonMark Logo

How about annotating security flaws in code shared on Stack Overflow? (Image 5).
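If you want to prepare this kind of material yourself, one possible first step is to locate the code blocks inside a Markdown document and keep their character offsets, so that a label can point back into the original text. The sketch below uses only Python's standard `re` module. The function name and the simplified pattern are assumptions for illustration: it handles only backtick-fenced blocks, not indented blocks or tilde fences, and it says nothing about how tagtog parses Markdown internally.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to avoid writing a fence inside a fence

# Opening fence with an optional language tag, a lazily matched body, then a closing fence.
CODE_BLOCK = re.compile(
    r"^" + FENCE + r"([^\n`]*)\n(.*?)^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)


def find_code_blocks(markdown):
    """Return a list of (language, start, end, code) tuples.

    `start` and `end` are character offsets of the code body in `markdown`,
    so markdown[start:end] == code.
    """
    blocks = []
    for match in CODE_BLOCK.finditer(markdown):
        language = match.group(1).strip()
        blocks.append((language, match.start(2), match.end(2), match.group(2)))
    return blocks


# A tiny made-up document with one fenced block.
sample = "Intro\n" + FENCE + "python\nprint(1)\n" + FENCE + "\nOutro\n"
sample_blocks = find_code_blocks(sample)
```

Keeping offsets rather than copies of the code means any annotation made on a block can still be shown in the context of the surrounding question or answer.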

We will review the many possibilities for annotating code (and verbatim-like-styled text) in a future post.

In the meantime, we hope you liked this one!

What would you annotate with Markdown? 🤔

Need training data for #NLP? Find & create it for free on: 🍃tagtog

Are you on Twitter? 🐦Follow @tagtog_net🐦

👏👏👏 Clap if you like the post, and want to share it with others! 🧡💚

The text annotation platform to train #NLP. Easy. 🔗
