Annotating Images & Markdown

🍃tagtog
3 min read · Oct 13, 2019

Label images and text in their original context

By Juan Miguel Cejuela

A recent tagtog update added support for annotating Markdown files, so images, nested lists, and code blocks are now fully supported. This opens up many new possibilities for annotation. Let’s focus on three.

Image 1: Annotating News with Images & Markdown on tagtog.net

1. Annotate in Context 👁 (+ it just looks better 💅)

Annotating text in its original context is not the same as annotating whatever unformatted text your parser strips out. For one thing, text inside images and even the visual style convey meaning. Are we capturing these nuances in NLP systems right now?

Which would you prefer to annotate? This?

NLP has different sources of bias: 1. The selection of the training data. 2. The biases of the annotators. 3. The inductive bias of the model. 4. How the task is designed overall. @eurnlp #eurnlp

Or this?

Image 2: Tweet in its original context; better suited for annotation

How about this? Note that some of the text’s meaning is lost without its accompanying image (Image 3).

Image 3: The tweet’s full meaning is lost without the image

To be clear, tagtog doesn't yet show tweets in their original formatting. However, nicely formatted news articles, like the one in this post's header (Image 1), are now perfectly possible.

2. Label Images 🖼

Several image annotation tasks are concerned only with assigning concrete labels to the images. Often the labels are binary, enumerations, or free strings. This is possible on tagtog with document labels, which in this case refer to the images.

Moreover, several other NLP tasks are concerned with matching text to images. Take for instance the NLVR dataset on tagtog (Image 4), from the original NLVR dataset. In this dataset, the task was to determine whether the caption text (e.g. "There is a box with a blue circle, a black circle and a black square") correctly described the presented image: true or false. These types of annotations are now supported on tagtog.

Image 4: Labeling images and the text associated with an image.
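To make the label types concrete, here is a minimal Python sketch of how one image-plus-caption document and its document-level labels could be modeled. It is only an illustration: the class names, field names, and the `Shape` enumeration are hypothetical and do not reflect tagtog's actual data format or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Shape(Enum):
    """Hypothetical enumeration label; the values are made up for this sketch."""
    CIRCLE = "circle"
    SQUARE = "square"
    TRIANGLE = "triangle"


@dataclass
class ImageDocument:
    """One image plus its caption and document-level labels (illustrative only)."""
    image_path: str                          # hypothetical file name
    caption: str                             # text to be matched against the image
    caption_is_true: Optional[bool] = None   # binary label; None means not annotated yet
    dominant_shape: Optional[Shape] = None   # enumeration label
    note: str = ""                           # free-string label


def is_fully_labeled(doc: ImageDocument) -> bool:
    """A document counts as labeled once the binary and enumeration labels are set."""
    return doc.caption_is_true is not None and doc.dominant_shape is not None


doc = ImageDocument(
    image_path="example.png",
    caption="There is a box with a blue circle, a black circle and a black square",
)
print(is_fully_labeled(doc))  # nothing has been annotated yet

# An annotator fills in the three kinds of labels.
doc.caption_is_true = True
doc.dominant_shape = Shape.CIRCLE
doc.note = "two circles, one square"
print(is_fully_labeled(doc))
```

Using `None` for "not annotated yet" keeps an unanswered true/false question distinct from an answer of `False`.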

3. Annotate Markdown Documentation 🅜📝

With more and more documentation being written in Markdown (from READMEs, to comments, to even entire theses and books), it makes sense to label this wealth of data and put it to use.

CommonMark Logo
CommonMark, the de facto specification for Markdown.

How about annotating security flaws in code shared on Stack Overflow? (Image 5).

Image 5: Finding security threats in code posted on Stack Overflow.
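As a rough sketch of how such documents might be pre-screened before human annotation, the Python snippet below pulls fenced code blocks out of a Markdown string and flags lines containing a few risky-looking calls. Everything here is an assumption for illustration: the `RISKY_CALLS` list, the function names, and the sample text are made up, the regular expression only handles simple backtick fences (not the full CommonMark rules), and plain substring matching is no substitute for a real security review. It does not use any tagtog API.

```python
import re
from typing import List, Tuple

FENCE = "`" * 3  # three backticks, built this way to keep the sketch readable

# Opening fence with an optional language tag, a lazily matched body, then a closing fence.
FENCE_RE = re.compile(
    rf"^{FENCE}(\w*)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

# Hypothetical, deliberately tiny list of calls worth a second look.
RISKY_CALLS = ("eval(", "os.system(", "pickle.loads(")


def extract_code_blocks(markdown: str) -> List[Tuple[str, str]]:
    """Return (language, code) pairs for every simple fenced block found."""
    return [(m.group(1), m.group(2)) for m in FENCE_RE.finditer(markdown)]


def flag_risky_lines(code: str) -> List[Tuple[int, str]]:
    """Return (line number, stripped line) for lines containing a listed call."""
    hits = []
    for number, line in enumerate(code.splitlines(), start=1):
        if any(call in line for call in RISKY_CALLS):
            hits.append((number, line.strip()))
    return hits


# A made-up forum answer with one code block.
sample = "\n".join([
    "You can run the command like this:",
    "",
    FENCE + "python",
    "import os",
    "os.system(user_input)",
    FENCE,
    "",
    "That should work.",
])

for language, code in extract_code_blocks(sample):
    print(language, flag_risky_lines(code))
```

The flagged lines are only candidates; deciding whether something is an actual security flaw is exactly the judgment a human annotator would make.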

We will review the many possibilities for annotating code (and verbatim-like-styled text) in a future post.

In the meantime, hope you liked this one!

What would you annotate with Markdown? 🤔

👏👏👏 Clap if you like the post, and want to share it with others! 🧡💚
