By Jorge Campos
The challenges of Machine Learning (ML) start with collecting training data. First, labeled datasets are scarce. Second, the complexity and evolving nature of domain-specific language, in fields such as the humanities, healthcare, or finance, demands constant input and verification from subject-matter experts (SMEs). In the context of natural language processing (NLP), this knowledge comes in the form of text annotations.
tagtog is a collaborative text annotation platform to find, create, and maintain NLP datasets efficiently. It is accessible in the cloud and on-premises.
Collaborations between data analytics/AI professionals and SMEs often fail. This is partially due to the lack of accessible tools that would allow SMEs to participate in Named Entity Recognition (NER) or text classification tasks. To bridge this gap, tagtog was designed as a collaborative annotation platform with an easy-to-use interface.
Creating training data on tagtog is as simple as highlighting text. In addition, you can link entities with relations, attach attributes to entities, or classify the whole document. Annotations can be made either manually or automatically.
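To make these concepts concrete, here is a minimal sketch of how a single annotated document could be represented in code. The field names and structure are illustrative assumptions for this post, not tagtog's actual export format.

```python
# Hypothetical, simplified representation of one annotated document.
# Field names are illustrative only; they do not mirror tagtog's export schema.
annotated_document = {
    "text": "Aspirin reduces the risk of heart attack.",
    "entities": [
        {"id": "e1", "label": "Drug", "start": 0, "end": 7, "text": "Aspirin",
         "attributes": {"dosage_mentioned": False}},   # attribute attached to an entity
        {"id": "e2", "label": "Condition", "start": 28, "end": 40, "text": "heart attack"},
    ],
    "relations": [
        {"type": "treats", "from": "e1", "to": "e2"},  # relation between two entities
    ],
    "document_labels": {"topic": "cardiology"},        # whole-document classification
}
```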
Automatic annotations reduce the effort required to produce labeled datasets. There are two methods available:
- Dictionaries: import or create collections of terms and extend them during annotation tasks (see the sketch after this list).
- ML: tagtog learns continuously from your annotations to generate accurate predictions out of the box. If preferred, an external ML model can be plugged into the platform. SMEs review the ML predictions, creating a continuous learning loop that keeps the model trained and up to date.
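As a rough illustration of the dictionary-based method, the sketch below scans a text for known terms and emits entity spans. It assumes an exact, case-insensitive match on word boundaries; tagtog's actual matching logic may differ.

```python
import re
from typing import Dict, Iterator, Tuple

def dictionary_annotate(text: str, dictionary: Dict[str, str]) -> Iterator[Tuple[int, int, str, str]]:
    """Yield (start, end, matched_text, label) for every dictionary term found in text.

    `dictionary` maps a lowercase term to its entity label, e.g. {"aspirin": "Drug"}.
    Longer terms are tried first so that "heart attack" wins over "heart".
    """
    terms = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        yield match.start(), match.end(), match.group(0), dictionary[match.group(0).lower()]

# Example usage:
terms = {"aspirin": "Drug", "heart attack": "Condition"}
for span in dictionary_annotate("Aspirin reduces the risk of heart attack.", terms):
    print(span)  # (0, 7, 'Aspirin', 'Drug') then (28, 40, 'heart attack', 'Condition')
```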
To quickly bootstrap annotation projects, tagtog supports several file formats natively. This enriches the annotation experience, eliminates unnecessary parsing steps, and lets users annotate directly over PDFs or import PubMed articles, HTML, CSV, source code, or even Markdown files. For tighter integration, an API is available to import files and annotations, export annotations and metrics, and search documents.
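For illustration, a typical interaction with such an HTTP API might look like the snippet below. The endpoint URL, credentials, and parameter names are placeholders assumed for this example, not tagtog's documented API; consult the platform's API documentation for the real ones.

```python
import requests

# Placeholder endpoint and parameters, assumed for this example only.
API_URL = "https://example-annotation-service.com/api/documents"
AUTH = ("my_username", "my_password")

# Import a local file into an annotation project.
with open("article.pdf", "rb") as f:
    response = requests.post(
        API_URL,
        auth=AUTH,
        params={"owner": "my_username", "project": "clinical-trials"},
        files={"file": f},
    )
response.raise_for_status()

# Export the annotations of a document by id.
export = requests.get(
    API_URL,
    auth=AUTH,
    params={"owner": "my_username", "project": "clinical-trials",
            "ids": "doc-123", "output": "json"},
)
print(export.json())
```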
To track annotation projects and data quality, tagtog measures each project member's progress along with their agreement with other annotators (Inter-Annotator Agreement). Easily spot biases, unbalanced classes, or oversampled data by checking the distribution of your annotations.
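Inter-Annotator Agreement can be computed in several ways, and the platform's exact metric is not detailed here. As one common example, the sketch below computes Cohen's kappa between two annotators' document-level labels.

```python
from collections import Counter
from typing import Sequence

def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    assert labels_a and len(labels_a) == len(labels_b), "need paired, non-empty label lists"
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, based on each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_expected = sum((freq_a[lbl] / n) * (freq_b[lbl] / n) for lbl in freq_a.keys() | freq_b.keys())
    if p_expected == 1.0:
        return 1.0  # both annotators always use a single identical label
    return (p_observed - p_expected) / (1 - p_expected)

# Example: two SMEs classifying five documents.
annotator_1 = ["finance", "health", "health", "finance", "health"]
annotator_2 = ["finance", "health", "finance", "finance", "health"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.62
```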
At 🍃tagtog.net we aim to democratize text analytics.
👏 👏 👏 if you liked the post and want to share it with others!