tagtog is a multi-user text annotation tool designed to build high-quality data efficiently. spaCy is an open-source library for advanced Natural Language Processing (NLP) in Python.
This example uses spaCy to automatically generate NER (Named-Entity Recognition) annotations and display these annotations directly in tagtog.
First, we create a project in tagtog and define a few entity types in the project settings. Bear in mind that these types should map to those used by your model. We will use spaCy's en_core_web_sm model for English to extract entities representing people, organizations, and money. Reviewing the model's label scheme, we find the labels PERSON, ORG, and MONEY. In tagtog, we create a matching entity type for each.
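One way to keep this mapping explicit is a small lookup table. The following is a minimal sketch, not the original code: the ids `e_1`, `e_2`, and `e_3` are hypothetical placeholders for the ids tagtog shows for your entity types, and the helper names are made up for illustration.

```python
# Hypothetical ids: replace them with the entity type ids shown in your
# own tagtog project settings.
SPACY_TO_TAGTOG = {"PERSON": "e_1", "ORG": "e_2", "MONEY": "e_3"}


def get_class_id(label):
    """Return the tagtog entity type id for a spaCy label, or None if unmapped."""
    return SPACY_TO_TAGTOG.get(label)


def unmapped_labels(model_labels):
    """List the model labels that have no tagtog entity type yet."""
    return sorted(set(model_labels) - set(SPACY_TO_TAGTOG))


def ner_labels(pipeline="en_core_web_sm"):
    """Read the NER label scheme from an installed spaCy pipeline."""
    import spacy  # imported lazily so the mapping itself works without spaCy

    return spacy.load(pipeline).get_pipe("ner").labels
```

With spaCy and the model installed, `unmapped_labels(ner_labels())` would show which labels the project currently ignores.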
Second, we want to upload to tagtog a text annotated by this model. To do that, we first transform the annotations produced by the spaCy model into annotations tagtog can digest. The Python code for this does the following:
- Given a sample text, it forwards it to the en_core_web_sm model.
- It transforms the model response into annotations tagtog can understand.
- It pushes the text and annotations (pre-annotated document) to tagtog using its API.
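The transformation step could be sketched as follows. This is an illustration under assumptions rather than the original snippet: the field names follow our understanding of tagtog's ann.json format, the part id `s1v1` is assumed to identify the single part of a plain-text document, and the entity type ids are hypothetical. `FakeSpan` is a stand-in so the sketch runs without a model installed.

```python
from collections import namedtuple

# Hypothetical entity type ids; use the ones from your tagtog project.
CLASS_IDS = {"PERSON": "e_1", "ORG": "e_2", "MONEY": "e_3"}


def to_tagtog_entities(spans, pipeline, part_id="s1v1"):
    """Convert spaCy-like spans into tagtog-style entity dicts.

    Each span only needs .label_, .start_char and .text, which the
    Span objects in spaCy's doc.ents provide.
    """
    entities = []
    for span in spans:
        class_id = CLASS_IDS.get(span.label_)
        if class_id is None:
            continue  # skip labels the project does not define
        entities.append({
            "classId": class_id,
            "part": part_id,  # assumed part id for a plain-text document
            "offsets": [{"start": span.start_char, "text": span.text}],
            "confidence": {"state": "pre-added",
                           "who": ["ml:" + pipeline],
                           "prob": 1},
            "fields": {},
            "normalizations": {},
        })
    return entities


def to_annjson(spans, pipeline):
    """Wrap the converted entities in a minimal ann.json-style document."""
    return {
        "anncomplete": False,
        "metas": {},
        "relations": [],
        "entities": to_tagtog_entities(spans, pipeline),
    }


# Stand-in for spaCy spans, used only to try the sketch without spaCy.
FakeSpan = namedtuple("FakeSpan", "text start_char label_")
```

With spaCy, the call would be `to_annjson(nlp(text).ents, "en_core_web_sm")`, where `nlp = spacy.load("en_core_web_sm")`.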
The sample text:
Paypal Holdings Inc (PYPL) President and CEO Daniel Schulman Sold $2.7 million of Shares
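Pushing this text together with its annotations might look like the sketch below. The endpoint URL, the query parameters, the `files` field name, and the matching `.txt` / `.ann.json` file names are assumptions based on our reading of the tagtog documents API, so verify them against the API documentation. The function names and the `annjson` argument (a dict in tagtog's annotation format) are illustrative.

```python
import json

# Assumed endpoint of the tagtog documents API.
TAGTOG_ENDPOINT = "https://www.tagtog.net/-api/documents/v1"

SAMPLE_TEXT = ("Paypal Holdings Inc (PYPL) President and CEO Daniel Schulman "
               "Sold $2.7 million of Shares")


def build_upload(text, annjson, owner, project, name="sample"):
    """Prepare the query parameters and multipart files for one document."""
    params = {
        "owner": owner,
        "project": project,
        "output": "null",
        "format": "default-plus-annjson",  # plain text plus its annotations
    }
    files = [
        ("files", (name + ".txt", text)),
        ("files", (name + ".ann.json", json.dumps(annjson))),
    ]
    return params, files


def push_to_tagtog(text, annjson, username, password, project):
    """POST the pre-annotated document; requires the requests package."""
    import requests  # third-party, imported lazily

    params, files = build_upload(text, annjson, username, project)
    response = requests.post(TAGTOG_ENDPOINT, params=params,
                             auth=(username, password), files=files)
    response.raise_for_status()
    return response
```

Keeping `build_upload` separate from the network call makes it easy to inspect the payload before anything is sent.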
🪄 Now you can find the annotated text in your tagtog project.
In addition, here is the full GitHub repository (its code is generalized to also work with PDFs!):