The cold start problem in NLP

Two typical examples of the cold start problem in NLP:

  • You have to train a model that detects new cryptocurrencies on Twitter, but you are not given any annotated tweets.
  • You have to build a healthcare chatbot that takes action when a user reports common symptoms of a new disease, but again, nobody wants to annotate your chat logs.

How do you successfully perform Named Entity Recognition when there is no hand-labeled data for the target domain?

In this tutorial I show you how to combine Rubrix and skweak to rapidly produce annotated data from rules based on expert knowledge, with minimal manual effort.
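
To give a rough idea of what a rule based on expert knowledge can look like, here is a minimal sketch in plain Python. It does not use the Rubrix or skweak APIs. The function names `cashtag_rule` and `weakly_label`, the `CRYPTO` label, and the assumption that a cashtag such as "$DOGE" names a cryptocurrency are all made up for this illustration.

```python
import re
from typing import Dict, Iterator, List, Tuple

# Hypothetical expert rule: a cashtag ("$" followed by 2 to 6 capital letters)
# is assumed to mention a cryptocurrency. Real rules would be more careful.
CASHTAG = re.compile(r"\$[A-Z]{2,6}\b")

Span = Tuple[int, int, str]  # (start character, end character, label)


def cashtag_rule(text: str) -> Iterator[Span]:
    """Yield one labeled character span per cashtag found in the text."""
    for match in CASHTAG.finditer(text):
        yield match.start(), match.end(), "CRYPTO"


def weakly_label(texts: List[str]) -> List[Dict[str, object]]:
    """Apply the rule to every text and collect the resulting annotations."""
    return [{"text": t, "spans": list(cashtag_rule(t))} for t in texts]


if __name__ == "__main__":
    tweets = ["Just bought $DOGE and $ETH today", "No coins mentioned here"]
    for record in weakly_label(tweets):
        print(record)
```

A rule like this is noisy on its own: it will miss coins written without a cashtag and will also fire on stock tickers. That is why rule outputs are usually treated as weak labels to be combined and reviewed, rather than as ground truth.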



Ruan Chaves Rodrigues

Machine Learning Engineer. MSc student at the EMLCT programme. Personal website: