The cold start problem in NLP

Two examples of the cold start problem in NLP:

  • You have to train a model that detects new cryptocurrencies on Twitter, but you are not given any annotated tweets.
  • You have to build a healthcare chatbot that takes action when a user reports common symptoms of a new disease, but again, nobody wants to annotate your chat logs.

How can you successfully perform Named Entity Recognition when there is no hand-labeled data for the target domain?

In this tutorial I show you how to combine Rubrix and skweak to rapidly produce annotated data from rules based on expert knowledge, with minimal manual effort.
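
To make the idea of "annotated data from rules" concrete, here is a minimal sketch in plain Python. It does not use the skweak or Rubrix APIs: the three rules, the `KNOWN_COINS` list, the `CRYPTO` label, and the `weak_label` helper are all hypothetical names invented for illustration, and the simple vote counting stands in for the more sophisticated aggregation a dedicated weak supervision library would provide.

```python
import re
from collections import Counter

# Hypothetical gazetteer an expert might write down (an assumption for this sketch).
KNOWN_COINS = {"bitcoin", "ethereum", "dogecoin"}


def cashtag_rule(tokens):
    """Label cashtags such as $SHIB."""
    return {i: "CRYPTO" for i, tok in enumerate(tokens)
            if re.fullmatch(r"\$[A-Z]{2,6}", tok)}


def gazetteer_rule(tokens):
    """Label tokens found in the small list of known coins."""
    return {i: "CRYPTO" for i, tok in enumerate(tokens)
            if tok.lower() in KNOWN_COINS}


def suffix_rule(tokens):
    """Label tokens that end in 'coin' (but are not just the word 'coin')."""
    return {i: "CRYPTO" for i, tok in enumerate(tokens)
            if tok.lower().endswith("coin") and len(tok) > 4}


def weak_label(tokens, rules, min_votes=1):
    """Combine rule outputs per token; keep the top label if it has enough votes."""
    votes = [Counter() for _ in tokens]
    for rule in rules:
        for i, label in rule(tokens).items():
            votes[i][label] += 1
    labels = []
    for counter in votes:
        if counter:
            label, count = counter.most_common(1)[0]
            labels.append(label if count >= min_votes else "O")
        else:
            labels.append("O")  # no rule fired: outside any entity
    return labels


RULES = [cashtag_rule, gazetteer_rule, suffix_rule]
tweet = "Just bought some Dogecoin and $SHIB today".split()
labels = weak_label(tweet, RULES)
strict_labels = weak_label(tweet, RULES, min_votes=2)
```

With `min_votes=1`, any single rule is enough to tag a token; raising it to 2 keeps only tokens on which at least two rules agree, trading recall for precision. The resulting noisy labels are the kind of output that would then be reviewed and corrected manually.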

Ruan Chaves Rodrigues

Machine Learning Engineer. MSc student at the EMLCT programme. Personal website: https://ruanchaves.github.io/