2020/09/07

Me

  • Lasse Hjorth Madsen
  • Master of Political Science way back
  • Data scientist at Novo Nordisk

Political Science was broken last time I saw it

  • Relied mostly on survey data
  • Too little statistics of too poor quality
  • No consensus of what constitues good science

The good news: Data science can fix that

  • Many more sources of data
  • Better statistical tools and understanding
  • A mindset from the natural sciences

Case: Finding similar deviations

Novo Nordisk Kalundborg

Deviations at Novo Nordisk

  • Problem: We need a way to find similar descriptions of deviations
  • Solution: Natural language processing. Document similarity.
  • Question: How to turn text into numbers?

Toy example: Document-term-frequencies

text are dinosaurs reptiles games play kids like
dinosaurs are reptiles 1 1 1 0 0 0 0
dinosaurs play games 0 1 0 1 1 0 0
kids like games 0 0 0 1 0 1 1


  • You have got to be joking …
  • Bag-of-words: Word order is ignored
  • All words are treated equally

Words plotted by frequency rank and frequency

Words plotted by frequency rank and frequency