When

Thursday, June 4th, 1:00-2:30 and 2:30-4:00

GitHub

All code and data

Instructors

Dave Campbell and Nathan Taback

R libraries to install

Part 1

Topics

  • R Libraries for Working with Text
  • Importing unstructured text into R: copy/paste, external APIs, web scraping, R libraries
  • Regular expressions and text normalization (e.g., tokenization)
  • N-Grams
  • Word Vectors and Matrices

Part 2

Topics

Statistical Models for Unstructured Text

  • Sentiment analysis models
  • Topic modelling: LDA, and what about inference?
  • Word2Vec: what it is, and what about inference?
  • Open problems and general discussion