When
Thursday, June 4th: 1:00-2:30 and 2:30-4:00
R libraries to install
- Data acquisition tools: gutenbergr, genius, rvest, RSelenium
- Tidyverse tools: tidyverse (which already bundles dplyr, stringr, and ggplot2), tidytext, janitor
- LDA (topic modelling) libraries: topicmodels, tm, lda
- Word2Vec library (wordVectors), installed from GitHub:
  - library(devtools)
  - install_github("bmschmidt/wordVectors")
  - library(wordVectors)
Part 1
Topics
- R Libraries for Working with Text
- Importing unstructured text into R: copy/paste, external APIs, web scraping, and R data packages
- Regular expressions and text normalization (e.g., tokenization)
- N-grams
- Word vectors and matrices
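The Part 1 steps (regex normalization, tokenization, n-grams, and a document-term matrix) can be sketched in base R. This is a minimal illustration on two made-up documents, not the workshop's own code. It avoids the listed libraries so it runs without installs; in practice tidytext's unnest_tokens() covers the same ground in tidy form.

```r
# Two hypothetical toy documents (not workshop data).
docs <- c(
  d1 = "The cat sat on the mat.",
  d2 = "The dog sat on the log!"
)

# Normalize with regular expressions: lowercase, strip punctuation
# (note: this also drops apostrophes), collapse runs of whitespace.
normalize <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", "", x)
  trimws(gsub("\\s+", " ", x))
}

# Tokenize: split each normalized document on single spaces.
tokens <- setNames(strsplit(normalize(docs), " ", fixed = TRUE), names(docs))

# N-grams: paste together every run of n adjacent tokens.
ngrams <- function(tok, n = 2) {
  if (length(tok) < n) return(character(0))
  vapply(seq_len(length(tok) - n + 1),
         function(i) paste(tok[i:(i + n - 1)], collapse = " "),
         character(1))
}
bigrams <- lapply(tokens, ngrams, n = 2)

# Document-term matrix: rows are documents, columns are word types,
# cells are raw counts (a cross-tabulation of document id by token).
dtm <- table(doc = rep(names(tokens), lengths(tokens)),
             term = unlist(tokens, use.names = FALSE))

print(bigrams$d1)
print(dtm)
```

Each row of dtm is a simple count vector for one document; packages such as tm and tidytext (via cast_dtm()) build the same kind of structure at scale.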
Part 2: Statistical Models for Unstructured Text
Topics
- Sentiment analysis models
- Topic modelling with LDA (latent Dirichlet allocation): what about inference?
- Word2Vec: what it is, and what about inference?
- Open problems and general discussion
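The simplest sentiment model is lexicon-based: look up each token in a list of labelled words and score a document as (number of positive words) minus (number of negative words). The sketch below is a minimal base-R illustration under that assumption. The six-word lexicon and the reviews are hypothetical; a real analysis would substitute a published lexicon, for example the one returned by tidytext::get_sentiments("bing").

```r
# Hypothetical toy lexicon; replace with a published lexicon in practice.
lexicon <- data.frame(
  word = c("good", "great", "love", "bad", "awful", "hate"),
  sentiment = c("positive", "positive", "positive",
                "negative", "negative", "negative"),
  stringsAsFactors = FALSE
)

# Hypothetical short reviews to score.
reviews <- c(
  r1 = "I love this book, the plot is great.",
  r2 = "Awful pacing and a bad ending.",
  r3 = "Good start, but I hate the ending."
)

# Lowercase, replace anything that is not a letter or space, then split.
tokenize <- function(x) {
  x <- gsub("[^a-z ]+", " ", tolower(x))
  strsplit(trimws(gsub("\\s+", " ", x)), " ", fixed = TRUE)
}
tokens <- setNames(tokenize(reviews), names(reviews))

# Score = count of positive matches minus count of negative matches;
# tokens absent from the lexicon map to NA and are ignored.
score <- vapply(tokens, function(tok) {
  hits <- lexicon$sentiment[match(tok, lexicon$word)]
  sum(hits == "positive", na.rm = TRUE) - sum(hits == "negative", na.rm = TRUE)
}, integer(1))

print(score)
```

This bag-of-words scoring ignores negation and context ("not good" counts as positive), which is one reason the workshop also covers model-based approaches such as LDA and Word2Vec.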