Harvard

CSCI S-89B

Introduction to Natural Language Processing

Students are introduced to modern techniques of natural language processing (NLP) and learn foundations of text classification, named entity recognition, parsing, language modeling including text generation, topic modeling, and machine translation.

Methods for representing text as data covered in the course include tokenization, n-grams, bag of words, term frequency-inverse document frequency (TF-IDF) weighting, word embeddings such as Word2Vec and GloVe, autoencoders, t-SNE, character embeddings, and topic modeling.

The machine learning algorithms for NLP covered in the course are recurrent neural networks (RNNs) including long short-term memory (LSTM), conditional random fields (CRFs), bidirectional LSTM with a CRF (BiLSTM-CRF), generative adversarial networks (GANs), attention models, transformers, bidirectional encoder representations from transformers (BERT), latent Dirichlet allocation (LDA), non-negative matrix factorization (NMF), and structural topic modeling (STM).

Students get hands-on experience using both Python and R.

Schedule note
TTh 6:30pm - 9:30pm Jun 21 to Aug 6
