Date Lecture Readings Logistics
Tues 01/16/24 Lecture #1:
  • Course Introduction
  • Logistics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 1

Thu 01/18/24 Lecture #2:
  • Text classification
  • Generative Models vs Discriminative Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapters 4 & 5

Tues 01/23/24 Lecture #3:
  • Text classification (cont.)
  • Evaluation
  • N-Gram Language Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 4
  • Jurafsky & Martin Chapter 3
Optional readings:
  • The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing (Dror et al., 2018) [link](https://aclanthology.org/P18-1128.pdf)
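A minimal illustration of the bigram language model idea from Lecture #3; the toy corpus and the boundary symbols are made up for this sketch and are not course material.

```python
from collections import Counter

# Toy corpus; <s> and </s> are made-up sentence-boundary symbols for this sketch.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))

def bigram_mle(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_mle("the", "cat"))  # 0.5: "the" is followed by "cat" in 1 of its 2 occurrences
```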

Thu 01/25/24 Lecture #4:
  • Word embeddings and vector semantics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6

1/29 HW 1 out

Tues 01/30/24 Lecture #5:
  • Word embeddings and vector semantics (cont.)
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
Optional readings:
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
  • Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) [link]
  • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (Goldberg and Levy, 2014) [link]
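To accompany the vector-semantics material in Lectures #4-5, a small sketch of cosine similarity between word vectors; the vectors below are toy values, not trained word2vec embeddings.

```python
import numpy as np

# Toy 4-dimensional "embeddings"; real word2vec vectors are learned and typically 100-300d.
vectors = {
    "cat": np.array([0.9, 0.1, 0.3, 0.0]),
    "dog": np.array([0.8, 0.2, 0.4, 0.1]),
    "car": np.array([0.1, 0.9, 0.0, 0.7]),
}

def cosine(u, v):
    """Cosine similarity: dot product of the two vectors divided by their norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["cat"], vectors["dog"]))  # high: the toy vectors point the same way
print(cosine(vectors["cat"], vectors["car"]))  # noticeably lower
```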

Thu 02/01/24 Lecture #6:
  • Information theory
  • Sequence labeling
[ slides ]
Main readings:
  • Eisenstein Chapter 7

Fri 02/02/24
2:00-3:00pm
Recitation #1:
NumPy and PyTorch Tutorial Session
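A few lines in the spirit of the recitation, showing the NumPy-to-PyTorch round trip and a gradient computed by autograd; assumes numpy and torch are installed, and the values are arbitrary.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # NumPy array
t = torch.from_numpy(a)                             # PyTorch tensor sharing memory with `a`

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()          # y = x1^2 + x2^2 + x3^2
y.backward()                # autograd computes dy/dx = 2x
print(x.grad)               # tensor([2., 4., 6.])
```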

Tues 02/06/24 Lecture #7:
  • Sequential classification
  • Conditional Random Fields
[ slides ]
Main readings:
  • Eisenstein 7.1-7.3
  • Michael Collins' notes on CRFs [link]
Optional readings:
  • Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (Lafferty et al., 2001) [link]
  • An Introduction to Conditional Random Fields (Sutton and McCallum, 2010) [link]
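For the sequence-labeling material in Lectures #6-7, a compact Viterbi decoder over toy additive scores; the number of tags, the score values, and the parameterization are invented for this sketch and are not tied to any particular CRF formulation.

```python
import numpy as np

# Toy problem: 3 time steps, 2 tags. All scores are made up.
emission = np.array([[2.0, 0.5],    # emission[t, k]: score of tag k at step t
                     [0.3, 1.8],
                     [1.2, 1.1]])
transition = np.array([[0.5, 0.1],  # transition[i, j]: score of moving from tag i to tag j
                       [0.2, 0.6]])

def viterbi(emission, transition):
    """Highest-scoring tag sequence under additive emission + transition scores."""
    T, K = emission.shape
    score = np.zeros((T, K))
    backptr = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + transition + emission[t][None, :]  # K x K candidates
        backptr[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # Follow back-pointers from the best final tag.
    best = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

print(viterbi(emission, transition))  # [0, 1, 1] for these toy scores
```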

Thu 02/08/24 Lecture #8:
  • Basics of Neural Networks and Language Model Training
[ slides (annotated) ]
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret), Chapter 3

2/11 HW 1 due

Tues 02/13/24 Lecture #9:
  • Autograd
  • Building blocks of Neural Networks
  • Convolutional layers
  • Network layers and optimizers
[ slides ]
Main readings:
  • The Little Book of Deep Learning (François Fleuret), Chapter 4
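A minimal sketch tying together the building blocks from Lectures #8-9 (a layer stack, a loss, autograd, and an optimizer); the layer sizes, data, and learning rate are arbitrary placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # tiny MLP classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 4)                 # 16 random "examples"
y = torch.randint(0, 2, (16,))         # random binary labels

logits = model(x)
loss = loss_fn(logits, y)
opt.zero_grad()
loss.backward()                        # autograd fills .grad on every parameter
opt.step()                             # one gradient-descent update
print(loss.item())
```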


Project teams due

Thu 02/15/24 Lecture #10:
  • Building blocks of Neural Networks for NLP
  • Task-specific neural network architectures
  • RNNs
[ slides ]
Main readings:
  • Goldberg Chapter 9

2/16 HW 2 out

Tues 02/20/24 Lecture #11:
  • RNNs (cont.)
  • Machine translation
[ slides (annotated) ]
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Eisenstein, Chapter 18
Optional readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
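A small shape check to accompany the RNN/LSTM material in Lectures #10-11; the batch size, sequence length, and dimensions are arbitrary.

```python
import torch
import torch.nn as nn

# Batch of 3 "sentences", each 5 tokens long, with 10-dimensional input vectors.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(3, 5, 10)
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)  # torch.Size([3, 5, 20]) - one hidden state per token
print(h_n.shape)      # torch.Size([1, 3, 20]) - final hidden state per sequence
```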

Thu 02/22/24 Lecture #12:
  • Machine translation (cont.)
  • Attention
  • Transformers
[ slides (annotated) ]
Main readings:
  • Statistical Machine Translation (Koehn) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]
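A compact sketch of the scaled dot-product attention described in the Vaswani et al. reading; single head, no masking, and arbitrary dimensions.

```python
import math
import torch

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. (2017); no mask, single head."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (..., len_q, len_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ V

Q = torch.randn(1, 4, 16)   # 4 query positions, d_k = 16
K = torch.randn(1, 6, 16)   # 6 key/value positions
V = torch.randn(1, 6, 16)
print(attention(Q, K, V).shape)  # torch.Size([1, 4, 16])
```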

Tues 02/27/24 Lecture #13:
  • Transformers (cont.)
  • Language modeling with Transformers
[ slides (annotated) ]
Main readings:
  • The Illustrated Transformer (Jay Alammar) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]

Thu 02/29/24 Lecture #14:
  • Pre-training and transfer learning
  • Objective functions for pre-training
  • Model architectures
  • ELMo, BERT, GPT, T5
[ slides ]
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]
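To make the masked-language-modeling objective from the BERT reading concrete, a toy masking routine; the 15% rate follows Devlin et al., but the tokens and the all-[MASK] replacement (no 80/10/10 split) are simplifications for this sketch.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly hide ~15% of tokens; the model is trained to recover them.
    (Simplified: BERT keeps or randomly replaces some selected tokens instead of masking all.)"""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)       # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)      # no loss at unmasked positions
    return masked, targets

random.seed(0)
print(mask_tokens("the cat sat on the mat".split()))
```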

3/1 HW 2 due

Tues 03/05/24 Midterm Exam

Thu 03/07/24 Lecture #15:
  • Transfer learning (cont.)
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
[ slides ]
Main readings:
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

3/8 Project proposals due;
3/10 HW 3 out

03/08/24 - 03/24/24 Spring recess - No classes

Tues 03/26/24 Lecture #16:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
[ slides ]
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020) [link]
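For the decoding portion of Lecture #16, a sketch of nucleus (top-p) sampling in the spirit of the Holtzman et al. reading; it operates on a single next-token distribution, and the vocabulary and probabilities are made up.

```python
import torch

def top_p_sample(probs, p=0.9):
    """Sample from the smallest set of most-probable tokens whose cumulative mass reaches p."""
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep every token up to and including the first one that pushes the mass to p.
    cutoff = int(torch.searchsorted(cumulative, torch.tensor(p)).item()) + 1
    nucleus = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalize over the nucleus
    choice = torch.multinomial(nucleus, 1)
    return int(sorted_idx[choice])

probs = torch.tensor([0.5, 0.3, 0.1, 0.05, 0.05])  # toy next-token distribution
torch.manual_seed(0)
print(top_p_sample(probs, p=0.9))  # index of the sampled token in the toy vocabulary
```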

Thu 03/28/24 Lecture #17:
  • Retrieval-Augmented Language Models
Guest lecturer:
Akari Asai, University of Washington

Tues 04/02/24 Lecture #18:
  • In-context learning and prompting (cont.)
  • Improving instruction following and few-shot learning
[ slides ]
Main readings:
  • Language Models are Few-Shot Learners (Brown et al., 2020) [link]
  • Finetuned Language Models Are Zero-Shot Learners (Wei et al., 2022) [link]
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]
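A tiny illustration of the few-shot prompting idea from the Brown et al. reading; the task, labels, and format are made up, and the resulting string would be sent to whatever language model the course uses.

```python
# Made-up labeled demonstrations; in-context learning conditions on them without weight updates.
demonstrations = [
    ("the plot was gripping", "positive"),
    ("a tedious, overlong film", "negative"),
]
query = "sharp writing and great acting"

prompt = "Classify the review as positive or negative.\n\n"
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"   # the model is expected to continue with a label

print(prompt)
```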

4/2 HW 3 due

Thu 04/04/24 Lecture #19:
  • Reinforcement Learning from Human Feedback (RLHF)
  • Alignment
[ slides ]
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]
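A sketch of the Direct Preference Optimization loss from the Rafailov et al. reading, written over precomputed sequence log-probabilities; the log-probability values and the beta are placeholders, not anything produced by a real model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: -log sigmoid(beta * (policy log-ratio advantage of the chosen response
    over the rejected one, measured relative to the frozen reference model))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp       # log pi(y_w)/pi_ref(y_w)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi(y_l)/pi_ref(y_l)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Placeholder sequence log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.0]), torch.tensor([-15.0, -10.0]),
                torch.tensor([-13.0, -9.5]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```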

4/7 HW 4 out

Tues 04/09/24 Lecture #20:
  • The Quest to build an (O)pen (L)anguage (Mo)del
Guest lecturer:
Luca Soldaini, Allen Institute for AI
[ slides ]
Main readings:
  • OLMo: Accelerating the Science of Language Models (Groeneveld et al., 2024) [link]
  • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [link]

Thu 04/11/24 Lecture #21:
  • AI safety, security, and ethics
  • Privacy and societal implications
[ slides ]

Tues 04/16/24 Lecture #22:
  • Self-Alignment of Large Language Models
Guest lecturer:
Jason Weston, Meta AI
Main readings:
  • Self-Alignment with Instruction Backtranslation (Li et al., 2023) [link]
  • Self-Rewarding Language Models (Yuan et al., 2024) [link]

Thu 04/18/24 Lecture #23:
  • Efficiency in large language models
  • Efficient inference and quantization
  • Efficient transformers
[ slides ]
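For the quantization topic in Lecture #23, a sketch of simple symmetric ("absmax") int8 quantization of a weight tensor; real schemes (per-channel scales, zero points, GPTQ/AWQ, etc.) are considerably more involved, and the tensor here is random.

```python
import torch

def quantize_int8(w):
    """Symmetric absmax quantization: map the range [-max|w|, max|w|] onto int8 [-127, 127]."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4, 4)
q, scale = quantize_int8(w)
print((w - dequantize(q, scale)).abs().max())  # quantization error, small relative to |w|
```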

Tues 04/23/24 Lecture #24:
  • Project presentations

Thu 04/25/24 Lecture #25:
  • Project presentations

Last meeting of the class;
4/28 HW 4 due
5/8 Final project due