Date Lecture Readings Logistics
Tues 01/16/24 Lecture #1:
  • Course Introduction
  • Logistics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 1

Thu 01/18/24 Lecture #2:
  • Text classification
  • Generative Models vs Discriminative Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapters 4 & 5

Tues 01/23/24 Lecture #3:
  • Text classification (cont.)
  • Evaluation
  • N-Gram Language Models
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 4
  • Jurafsky & Martin Chapter 3
Optional readings:
  • The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing (Dror et al., 2018) [link](https://aclanthology.org/P18-1128.pdf)
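A minimal illustration of the bigram language model idea from Lecture #3; the toy corpus and the boundary symbols are made up for this sketch and are not course material.

```python
from collections import Counter

# Toy corpus; <s> and </s> are made-up sentence-boundary symbols for this sketch.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter((sent[i], sent[i + 1]) for sent in corpus for i in range(len(sent) - 1))

def bigram_mle(prev, word):
    """Maximum-likelihood estimate P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

print(bigram_mle("the", "cat"))  # 0.5: "the" is followed by "cat" in 1 of its 2 occurrences
```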

Thu 01/25/24 Lecture #4:
  • Word embeddings and vector semantics
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6

1/29 HW 1 out

Tues 01/30/24 Lecture #5:
  • Word embeddings and vector semantics (cont.)
[ slides ]
Main readings:
  • Jurafsky & Martin Chapter 6
Optional readings:
  • Distributed Representations of Words and Phrases and their Compositionality (Mikolov et al., 2013) [link]
  • Efficient Estimation of Word Representations in Vector Space (Mikolov et al., 2013) [link]
  • word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method (Goldberg and Levy, 2014) [link]
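To accompany the vector-semantics material in Lectures #4-5, a small sketch of cosine similarity between word vectors; the vectors below are toy values, not trained word2vec embeddings.

```python
import numpy as np

# Toy 4-dimensional "embeddings"; real word2vec vectors are learned and typically 100-300d.
vectors = {
    "cat": np.array([0.9, 0.1, 0.3, 0.0]),
    "dog": np.array([0.8, 0.2, 0.4, 0.1]),
    "car": np.array([0.1, 0.9, 0.0, 0.7]),
}

def cosine(u, v):
    """Cosine similarity: dot product of the two vectors divided by their norms."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(vectors["cat"], vectors["dog"]))  # high: the toy vectors point the same way
print(cosine(vectors["cat"], vectors["car"]))  # noticeably lower
```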

Thu 02/01/24 Lecture #6:
  • Information theory
  • Sequence labeling
[ slides ]
Main readings:
  • Eisenstein Chapter 7

Fri 02/02/24
2:00-3:00pm
Recitation #1:
NumPy and PyTorch Tutorial Session
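A few lines in the spirit of the recitation, showing the NumPy-to-PyTorch round trip and a gradient computed by autograd; assumes numpy and torch are installed, and the values are arbitrary.

```python
import numpy as np
import torch

a = np.arange(6, dtype=np.float32).reshape(2, 3)   # NumPy array
t = torch.from_numpy(a)                             # PyTorch tensor sharing memory with `a`

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()          # y = x1^2 + x2^2 + x3^2
y.backward()                # autograd computes dy/dx = 2x
print(x.grad)               # tensor([2., 4., 6.])
```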

Tues 02/06/24 Lecture #7:
  • Sequential classification
  • Conditional Random Fields
[ slides ]
Main readings:
  • Eisenstein 7.1-7.3
  • Michael Collins' notes on CRFs [link]
Optional readings:
  • Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data (Lafferty et al., 2001) [link]
  • An Introduction to Conditional Random Fields (Sutton and McCallum, 2010) [link]
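For the sequence-labeling material in Lectures #6-7, a compact Viterbi decoder over toy additive scores; the number of tags, the score values, and the parameterization are invented for this sketch and are not tied to any particular CRF formulation.

```python
import numpy as np

# Toy problem: 3 time steps, 2 tags. All scores are made up.
emission = np.array([[2.0, 0.5],    # emission[t, k]: score of tag k at step t
                     [0.3, 1.8],
                     [1.2, 1.1]])
transition = np.array([[0.5, 0.1],  # transition[i, j]: score of moving from tag i to tag j
                       [0.2, 0.6]])

def viterbi(emission, transition):
    """Highest-scoring tag sequence under additive emission + transition scores."""
    T, K = emission.shape
    score = np.zeros((T, K))
    backptr = np.zeros((T, K), dtype=int)
    score[0] = emission[0]
    for t in range(1, T):
        cand = score[t - 1][:, None] + transition + emission[t][None, :]  # K x K candidates
        backptr[t] = cand.argmax(axis=0)
        score[t] = cand.max(axis=0)
    # Follow back-pointers from the best final tag.
    best = [int(score[-1].argmax())]
    for t in range(T - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]

print(viterbi(emission, transition))  # [0, 1, 1] for these toy scores
```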

Thu 02/08/24 Lecture #8:
  • Basics of Neural Networks and Language Model Training
[ slides (annotated) ]
Main readings:
  • The Matrix Calculus You Need For Deep Learning (Terence Parr and Jeremy Howard) [link]
  • The Little Book of Deep Learning (François Fleuret), Chapter 3

2/11 HW 1 due

Tues 02/13/24 Lecture #9:
  • Autograd
  • Building blocks of Neural Networks
  • Convolutional layers
  • Network layers and optimizers
[ slides ]
Main readings:
  • The Little Book of Deep Learning (François Fleuret), Chapter 4
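A minimal sketch tying together the building blocks from Lectures #8-9 (a layer stack, a loss, autograd, and an optimizer); the layer sizes, data, and learning rate are arbitrary placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # tiny MLP classifier
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 4)                 # 16 random "examples"
y = torch.randint(0, 2, (16,))         # random binary labels

logits = model(x)
loss = loss_fn(logits, y)
opt.zero_grad()
loss.backward()                        # autograd fills .grad on every parameter
opt.step()                             # one gradient-descent update
print(loss.item())
```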


Project teams due

Thu 02/15/24 Lecture #10:
  • Building blocks of Neural Networks for NLP
  • Task-specific neural network architectures
  • RNNs
[ slides ]
Main readings:
  • Goldberg Chapter 9

2/16 HW 2 out

Tues 02/20/24 Lecture #11:
  • RNNs (cont.)
  • Machine translation
[ slides (annotated) ]
Main readings:
  • Understanding LSTM Networks (Christopher Olah) [link]
  • Eisenstein, Chapter 18
Optional readings:
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
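A small shape check to accompany the RNN/LSTM material in Lectures #10-11; the batch size, sequence length, and dimensions are arbitrary.

```python
import torch
import torch.nn as nn

# Batch of 3 "sentences", each 5 tokens long, with 10-dimensional input vectors.
lstm = nn.LSTM(input_size=10, hidden_size=20, batch_first=True)
x = torch.randn(3, 5, 10)
outputs, (h_n, c_n) = lstm(x)
print(outputs.shape)  # torch.Size([3, 5, 20]) - one hidden state per token
print(h_n.shape)      # torch.Size([1, 3, 20]) - final hidden state per sequence
```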

Thu 02/22/24 Lecture #12:
  • Machine translation (cont.)
  • Attention
  • Transformers
[ slides (annotated) ]
Main readings:
  • Statistical Machine Translation (Koehn) [link]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial (Graham Neubig) [link]
  • Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2015) [link]
  • Effective Approaches to Attention-based Neural Machine Translation (Luong et al., 2015) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Illustrated Transformer (Jay Alammar) [link]
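A compact sketch of the scaled dot-product attention described in the Vaswani et al. reading; single head, no masking, and arbitrary dimensions.

```python
import math
import torch

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, as in Vaswani et al. (2017); no mask, single head."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)   # (..., len_q, len_k)
    weights = torch.softmax(scores, dim=-1)
    return weights @ V

Q = torch.randn(1, 4, 16)   # 4 query positions, d_k = 16
K = torch.randn(1, 6, 16)   # 6 key/value positions
V = torch.randn(1, 6, 16)
print(attention(Q, K, V).shape)  # torch.Size([1, 4, 16])
```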

Tues 02/27/24 Lecture #13:
  • Transformers (cont.)
  • Language modeling with Transformers
[ slides (annotated) ]
Main readings:
  • The Illustrated Transformer (Jay Alammar) [link]
  • Attention is All You Need (Vaswani et al., 2017) [link]
  • The Annotated Transformer (Harvard NLP) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]

Thu 02/29/24 Lecture #14:
  • Pre-training and transfer learning
  • Objective functions for pre-training
  • Model architectures
  • ELMo, BERT, GPT, T5
[ slides ]
Main readings:
  • The Illustrated BERT, ELMo, and co. (Jay Alammar) [link]
  • BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018) [link]
  • GPT-2: Language Models are Unsupervised Multitask Learners (Radford et al., 2019) [link]
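To make the masked-language-modeling objective from the BERT reading concrete, a toy masking routine; the 15% rate follows Devlin et al., but the tokens and the all-[MASK] replacement (no 80/10/10 split) are simplifications for this sketch.

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    """Randomly hide ~15% of tokens; the model is trained to recover them.
    (Simplified: BERT keeps or randomly replaces some selected tokens instead of masking all.)"""
    masked, targets = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            targets.append(tok)       # prediction target at this position
        else:
            masked.append(tok)
            targets.append(None)      # no loss at unmasked positions
    return masked, targets

random.seed(0)
print(mask_tokens("the cat sat on the mat".split()))
```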

3/1 HW 2 due

Tues 03/05/24 Midterm Exam

Thu 03/07/24 Lecture #15:
  • Transfer learning (cont.)
  • Encoder-decoder pretrained models
  • Architecture and pretraining objectives
[ slides ]
Main readings:
  • T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (Raffel et al., 2020) [link]
  • BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension (Lewis et al., 2019) [link]
  • What Language Model Architecture and Pretraining Objective Work Best for Zero-Shot Generalization? (Wang et al., 2022) [link]

3/8 Project proposals due;
3/10 HW 3 out

03/08/24 - 03/24/24 Spring recess - No classes

Tues 03/26/24 Lecture #16:
  • Decoding and generation
  • Large language models and impact of scale
  • In-context learning and prompting
[ slides ]
Main readings:
  • The Curious Case of Neural Text Degeneration (Holtzman et al., 2019) [link]
  • How to generate text: using different decoding methods for language generation with Transformers [link]
  • Scaling Laws for Neural Language Models (Kaplan et al., 2020) [link]
  • Training Compute-Optimal Large Language Models (Hoffmann et al., 2022) [link]
  • GPT-3: Language Models are Few-Shot Learners (Brown et al., 2020) [link]
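For the decoding portion of Lecture #16, a sketch of nucleus (top-p) sampling in the spirit of the Holtzman et al. reading; it operates on a single next-token distribution, and the vocabulary and probabilities are made up.

```python
import torch

def top_p_sample(probs, p=0.9):
    """Sample from the smallest set of most-probable tokens whose cumulative mass reaches p."""
    sorted_probs, sorted_idx = probs.sort(descending=True)
    cumulative = sorted_probs.cumsum(dim=-1)
    # Keep every token up to and including the first one that pushes the mass to p.
    cutoff = int(torch.searchsorted(cumulative, torch.tensor(p)).item()) + 1
    nucleus = sorted_probs[:cutoff] / sorted_probs[:cutoff].sum()  # renormalize over the nucleus
    choice = torch.multinomial(nucleus, 1)
    return int(sorted_idx[choice])

probs = torch.tensor([0.5, 0.3, 0.1, 0.05, 0.05])  # toy next-token distribution
torch.manual_seed(0)
print(top_p_sample(probs, p=0.9))  # index of the sampled token in the toy vocabulary
```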

Thu 03/28/24 Lecture #17:
  • Retrieval-Augmented Language Models
Guest lecturer:
Akari Asai, University of Washington

Tues 04/02/24 Lecture #18:
  • In-context learning and prompting (cont.)
  • Improving instruction following and few-shot learning
[ slides ]
Main readings:
  • Language Models are Few-Shot Learners (Brown et al., 2020) [link]
  • Finetuned Language Models Are Zero-Shot Learners (Wei et al., 2022) [link]
  • Multitask Prompted Training Enables Zero-Shot Task Generalization (Sanh et al., 2021) [link]
  • Scaling Instruction-Finetuned Language Models (Chung et al., 2022) [link]
  • Are Emergent Abilities of Large Language Models a Mirage? (Schaeffer et al., 2023) [link]
  • Emergent Abilities of Large Language Models (Wei et al., 2022) [link]
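A tiny illustration of the few-shot prompting idea from the Brown et al. reading; the task, labels, and format are made up, and the resulting string would be sent to whatever language model the course uses.

```python
# Made-up labeled demonstrations; in-context learning conditions on them without weight updates.
demonstrations = [
    ("the plot was gripping", "positive"),
    ("a tedious, overlong film", "negative"),
]
query = "sharp writing and great acting"

prompt = "Classify the review as positive or negative.\n\n"
for text, label in demonstrations:
    prompt += f"Review: {text}\nSentiment: {label}\n\n"
prompt += f"Review: {query}\nSentiment:"   # the model is expected to continue with a label

print(prompt)
```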

4/2 HW 3 due

Thu 04/04/24 Lecture #19:
  • Reinforcement Learning from Human Feedback (RLHF)
  • Alignment
[ slides ]
Main readings:
  • Training language models to follow instructions with human feedback (Ouyang et al., 2022) [link]
  • Fine-Tuning Language Models from Human Preferences (Ziegler et al., 2019) [link]
  • Direct Preference Optimization: Your Language Model is Secretly a Reward Model (Rafailov et al., 2023) [link]
  • RLAIF: Scaling Reinforcement Learning from Human Feedback with AI Feedback (Lee et al., 2023) [link]
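A sketch of the Direct Preference Optimization loss from the Rafailov et al. reading, written over precomputed sequence log-probabilities; the log-probability values and the beta are placeholders, not anything produced by a real model.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: -log sigmoid(beta * (policy log-ratio advantage of the chosen response
    over the rejected one, measured relative to the frozen reference model))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp       # log pi(y_w)/pi_ref(y_w)
    rejected_ratio = policy_rejected_logp - ref_rejected_logp  # log pi(y_l)/pi_ref(y_l)
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()

# Placeholder sequence log-probabilities for a batch of two preference pairs.
loss = dpo_loss(torch.tensor([-12.0, -9.0]), torch.tensor([-15.0, -10.0]),
                torch.tensor([-13.0, -9.5]), torch.tensor([-14.0, -10.5]))
print(loss.item())
```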

4/7 HW 4 out

Tues 04/09/24 Lecture #20:
  • The Quest to build an (O)pen (L)anguage (Mo)del
Guest lecturer:
Luca Soldaini, Allen Institute for AI
[ slides ]
Main readings:
  • OLMo: Accelerating the Science of Language Models (Groeneveld et al., 2024) [link]
  • Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research [link]

Thu 04/11/24 Lecture #21:
  • AI safety, security, and ethics
  • Privacy and societal implications
[ slides ]

Tues 04/16/24 Lecture #22:
  • Self-Alignment of Large Language Models
Guest lecturer:
Jason Weston, Meta AI
Main readings:
  • Self-Alignment with Instruction Backtranslation (Li et al., 2023) [link]
  • Self-Rewarding Language Models (Yuan et al., 2024) [link]

Thu 04/18/24 Lecture #23:
  • Efficiency in large language models
  • Efficient inference and quantization
  • Efficient transformers
[ slides ]
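For the quantization topic in Lecture #23, a sketch of simple symmetric ("absmax") int8 quantization of a weight tensor; real schemes (per-channel scales, zero points, GPTQ/AWQ, etc.) are considerably more involved, and the tensor here is random.

```python
import torch

def quantize_int8(w):
    """Symmetric absmax quantization: map the range [-max|w|, max|w|] onto int8 [-127, 127]."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4, 4)
q, scale = quantize_int8(w)
print((w - dequantize(q, scale)).abs().max())  # quantization error, small relative to |w|
```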

Tues 04/23/24 Lecture #24:
  • Project presentations

Thu 04/25/24 Lecture #25:
  • Project presentations

Last meeting of the class;
4/28 HW 4 due
5/8 Final project due