All homeworks should be submitted through Gradescope.

Homework 4

The goal of this homework is to understand how to fine-tune language models with preference feedback using the Direct Preference Optimization (DPO) framework. You will also learn how to implement a training pipeline for language models.

Part 1: Concepts component - Handout

Download the homework handout from the following link: [Download].

For this part, you need to complete the homework in LaTeX and return the pdf solution. Further instructions are provided in the pdf.

Part 2: Hands-on exercise - Fine-tuning language models with preference feedback

In this part, you will implement a training pipeline for language models and fine-tune them with preference feedback using the Direct Preference Optimization (DPO) framework. You will also evaluate the performance of the fine-tuned models on a text classification task. The detailed instructions are provided in the Colab notebook.
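To orient yourself before opening the notebook, here is a minimal sketch of the DPO objective, assuming you already have per-sequence log-probabilities from the policy and from a frozen reference model (the function name, argument names, and beta value are illustrative; the notebook defines its own interface):

    import torch
    import torch.nn.functional as F

    def dpo_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps, beta=0.1):
        # Each argument: (batch,) tensor of summed log-probs of the chosen or
        # rejected response under the policy or the frozen reference model.
        chosen_margin = beta * (policy_chosen_logps - ref_chosen_logps)
        rejected_margin = beta * (policy_rejected_logps - ref_rejected_logps)
        # DPO minimizes -log sigmoid of the difference of scaled log-ratios.
        return -F.logsigmoid(chosen_margin - rejected_margin).mean()

    # Toy usage with random log-probabilities (illustrative values only)
    loss = dpo_loss(torch.randn(4), torch.randn(4), torch.randn(4), torch.randn(4))

Note that the reference-model log-probabilities are treated as constants: only the policy receives gradients.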

Access the Colab Notebook Here: Colab Notebook

Homework 3

The goal of this homework is to get familiar with the implementation of language models and transformers.
Additionally, you will explore transfer learning and see how you can apply a pre-trained and fine-tuned language model to the text summarization task.

Due date: 4/2/24

Part 1: Implementing a transformer language modeling pipeline from scratch

You will implement a transformer language model from scratch. This involves implementing the input pipeline as well as the model itself.
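As a rough preview of the pieces involved, the sketch below shows a minimal decoder-only transformer language model in PyTorch; the sizes, layer choices, and use of nn.TransformerEncoder with a causal mask are illustrative assumptions, not the notebook's required architecture:

    import torch
    import torch.nn as nn

    class TinyTransformerLM(nn.Module):
        # Minimal decoder-only LM: token + position embeddings, causal
        # self-attention blocks, and a linear head over the vocabulary.
        def __init__(self, vocab_size, d_model=64, n_heads=4, n_layers=2, max_len=128):
            super().__init__()
            self.tok = nn.Embedding(vocab_size, d_model)
            self.pos = nn.Embedding(max_len, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, 4 * d_model,
                                               batch_first=True)
            self.blocks = nn.TransformerEncoder(layer, n_layers)
            self.lm_head = nn.Linear(d_model, vocab_size)

        def forward(self, idx):                      # idx: (batch, seq) token ids
            T = idx.shape[1]
            x = self.tok(idx) + self.pos(torch.arange(T, device=idx.device))
            mask = nn.Transformer.generate_square_subsequent_mask(T).to(idx.device)
            x = self.blocks(x, mask=mask)            # causal mask -> autoregressive
            return self.lm_head(x)                   # (batch, seq, vocab) logits

    logits = TinyTransformerLM(vocab_size=100)(torch.randint(0, 100, (2, 16)))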
All necessary instructions, including step-by-step guidance and required data for all parts, are provided in the following single Colab notebook.
Access the Colab Notebook-1 Here

Part 2: Text summarization with pre-trained language models and decoding methods

In this part, you will explore transfer learning and see how you can apply a pre-trained and fine-tuned language model to the text summarization task.
You will also implement decoding methods such as greedy decoding and beam search decoding and see their impact on the quality of the generated summaries.
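To make the contrast concrete: greedy decoding keeps a single hypothesis and commits to the argmax token at every step, while beam search keeps the k highest-scoring partial hypotheses. Here is a minimal greedy-decoding sketch, assuming a model that maps token ids to (batch, seq, vocab) logits as in the sketch above (for a Hugging Face model you would read model(...).logits):

    import torch

    @torch.no_grad()
    def greedy_decode(model, input_ids, max_new_tokens=50, eos_id=None):
        # At each step, append the single most likely next token.
        for _ in range(max_new_tokens):
            logits = model(input_ids)                      # (B, T, vocab)
            next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
            input_ids = torch.cat([input_ids, next_id], dim=1)
            if eos_id is not None and (next_id == eos_id).all():
                break                                      # every sequence finished
        return input_ids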
All necessary instructions, including step-by-step guidance and required data for all parts, are provided in the following single Colab notebook. Access the Colab Notebook-2 Here

Homework 2

The goal of this homework is to get you familiar with language modeling evaluation, sequence labeling and part-of-speech tagging.
You will implement evaluation metrics such as perplexity, as well as classic HMM-based sequence tagging methods and the Viterbi algorithm.
You will train a neural BiLSTM tagger and compare it with the HMM-based tagger that you implemented.

Part 1: Language model evaluation

You will implement the perplexity evaluation metric and investigate the performance of a few simple bigram language models. You will also examine the data and the resulting corpus perplexity scores.
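As a reference point, perplexity is the exponential of the average negative log-likelihood per token. A minimal sketch, where model_logprob is an assumed placeholder for whatever scoring function you build in the notebook:

    import math

    def perplexity(model_logprob, tokens):
        # model_logprob(history, token) returns log p(token | history).
        nll = 0.0
        for i, tok in enumerate(tokens):
            nll -= model_logprob(tokens[:i], tok)
        return math.exp(nll / len(tokens))

    # Sanity check: a uniform model over a 10-word vocabulary has perplexity 10.
    uniform = lambda history, token: math.log(1 / 10)
    print(perplexity(uniform, ["the", "cat", "sat"]))      # -> 10.0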

Part 2: Sequence Labeling

In this part, you will implement classic HMM-based sequence tagging methods and the Viterbi algorithm.
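For intuition, the Viterbi algorithm fills a table of best path scores by dynamic programming and then follows back-pointers. A minimal NumPy sketch, under the assumption that pi, A, and B are dense probability arrays with no zero entries (the notebook may work in log-space or with smoothed estimates):

    import numpy as np

    def viterbi(obs, pi, A, B):
        # pi[s]: initial prob of state s; A[s, t]: transition s -> t;
        # B[s, o]: emission prob of observation o from state s.
        S, T = len(pi), len(obs)
        delta = np.zeros((T, S))            # best log-score ending in state s at step t
        back = np.zeros((T, S), dtype=int)  # argmax predecessor for each cell
        delta[0] = np.log(pi) + np.log(B[:, obs[0]])
        for t in range(1, T):
            scores = delta[t - 1, :, None] + np.log(A)   # indexed [prev, cur]
            back[t] = scores.argmax(axis=0)
            delta[t] = scores.max(axis=0) + np.log(B[:, obs[t]])
        path = [int(delta[-1].argmax())]    # best final state, then walk back
        for t in range(T - 1, 0, -1):
            path.append(int(back[t, path[-1]]))
        return path[::-1]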

Part 3: Neural POS Tagging

You will train a neural BiLSTM tagger and compare it with the HMM-based tagger that you implemented.
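A minimal PyTorch sketch of such a tagger, with illustrative embedding and hidden sizes (the notebook specifies its own hyperparameters and training loop):

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, tagset_size, emb_dim=100, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, tagset_size)  # 2x: forward + backward

        def forward(self, token_ids):                      # (batch, seq) int tensor
            h, _ = self.lstm(self.emb(token_ids))          # (batch, seq, 2*hidden)
            return self.out(h)                             # per-token tag logits

    # Train with nn.CrossEntropyLoss on logits flattened to (batch*seq, tagset_size).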

The detailed instructions are provided in the Colab notebook.

Getting started

All necessary instructions, including step-by-step guidance and required data for all parts, are provided in the following single Colab notebook.

Access the Colab Notebook Here

Homework 1

Part 1: Concepts component - Handout (25 points)

Download the homework handout from the following link: [Download].

For this part, you need to complete the homework in LaTeX and return the pdf solution. Further instructions are provided in the pdf.

Part 2: Hands-on exercise - Implementing a Naive Bayes classifier (25 points)

The second part is the implementation of a Naive Bayes Bag-of-Words classifier. A Colab notebook is provided to guide you through the process of implementing the model from scratch and training it on a toy data sample. You will complete the parts marked # TODO: Implement and submit the completed notebook. Please see the instructions and the notebook here: Colab-1.
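For a feel of what the notebook asks for, here is a minimal multinomial Naive Bayes with add-alpha smoothing; the function names and data layout are illustrative assumptions, not the notebook's exact skeleton:

    from collections import Counter
    import math

    def train_nb(docs, labels, alpha=1.0):
        # docs: list of token lists; labels: parallel list of class labels.
        vocab = {w for d in docs for w in d}
        log_prior, log_like = {}, {}
        for c in set(labels):
            c_docs = [d for d, y in zip(docs, labels) if y == c]
            log_prior[c] = math.log(len(c_docs) / len(docs))
            counts = Counter(w for d in c_docs for w in d)
            total = sum(counts.values()) + alpha * len(vocab)
            log_like[c] = {w: math.log((counts[w] + alpha) / total) for w in vocab}
        return log_prior, log_like, vocab

    def predict_nb(doc, log_prior, log_like, vocab):
        # Score each class by log-prior plus summed log-likelihoods of known words.
        scores = {c: log_prior[c] + sum(log_like[c][w] for w in doc if w in vocab)
                  for c in log_prior}
        return max(scores, key=scores.get)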

Part 3: Hands-on exercise - Implementing the Word2Vec model (50 points)

The third part is the implementation of a Word2Vec SkipGram model. A Colab notebook is provided to guide you through the process of implementing the model from scratch and training it on a toy data sample. You will complete the parts marked # TODO: Implement and submit the completed notebook. Please see the instructions and the notebook here: Colab-2.
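For orientation, the core of skip-gram is predicting context words from a center word. A minimal full-softmax sketch (the notebook may instead use negative sampling; sizes are illustrative):

    import torch
    import torch.nn as nn

    class SkipGram(nn.Module):
        def __init__(self, vocab_size, emb_dim=50):
            super().__init__()
            self.in_emb = nn.Embedding(vocab_size, emb_dim)   # center-word vectors
            self.out = nn.Linear(emb_dim, vocab_size)         # scores context words

        def forward(self, center_ids):                        # (batch,) int tensor
            return self.out(self.in_emb(center_ids))          # logits over vocabulary

    # Training pairs (center, context) come from sliding a window over the corpus;
    # optimize with nn.CrossEntropyLoss against the context-word ids.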