Introduction

This course provides a deep dive into Natural Language Processing (NLP), a pivotal and dynamic subfield of Artificial Intelligence (AI) that focuses on the interaction between computers and human language.

The course begins with the fundamental principles of NLP, providing a solid grounding in how machines process and understand natural language. Students will first explore traditional NLP methods and study the classic NLP tasks, along with their historical significance and foundational role. These methods, based on statistical and machine learning approaches, lay the groundwork for understanding how machines interpret language.

Transitioning to modern NLP, the course delves into the revolutionary impact of deep learning and neural networks. Here, students will learn about representation learning methods, including word representations and sentence representations. Then the course dives into the foundations of language modeling and self-supervised learning in NLP. Specifically, we will discuss sequence-to-sequence models, transformers, and transfer learning, including models like GPT (Generative Pre-trained Transformer), BERT (Bidirectional Encoder Representations from Transformers), and T5. These models have transformed the landscape of NLP by enabling more general language understanding and generation capabilities. We then transition into contemporary topics in NLP including LLMs, parameter-efficient fine-tuning, efficiency, and incorporating other modalities.

Through a blend of lectures, hands-on projects and assignments, and case studies, students will gain practical experience in both traditional and modern NLP techniques. The goal of the course is to introduce students to the field and provide them with a comprehensive overview of the fundamentals that helped shape today’s advanced AI models.

Learning Resources

Textbooks

  • Dan Jurafsky and James H. Martin. Speech and Language Processing (2024 pre-release)
  • Yoav Goldberg. A Primer on Neural Network Models for Natural Language Processing
  • Jacob Eisenstein. Natural Language Processing

We will also use papers from major conferences in the field, including ACL, EMNLP, NAACL, ICLR, and NeurIPS.

Anonymous feedback

If you wish to share comments, questions, or feedback anonymously, please use this form: Anonymous Form.
I will check this regularly and respond to questions/comments.

Communication

We use Canvas and email for main announcements. For questions about the course, discussion of the material, and facilitating project discussions between students, we will mainly use Ed Discussion.


Grading

Final grades will consist of:

  • 32%: Assignments, which include both written and coding problem sets
  • 20%: Midterm, in person, closed book
  • 8%: Participation and quizzes
  • 40%: Final projects, including a project proposal (5%), project final presentation (15%), project final report (15%), code and reproducibility checklist (5%)
  • Grading for graduate students: Graduate students will need to incorporate a novelty element and a more in-depth literature review in their final projects
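
The weighting above can be checked with a short sketch. This is only an illustration, not an official grade calculator: the names `WEIGHTS`, `PROJECT_PARTS`, and `final_grade` are hypothetical, and the code assumes that each component is scored on a 0-100 scale and that the final grade is the weighted average of the four components.

```python
# Component weights in percent, copied from the list above.
WEIGHTS = {
    "assignments": 32,
    "midterm": 20,
    "participation_and_quizzes": 8,
    "final_project": 40,
}

# Breakdown of the final project, also in percent of the final grade.
PROJECT_PARTS = {
    "proposal": 5,
    "presentation": 15,
    "report": 15,
    "code_and_checklist": 5,
}

# Sanity checks: the weights cover the whole grade, and the project
# parts add up to the project's share.
assert sum(WEIGHTS.values()) == 100
assert sum(PROJECT_PARTS.values()) == WEIGHTS["final_project"]


def final_grade(scores: dict[str, float]) -> float:
    """Weighted average of component scores (assumed to be on a 0-100 scale)."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


if __name__ == "__main__":
    example = {
        "assignments": 90,
        "midterm": 80,
        "participation_and_quizzes": 100,
        "final_project": 85,
    }
    print(final_grade(example))
```

The example scores are made up purely to show the calculation; actual grading may involve rounding or adjustments not described here.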

AI Assistant policies

Using AI assistants such as ChatGPT to complete your homework, quizzes, projects, or exams is not allowed, except in the following circumstances:

  • The assignment explicitly asks for it
  • An AI assistant is used to improve writing or check grammar

If you use any sort of AI assistance for an assignment, you should explicitly state how you used it when submitting the assignment.

Employing AI tools to complete assignments in cases other than those above, or without the instructor’s permission, will be considered a violation of the Honor Code.

Copilot and other coding assistants are also NOT allowed.

Late submissions

You can still submit an assignment up to 3 days after its deadline. Late submissions, however, receive only partial credit: each late day results in a 10% deduction from the full credit for that assignment.

Note: Late days can only be used on the assignments, not on the project proposal, the final report, or the presentation.
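
The late-penalty arithmetic can be sketched as follows. This is a minimal illustration rather than the official grading procedure: the function name `late_adjusted_score` is hypothetical, and the code assumes that the 10% is taken from the assignment's full credit for each whole late day and that a submission more than 3 days late is not accepted at all. Please confirm the exact interpretation with the instructor.

```python
MAX_LATE_DAYS = 3             # from the policy: accepted up to 3 days late
PENALTY_PERCENT_PER_DAY = 10  # from the policy: 10% of full credit per late day


def late_adjusted_score(raw_score: float, full_credit: float, late_days: int) -> float:
    """Return the score after the late deduction (assumed interpretation).

    raw_score   -- points earned before any late deduction
    full_credit -- maximum points available for the assignment
    late_days   -- number of whole days past the deadline (0 if on time)
    """
    if late_days < 0:
        raise ValueError("late_days cannot be negative")
    if late_days > MAX_LATE_DAYS:
        # Assumption: submissions beyond the 3-day window are not accepted.
        raise ValueError("submission is outside the 3-day late window")
    deduction = full_credit * PENALTY_PERCENT_PER_DAY * late_days / 100
    # Never let the deduction push a score below zero.
    return max(0.0, raw_score - deduction)


if __name__ == "__main__":
    # An assignment worth 100 points, scored 85, submitted 2 days late.
    print(late_adjusted_score(85, 100, 2))
```

Under these assumptions, a submission that earned 85 out of 100 and was 2 days late loses 20 points.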

Grading for graduate students

Grading components for graduate students will be the same as for undergraduate students. The only difference is the following:

For class projects, we expect graduate students to work on a research problem. The project should propose novel research, a novel investigation of existing methods, an extension of prior work for a specific purpose, or a new application. Graduate students are also expected to include a more thorough literature review in their final project report.

Class project (40%)

Students must complete a final research project on a topic of their choice related to the class. Students should work in teams of 2 to 3. If you don’t choose a team, you will be randomly assigned a teammate. Individual projects are allowed only in exceptional cases, and only with reasonable justification.

  • 5%: Proposal
    • Students should submit a 1-2 page proposal for their project. The proposal should state and motivate the problem, and position the proposed project within related work. The project proposal should also include a brief description of the approach as well as the experimental plan (e.g., baselines, datasets, etc.) to validate the effectiveness of the approach. Here are some ideas on types of projects:
    • For undergraduate students, the project could be a reimplementation of an existing method, a new user-facing application that uses NLP models for a new problem, a comprehensive survey of a subtopic of interest, a deeper investigation of a paper that provides further insights through additional experiments, or novel research.
    • For graduate students, the project should include a component of novelty. E.g., it could propose novel research, a novel investigation of existing methods, an extension of prior work for a specific purpose, or a new application.
  • 15%: Final project report
    • A 4-6 page report (no more than 6 pages) in a conference format (e.g., NeurIPS) detailing the project motivation, related work, proposed approach, results, and discussion. You can think of this as a conference paper. Negative results will not be penalized, but they should be accompanied by a detailed analysis of why the proposed methods didn’t work and should provide additional insight into the problem.
    • References and appendix won’t count towards the page limit
  • 15%: Final project presentation
    • A 5-minute, in-person, in-class presentation
  • 5%: Code and reproducibility checklist
    • Your project code should be clean and readable, with clear running instructions, and the results should be fully reproducible. We will provide a reproducibility checklist that you should complete and return.

Integrity

Academic integrity requires that students at Yale acknowledge all of the sources that inform their coursework. Most commonly, this means (a) citing the sources of any text or data that you include in papers and projects, and (b) only collaborating with other students or using AI composition software in ways that are explicitly endorsed by the assignment. Yale’s dedication to academic integrity flows from our two primary commitments: supporting research and educating students to contribute to ongoing scholarship. A safe and ethical climate for research demands that previous authors and artists receive credit for their work. And learning requires that you do your own work. Conventions for acknowledging sources vary across disciplines, and instructors should instruct you in the forms they expect; they should also delineate which forms of collaboration among students are permitted. But ultimately it is the student’s responsibility to act with integrity, and the burden is on you to ask questions if anything about course policies is unclear.

Diversity statement

We embrace and celebrate diversity, understanding that the richest learning experiences come from the exchange of ideas among individuals from varied backgrounds, cultures, and perspectives. We uphold a commitment to mutual respect and open-mindedness, encouraging each participant to both share their unique insights and actively listen to others. Recognizing that learning is a collaborative and evolving process, we foster an inclusive environment where constructive criticism is welcomed, mistakes are embraced as opportunities for growth, and every student is both a teacher and a learner. Our goal is to cultivate a dynamic, respectful, and inclusive classroom environment.

FAQs

  • What if I am unable to attend the class?
    A: If a student frequently misses classes or can’t attend a significant part of the course, we recommend dropping the course.

  • How can I audit the course?
    A: We welcome Yale students to audit the course! If you want to actually master the material of the class, we very strongly recommend that auditors do all the assignments. However, due to high demand, we cannot grade auditors’ assignments. Also, please do not use course resources such as TA or instructor time.

  • Will the lectures be recorded?
    A: This course is in-person, and regular attendance is mandatory. Some lectures might be recorded for future use, but regular, high-quality video recordings aren’t assured. Students should plan assuming there won’t be recordings. Those who can’t attend in person should NOT take the course.