Natural Language Processing (CSCI 5541)
Tuesdays & Thursdays, 11:15 AM - 12:30 PM Akerman Hall 319
Course Information
Summary: This course provides an overview of computational techniques enabling machines to interpret and respond to human language. Topics include text classification, distributional semantics, language models, and generative tools like ChatGPT. Prerequisites include knowledge of linear algebra, calculus, probability, and Python. Completion of CSCI 5521 or graduate standing is recommended.
NLP is an interdisciplinary field grounded in linguistics, cognitive science, and social sciences. It focuses on building computational models for applications such as machine translation and conversational systems. Topics also address ethical AI design, such as fairness, interpretability, and bias mitigation. Students will read scholarly papers, build datasets, develop NLP models, and reflect on ethical implications in a semester-long project.
Grades are based on homework, participation, and a course project. All materials are posted on the class site. Canvas is used for submissions and grades, and Slack is used for Q&A and discussions. Email responses are not guaranteed.
Class Topics Overview
- Sep 2–16: Class Overview, Intro to NLP, Text Classification
- Sep 18–25: Word Embeddings & Language Models (Models, Search, Evaluation)
- Sep 30–Oct 9: Project Guidelines & Proposal Pitch
- Oct 14–23: Transformers, Pretraining, Prompting, LLMs
- Oct 28–Nov 6: Agents, Efficiency, Alignment, Reasoning
- Nov 11–20: Interpretability, Evaluation, Data, Human-AI Collaboration
- Nov 25–Dec 4: Human-centric NLP, Final Project Poster Presentations
Note: Full lecture slides, readings, and deadlines will be posted on the Schedule tab.
Instructors

Instructor

Graduate TA

Graduate TA

Undergraduate TA

Undergraduate TA
- Class meets
  - Tuesday and Thursday, 11:15 AM to 12:30 PM, Akerman Hall 319
- Office hours
  - DK: mostly over Slack (in-person by appointment)
  - Shirley: Wednesday 11:30-12 via Google Meet
  - Shuyu: Thursday 4:30-5:00 at Shepherd 443 or via Zoom
  - Drew: Tuesday 1-2 at Keller 1-213 Table #1 or via Zoom
  - Xiaxuan: Monday 1:30-2 via Zoom
- Class page
  - dykang.github.io/classes/csci5541/F25
- Slack
  - csci5541f25.slack.com/
- Canvas
  - canvas.umn.edu/courses/518535
Information
Grading and Late Policy
Grading
- 60% Homework (hw1-4 for individual, hw5/6 for team)
- 30% Project (team)
- 10% Class Participation (individual)
Late policy for deliverables
Each student is granted 5 late days to use on homework over the course of the semester. After all free late days are used up, the penalty is 1 point for each additional late day. For group homework and the project, late days and penalties are applied to all team members.
Schedule
We will cover basic NLP representations g(x) and use them to build text classifiers P_theta(y|g(x)), language models P_theta(g(x)), and large language models P_{theta is large}(g(x)). Based on the knowledge you gain in class, your team will develop its own NLP system during the semester-long project. Pay attention to due dates and homework release dates. Lecture slides and homework/project descriptions will be posted on the class site.
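To make the notation concrete, here is a minimal sketch of a representation g(x) and a classifier P_theta(y|g(x)). It is only an illustration, not course material: the vocabulary, labels, and hand-picked weights in `THETA` are hypothetical, and a bag-of-words g(x) with a softmax over linear scores is just one simple choice.

```python
import math
from collections import Counter

# Hypothetical toy vocabulary, labels, and weights, chosen by hand for illustration.
VOCAB = ["good", "great", "bad", "boring"]
LABELS = ["positive", "negative"]
# THETA[label][i] is the weight of VOCAB[i] for that label.
THETA = {
    "positive": [1.0, 1.5, -1.0, -1.2],
    "negative": [-1.0, -1.5, 1.0, 1.2],
}

def g(text):
    """Bag-of-words representation: count of each vocabulary word in the text."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def classify(text):
    """Return P_theta(y | g(x)) as a dict, using a softmax over linear scores."""
    features = g(text)
    scores = {y: sum(w * f for w, f in zip(THETA[y], features)) for y in LABELS}
    max_score = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - max_score) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

probs = classify("a great movie with a good cast")
print(probs)
```

In a trained model the weights would be learned from labeled data rather than written by hand, and g(x) would typically be a learned embedding rather than raw word counts.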
Homework Details (60%)
All questions regarding homework MUST be communicated with the lead TA via Slack homework channels (e.g., #hw1, #hw2) or during their office hours. Homework 1, 2, 3, and 4 must be completed individually, while Homework 5 and 6 are team-based (maximum of 4 people). The same team must be used for both Homework 5/6 and the course project.
The use of external resources (books, research papers, websites, etc.) or collaboration (students, professors, ChatGPT, etc.) must be clearly acknowledged in your report. See the notes for academic integrity guidelines.
All homework is due by 11:59 PM on the due date. Because of the tight schedule, no extensions are granted, but late days can still be used. For team assignments, late days will be deducted from each team member. Refer to the homework description and Canvas link for submission:
Homework Assignments & Deadlines
- HW1: Building an MLP-based text classifier with PyTorch (10 points, Individual, due: Sep 11 Thursday)
- HW2: Fine-tuning a text classifier using HuggingFace (10 points, Individual, due: Sep 21 Sunday)
- HW3: Authorship attribution using language models (LMs) (10 points, Individual, due: Oct 5 Sunday)
- HW4: Generating and evaluating text from pretrained LMs (10 points, Individual, due: Oct 19 Sunday)
- HW5: Prompting with large language models (LLMs) (10 points, Team, due: Nov 2 Sunday)
- HW6: Post-training LLMs for alignment (10 points, Team, due: Nov 16 Sunday)
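As a rough illustration of how the late policy (5 free late days, then 1 point per additional late day) could play out across these 10-point assignments, here is a small sketch. The function name, the example scores, the use of whole days, and the floor at zero are assumptions for illustration; the actual accounting is up to the course staff.

```python
FREE_LATE_DAYS = 5         # free late days per student, from the late policy
PENALTY_PER_EXTRA_DAY = 1  # points lost per late day beyond the free ones

def apply_late_policy(submissions, free_days=FREE_LATE_DAYS):
    """Apply late days in submission order.

    submissions: list of (name, raw_score, days_late) tuples.
    Returns (list of (name, final_score), remaining free late days).
    """
    remaining = free_days
    results = []
    for name, raw_score, days_late in submissions:
        covered = min(days_late, remaining)  # late days absorbed by the free budget
        remaining -= covered
        extra = days_late - covered          # late days that incur a penalty
        # Assumption: a score never drops below zero.
        final = max(0, raw_score - PENALTY_PER_EXTRA_DAY * extra)
        results.append((name, final))
    return results, remaining

# Hypothetical student: HW1 two days late, HW2 on time, HW3 four days late.
results, remaining = apply_late_policy([("HW1", 9, 2), ("HW2", 10, 0), ("HW3", 8, 4)])
print(results, remaining)
```

In this hypothetical case HW1 uses two free days, HW3 uses the remaining three, and the fourth late day on HW3 costs one point.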
Project Details (30%)
Please carefully read the project description (to be updated), as it contains essential details about expectations, deadlines, grading rubric, and FAQs. It is your responsibility to ensure you do not miss any of the information provided.
Each team (maximum 4 members) must submit their report, a link to the code (or a zipped archive), and presentation slides/poster on Canvas before the final deadline. Use the official ACL style templates via Overleaf or GitHub.
Below are the required deliverables and deadlines (some fall on weekdays):
- Team formation (1 point, Due: Sep 18)
- Project brainstorming (1 point, Due: Sep 25)
- Proposal pitch (3 points, Due: Oct 7 and 9) – Slide decks: Group A | Group B
- Proposal report (5 points, Due: Oct 14)
- Midterm office hour participation (5 points, Due: Nov 14)
- Poster presentation (5 points, Due: Dec 2 and 4)
- Final report (10 points, Due: Dec 12) | Evaluation rubric
Below are selected reports and posters from previous semesters. Some were extended into publications at top-tier venues:
- [CSCI 5541 S23] Simulating Everyone's Voice: Exploring ChatGPT's Ability to Simulate Human Annotators
- [CSCI 5541 S23] Vision & Language-guided Generalized Object Grasping
- [CSCI 5541 S23] Generalizability of FLAN-T5 Model Using Composite Task Prompting
- [CSCI 5541 S23] Comparing the Effectiveness of Fine-tuning vs. One-Shot Learning on the Kidz Bopification Task
- [CSCI 5980 F22] Generating Controllable Long-dialogue with Coherence → Published in AAAI 2024
- [CSCI 8980 S22] Understanding Narrative Transportation in Fantasy Fanfiction → Published in Workshop on Narrative Understanding (WNU) @ACL 2023
Class Participation (10%)
Your class participation is carefully evaluated. Please upload a profile picture on Canvas and Slack so we can identify you during final evaluations.
The following criteria will be used to assess participation:
- Active involvement and discussion during class
- Engagement on Slack and during office hours with both the instructor and TAs
- Participation and Q&A during the project proposal and poster presentations
We explicitly track both offline and online participation and apply min-max normalization at the end of the course.
If you do not participate in class, Slack, or related discussions, your participation score will be zero.
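For reference, min-max normalization rescales raw values to a fixed range using (x - min) / (max - min). The sketch below is only an illustration under stated assumptions: the raw participation counts, the student names, the 10-point scale, and the handling of ties are hypothetical and are not the course's actual scoring procedure.

```python
def min_max_normalize(raw_counts, max_points=10.0):
    """Rescale raw participation counts to the range [0, max_points].

    raw_counts: dict mapping a student name to a raw participation count.
    """
    lo = min(raw_counts.values())
    hi = max(raw_counts.values())
    if hi == lo:
        # Assumption: if everyone is tied, give zero when nobody participated
        # and full credit otherwise, to avoid dividing by zero.
        tied_score = 0.0 if hi == 0 else max_points
        return {name: tied_score for name in raw_counts}
    return {
        name: max_points * (count - lo) / (hi - lo)
        for name, count in raw_counts.items()
    }

# Hypothetical raw counts for three students.
scores = min_max_normalize({"alice": 12, "bob": 3, "carol": 0})
print(scores)
```

Under plain min-max scaling, the student with the lowest raw count always maps to zero and the student with the highest maps to the full score, so scores are relative to the rest of the class.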
Prerequisites
Required: CSCI 2041 – Advanced Programming Principles
Recommended: CSCI 5521 – Introduction to Machine Learning, or any course that covers core machine learning algorithms.
This course also assumes the following background:
- Strong programming skills, comparable to a third- or fourth-year undergraduate CS major. All assignments will be in Python.
- Familiarity with basic probability, linear algebra, and calculus.
Notes to Students
Academic Integrity
Unless group work is explicitly allowed, all assignments and project reports must reflect your individual effort. Verbal collaboration is acceptable, but anything you submit must be your own work. Please list the names of collaborators and cite any resources used. If you're uncertain whether something counts as cheating, consult the instructor beforehand. Academic dishonesty will result in a failing grade and will be handled according to University policies.
Students with Disabilities
If you have or suspect you have a disability and require accommodations, please contact your instructor and the Disability Resource Center (DRC).
COVID-19
All students are expected to follow University COVID-19 guidelines, including any masking or vaccination policies. This course is in-person and participation is expected. However, hybrid or online accommodations may be made if necessary. If you're feeling unwell, please stay home and catch up on the material remotely.