Special Topics in Large Language Models
Tuesdays and Thursdays, 04:00 PM to 05:15 PM, Appleby Hall 3
Course Information
Course Description: This graduate-level special topics course examines emerging frontiers in large language models and their expanding roles across cognitive science, human-AI interaction, and the social sciences. Students will explore state-of-the-art research in areas such as cognitive architectures, reasoning and planning, compositionality, social cognition, and test-time scaling, as well as applications of language models in domains including law, medicine, journalism, and scientific discovery.
Each student or team will select a focused topic, conduct a comprehensive literature review, and lead a seminar-style lecture and discussion. The course culminates in a semester-long research or implementation project, presented as a final paper and an in-class presentation. Projects should emphasize novel failure modes, under-explored behaviors, or emerging risks rather than incremental performance gains.
Instructor: Dongyeop Kang (DK)
Meeting time: Tuesday and Thursday, 04:00 PM to 05:15 PM
Location: Appleby Hall 3
(campus map)
Homepage: dykang.github.io/classes/csci8980/S26/index.html
Canvas: canvas.umn.edu/courses/553602
Slack: csci8980s26.slack.com
Assessment
Student learning is assessed based on how effectively students identify, define, and solve research problems in their chosen topic.
- Research Project (50%): originality, rigor, and clarity in identifying and addressing a novel research question
- Topic Presentation (30%): analytical depth, synthesis of literature, and ability to communicate complex ideas effectively
- Participation and Discussion (20%): quality of contributions to peer learning and critical engagement with readings
Participation and Discussion
Participation is evaluated based on the quality of contributions to seminar discussion and critical engagement with readings. Lecture recordings are made available to support class participation.
- Come prepared to discuss the assigned readings
- Ask precise questions and challenge assumptions constructively
- Use Slack for clarifications and ongoing discussion
Course Schedule
| Date | Topic and Focus | Presenters | Reading for Presentation | Other Reading (Mandatory) |
|------|-----------------|------------|--------------------------|---------------------------|
| 01-20 | Class Overview | | | |
| 01-22 | Roundtable Research Discussion | | | |
| 01-27 | Test-Time Scaling and Self-Evolving Agents | | | |
| 01-29 | | | | |
| 02-03 | Expert AI and Workflow Modeling | | | |
| 02-05 | | | | |
| 02-10 | DK Office Hour with Project Ideas | | | |
| 02-12 | | | | |
| 02-17 | Reasoning and Planning | | | |
| 02-19 | | | | |
| 02-24 | Data | | | |
| 02-26 | | | | |
| 03-03 | Evaluation | | | |
| 03-05 | | | | |
| 03-10 | Spring Break | | | |
| 03-12 | Spring Break | | | |
| 03-17 | Midterm Project Presentation | | | |
| 03-19 | Midterm Project Presentation | | | |
| 03-24 | No Class | | | |
| 03-26 | No Class | | | |
| 03-31 | Cognition of LLMs | | | |
| 04-02 | | | | |
| 04-07 | LLMs in the World: Society and Pluralism | | | |
| 04-09 | | | | |
| 04-14 | Human-AI Collaboration | | | |
| 04-16 | | | | |
| 04-21 | Beyond Transformers | | | |
| 04-23 | | | | |
| 04-28 | Final Project Presentation | | | |
| 04-30 | Final Project Presentation | | | |
Reading
Topic Presentation (30m talk and 15m discussion)
Reading for Presentation is organized around topic blocks, with four papers assigned per block and typically two papers discussed in each class session. Each student is required to present a total of two to three papers over the course of the term. Students may indicate their topic and paper preferences using the provided interest form. Presenters should select papers across perspectives, with at least one paper from the human-centered track and one from the machine-centered track.
- In-depth survey and lecture: Each presenting student prepares a structured synthesis of the topic and leads the class through a lecture-style presentation and discussion.
- Required components: Presentations must include a curated reading list, a summary of key technical ideas and takeaways, and, where feasible, a discussion of limitations and open questions.
- Lecture format: Sessions follow a seminar-style format, with presenters responsible for moderating discussion and encouraging critical engagement from the class.
- Bonus Point: Fruitful discussions and insightful questions during presentations may be rewarded with bonus participation points at the instructor's discretion.
Class Projects
The course includes a semester-long research or implementation project. Projects should go beyond surface-level performance gains and instead focus on novel failure modes, under-explored behaviors, or emerging risks of large language models or agentic systems. Incremental leaderboard gains or prompt-only tweaks are discouraged. Projects may be completed individually or in small teams of at most two members. Each project must be confirmed by DK during the DK Office Hour week (Feb 10 and 12) and grounded in a clear research question, supported by relevant literature and empirical analysis.
Project Scope
Students are encouraged to produce one of the following research artifacts:
- A benchmark or evaluation suite
- A dataset capturing non-trivial behaviors or failure modes
- A measurement or diagnostic framework
- A failure taxonomy or behavioral analysis
- A mitigation, repair, or control algorithm
Deliverables
Each team must submit a written report, reproducible code or data, and presentation materials via Canvas. Reports should follow a standard conference paper format (ACL style preferred), using the templates available via Overleaf or GitHub.
Milestones and Timeline
The following milestones are aligned with the course schedule. Canvas submission links will be provided.
- Team formation and project idea: due early February
- Project proposal (problem statement, related work, plan): due mid-February (Canvas link TBD)
- Midterm presentation: in class, March 17 and 19
- Final presentation: in class, April 28 and 30
- Final report and code submission: due Friday, May 8 (Canvas link TBD)
Projects are expected to be reproducible, clearly scoped, and analytically grounded. Strong projects typically combine careful problem formulation with diagnostic evaluation, stress testing, or controlled ablation studies. Evaluation emphasizes originality, rigor, and insight into model behavior under realistic, adversarial, or long-horizon conditions rather than raw performance.
See the evaluation rubric for final reports.
Selected Past Projects
Reference examples. Links will be added when available.
- Simulating Everyone's Voice: Exploring ChatGPT's Ability to Simulate Human Annotators (report and poster TBD)
- Vision and Language guided Generalized Object Grasping (report and poster TBD)
- Generating Controllable Long Dialogue with Coherence (published in AAAI 2024, link TBD)
- Understanding Narrative Transportation in Fantasy Fanfiction (ACL Workshop on Narrative Understanding, link TBD)