CSCI 8980-06 Intro to NLP

Spring 2022, Tuesday and Thursday, 4:00pm to 5:15pm, Keller Hall 2-260

Course Information

Natural Language Processing (NLP) is an interdisciplinary field that is based on theories in linguistics and cognitive/social science. The main focus of NLP is building computational models for applications such as machine translation and dialogue systems that can then interact with real users. Research and development in NLP therefore also includes considering important issues related to real-world AI, such as bias, ethics, controllability, and interpretability. This course will cover a broad range of topics related to NLP, from theories to computational models to data annotation and evaluation, leading to in-depth discussions with students. Students will read papers on those topics, create linguistically annotated data, and implement algorithms on applications they are interested in. Note that I will teach "NLP with Deep Learning" in Fall 2022 for those who are interested in computational aspects of NLP.

There will be a semester-long class project where you collect your own dataset, ensure it is accurate, develop a model using existing computing tools, evaluate the system, and consider its ethical and societal impacts. Every class, I will give a 30-minute lecture and students lead a discussion on the reading assignment for the rest. The grade will be evaluated based on the course project, participation, and assignments.

All class material will be posted on Canvas and on the class page. We will use Canvas for homework submissions and grading, Slack for discussion and QA. Please use Slack channels rather than personal emails or messages to ask questions. This helps other students, who may have the same question. Personal emails may not be answered. If you cannot make it to office hours, please use Slack to make an appointment.

Instructor: Dongyeop Kang (a.k.a DK)
Class meets: Tuesday and Thursday, 4:00pm to 5:15pm in Keller Hall 2-260
Office hours: Friday, 3:00pm to 4:30pm in Shepherd 259
Class page: dykang.github.io/classes/csci8980/Spring2021/
Slack: csci8980-06-s22.slack.com
Canvas: canvas.umn.edu/courses/302319

Schedule

We will cover basic models and represetnations, applications, and advanced topics.
Pleaes pay attention to due dates and project presentations. You can use DK's office hours for project discussion. 🍬 is an optional reading.

Date	Topic	Readings (schedule)
W1: Jan 18	Class Overview [slides] HW1 out (Paper Presentation)
W1: Jan 20	Text Classification [slides]	Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C. D., Ng, A. Y., & Potts, C. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank In Proceedings of the 2013 conference on empirical methods in natural language processing (pp. 1631-1642). Zhang, J., Chang, J. P., Danescu-Niculescu-Mizil, C., Dixon, L., Hua, Y., Thain, N., & Taraborelli, D. (2018).Conversations Gone Awry: Detecting Early Signs of Conversational Failure 🍬Text classifier with NLTK and Scikit-Learn
W2: Jan 25	Topic Modeling [slides] HW2 out (Paper Replication)	🍬Blei, D. M. (2012) Probabilistic topic models Communications of the ACM, 55(4), 77-84. 🍬K-Means Clustering with scikit-learn
W2: Jan 27	Language Models [slides] Project consultation (Office Hour)	Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Amodei, D. (2020). Language Models are Few-Shot Learners, NeurIPS 2020 Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610-623). Bommasani, Rishi, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein et al. On the Opportunities and Risks of Foundation Models Liu, P., Yuan, W., Fu, J., Jiang, Z., Hayashi, H., & Neubig, G. (2021).Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
W3: Feb 1	Lexical Semantics [slides] Project Description out [slides]	Ruppenhofer, J., Ellsworth, M., Schwarzer-Petruck, M., Johnson, C. R., & Scheffczyk, J. (2016).FrameNet II: Extended Theory and Practice International Computer Science Institute, and FrameNet Project Hovy, E., Marcus, M., Palmer, M., Ramshaw, L., & Weischedel, R. (2006, June) OntoNotes: The 90% Solution In Proceedings of the human language technology conference of the NAACL, Companion Volume: Short Papers (pp. 57-60). 🍬Word Senses and WordNet
W3: Feb 3	Distributional Semantics [slides] Project consultation (Office Hour)	Turney, P. D., & Pantel, P. (2010). From Frequency to Meaning: Vector Space Models of Semantics Journal of artificial intelligence research, 37, 141-188. Faruqui, M., Dodge, J., Jauhar, S. K., Dyer, C., Hovy, E., & Smith, N. A. (2014) Retrofitting Word Vectors to Semantic Lexicons 🍬Gensim's word2vec tutorial
W4: Feb 8	Contextualized Word Embeddings [slides]	Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2018).BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Rogers, A., Kovaleva, O., & Rumshisky, A. (2020). A Primer in BERTology: What We Know About How BERT Works Transactions of the Association for Computational Linguistics, 8, 842-866. 🍬Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2019).Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer 🍬Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A Robustly Optimized BERT Pretraining Approach 🍬Smith, N. A. (2019). Contextual Word Representations: A Contextual Introduction 🍬 Fine-tuning tutorial on HuggingFace
W4: Feb 10	Discourse [slides] HW2 due, Feb 10 11:59pm Project consultation (Office Hour)	A comparative study on various discourse frameworks (DISRPT 2021): Rhetorical Structure Theory (RST), Penn Discourse Treebank (PDTB), and Segmented Discourse Representation Theory (SDRT), Piper, A., So, R. J., & Bamman, D. (2021) Narrative Theory for Computational Narrative Understanding In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (pp. 298-311).
W5: Feb 15	Machine Translation [slides]	Zhang, W., Feng, Y., Meng, F., You, D., & Liu, Q. (2019).Bridging the Gap between Training and Inference for Neural Machine Translation Lample, G., Conneau, A., Denoyer, L., & Ranzato, M. A. (2017). Unsupervised Machine Translation Using Monolingual Corpora Only 🍬 Liu, Y., Gu, J., Goyal, N., Li, X., Edunov, S., Ghazvininejad, M., ... & Zettlemoyer, L. (2020). Multilingual Denoising Pre-training for Neural Machine Translation> Transactions of the Association for Computational Linguistics, 8, 726-742. 🍬 Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate
W5: Feb 17	Question Answering and Reasoning [slides] HW3 out (Error Analysis) Project Proposal Due, Feb 17 11:59pm	Yang, Z., Qi, P., Zhang, S., Bengio, Y., Cohen, W. W., Salakhutdinov, R., & Manning, C. D. (2018). HOTPOTQA: A Dataset for Diverse, Explainable Multi-hop Question Answering Andreas, J., Rohrbach, M., Darrell, T., & Klein, D. (2016).Learning to Compose Neural Networks for Question Answering Zhang, M. J., & Choi, E. (2021). SITUATEDQA: Incorporating Extra-Linguistic Contexts into QA
W6: Feb 22	Dialogue [slides]	Li, J., Galley, M., Brockett, C., Gao, J., & Dolan, B. (2015). A Diversity-Promoting Objective Function for Neural Conversation Models Zhang, S., Dinan, E., Urbanek, J., Szlam, A., Kiela, D., & Weston, J. (2018). Personalizing Dialogue Agents: I have a dog, do you have pets too? Wang, X., Shi, W., Kim, R., Oh, Y., Yang, S., Zhang, J., & Yu, Z. (2019).Persuasion for Good: Towards a Personalized Persuasive Dialogue System for Social Good 🍬 Rashkin, H., Smith, E. M., Li, M., & Boureau, Y. L. (2018).Towards Empathetic Open-domain Conversation Models: A New Benchmark and Dataset 🍬 Lewis, M., Yarats, D., Dauphin, Y. N., Parikh, D., & Batra, D. (2017). Deal or No Deal? End-to-End Learning for Negotiation Dialogues 🍬 Kang, D., Balakrishnan, A., Shah, P., Crook, P., Boureau, Y. L., & Weston, J. (2019). Recommendation as a Communication Game: Self-Supervised Bot-Play for Goal-oriented Dialogue 🍬 Dialogue system development frameworks: ParlAI and ConvKit
W6: Feb 24	Summarization [slides]	See, A., Liu, P. J., & Manning, C. D. (2017). Get To The Point: Summarization with Pointer-Generator Networks Laban, P., Hsi, A., Canny, J., & Hearst, M. A. (2021). The Summary Loop: Learning to Write Abstractive Summaries Without Examples Jung, T., Kang, D., Mentch, L., & Hovy, E. (2019). Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization 🍬 Rush, A. M., Chopra, S., & Weston, J. (2015). A Neural Attention Model for Abstractive Sentence Summarization
W7: Mar 1	No class
W7: Mar 3	Styles [slides]	Danescu-Niculescu-Mizil, C., Sudhof, M., Jurafsky, D., Leskovec, J., & Potts, C. (2013).Acomputational approach to politeness with application to social factors Kang, D., & Hovy, E. (2019). Style is NOT a single variable: Case Studies for Cross-Style Language Understanding
Mar 8	No class: Spring Break
Mar 10	No class: Spring Break HW3 Due, Mar 11 11:59pm (extended)
W8: Mar 15	Mid-way Project Presentation (Group A)	Knowledge Representation With Semantic Hypergraphs For Text Classification (Carl Molnar and Prithvi Raj Botcha) Contrastive Learning for Rating Prediction with BERT and GraphSAGE (Parker Erickson and Petros Karypis) Improving Unsupervised Machine Translation with Contrastive Learning On Augmented Language Pair (Haoji Hu) Constant Memory Gated Transformers for Long Sequences (James Mooney) Exploring Episodic Memory through Cross-modal representations (Abhiraj Mohan, Emily Mulhall and Jayant Sharma)
W8: Mar 17	Mid-way Project Presentation (Group B)	Evaluating the impact of noisy inputs on textual style transfer (David Ma and Andrew Walker) Narrative Transportation Proposal (Kelsey Neis and Yu Fang) African American English (AAE) Proposal for NLP (Naome Etori, London Lowmanstonem, and Michael Lucke) Multilingual Sarcasm Detection in English and Arabic (Thomas Nguyen, Jeffrey Jia, and Josh Spitzer-Resnick)
W9: Mar 22	Generation [slides] Kathleen McKeown's keynote speech at ACL 2020, Rewriting the Past: Assessing the Field through the Lens of Language Generation	Barzilay, R., & Lapata, M. (2008).Modeling Local Coherence: AnEntity-Based Approach Computational Linguistics, 34(1), 1-34. Hashimoto, T. B., Zhang, H., & Liang, P. (2019). Unifying Human and Statistical Evaluation for Natural Language Generation
W9: Mar 22	Coreference and IE	Daniel Jurafsky & James H. Martin. Information Extraction, Speech and Language Processing. Strubell, E., Verga, P., Andor, D., Weiss, D., & McCallum, A. (2018). Linguistically-Informed Self-Attention for Semantic Role Labeling 🍬 NLP concepts with spaCy 🍬 NeuralCoref 4.0
W9: Mar 24	Dataset Annotation [slides]	Gururangan, S., Swayamdipta, S., Levy, O., Schwartz, R., Bowman, S. R., & Smith, N. A. (2018). Annotation Artifacts in Natural Language Inference Data Zellers, R., Bisk, Y., Schwartz, R., & Choi, Y. (2018). Swag: A large-scale adversarial dataset for grounded commonsense inference McCoy, R. T., Pavlick, E., & Linzen, T. (2019). Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference
W10: Mar 29	Hypothesis testing and Evaluation [slides] Student discussion	Dror, R., Baumer, G., Shlomov, S., & Reichart, R. (2018, July)The Hitchhiker’s Guide to Testing Statistical Significance in Natural Language Processing In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (pp. 1383-1392). Card, D., Henderson, P., Khandelwal, U., Jia, R., Mahowald, K., & Jurafsky, D. (2020).With Little Power Comes Great Responsibility Guo, C., Pleiss, G., Sun, Y., & Weinberger, K. Q. (2017, July). On Calibration of Modern Neural Networks In International Conference on Machine Learning (pp. 1321-1330). PMLR. GLUE and GEM Benchmarks Clark, E., August, T., Serrano, S., Haduong, N., Gururangan, S., & Smith, N. A. (2021). All That's 'Human' Is Not Gold: Evaluating Human Evaluation of Generated Text Dhole, K. D., Gangal, V., Gehrmann, S., Gupta, A., Li, Z., Mahamood, S., ... & Zhang, Y. (2021).NL-Augmenter: A Framework for Task-Sensitive Natural Language Augmentation
W10: Mar 31	Social NLP Guest lecture by Anjalie Field (CMU) [slides] [recording]	Field, A., Blodgett, S. L., Waseem, Z., & Tsvetkov, Y. (2021). A Survey of Race, Racism, and Anti-Racism in NLP Yates, A., Cohan, A., & Goharian, N. (2017).Depression and Self-Harm Risk Assessment in Online Forums Shmueli, B., Fell, J., Ray, S., & Ku, L. W. (2021). Beyond Fair Pay: Ethical Implications of NLP Crowdsourcing Demszky, D., Garg, N., Voigt, R., Zou, J., Gentzkow, M., Shapiro, J., & Jurafsky, D. (2019). Analyzing polarization in social media: Method and application to tweets on 21 mass shootings
W11: Apr 5	Biases and ethics Guest lecture by Dr. Jieyu Zhao (UMD) [slides] [recording]	Zhao, J., Wang, T., Yatskar, M., Ordonez, V., & Chang, K. W. (2017). Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints Blodgett, S. L., Barocas, S., Daumé III, H., & Wallach, H. (2020). Language (Technology) is Power: A Critical Survey of “Bias” in NLP Hovy, D., & Spruit, S. L. (2016, August). The Social Impact of Natural Language Processing In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) (pp. 591-598). Carlini, N., Tramer, F., Wallace, E., Jagielski, M., Herbert-Voss, A., Lee, K., ... & Raffel, C. (2021). Extracting Training Data From Large Language ModelsIn 30th USENIX Security Symposium (USENIX Security 21) (pp. 2633-2650).
W11: Apr 7	Robust and Adversarial NLP Guest lecture by Eric Wallace (UC Berkeley) [recording]	Ribeiro, M. T., Wu, T., Guestrin, C., & Singh, S. (2020). Beyond Accuracy: Behavioral Testing of NLP Models with CheckList Jia, R., & Liang, P. (2017). Adversarial Examples for Evaluating Reading Comprehension Systems Wallace, E., Feng, S., Kandpal, N., Gardner, M., & Singh, S. (2019). Universal Adversarial Triggers for Attacking and Analyzing NLP
W12: Apr 12	Controlability Guest lecture by Dr. Sumanth Dathathri (Google DeepMind) [recording] HW4 out (Data annotation)	Dathathri, S., Madotto, A., Lan, J., Hung, J., Frank, E., Molino, P., ... & Liu, R. (2019). Plug and Play Language Models: A Simple Approach to Controlled Text Generation Smith, E. M., Gonzalez-Rico, D., Dinan, E., & Boureau, Y. L. (2020). Controlling Style in Generated Dialogue
W12: Apr 14	Data-centric NLP Guest lecture by Dr. Swabha Swayamdipta (AI2/USC) [recording will not be available]	Swayamdipta, S., Schwartz, R., Lourie, N., Wang, Y., Hajishirzi, H., Smith, N. A., & Choi, Y. (2020). Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics Ratner, A., Bach, S. H., Ehrenberg, H., Fries, J., Wu, S., & Ré, C. (2017, November). Snorkel: Rapid training data creation with weak supervision In Proceedings of the VLDB Endowment. International Conference on Very Large Data Bases (Vol. 11, No. 3, p. 269). NIH Public Access. Shaham, U., Segal, E., Ivgi, M., Efrat, A., Yoran, O., Haviv, A., ... & Levy, O. (2022). SCROLLS: Standardized CompaRison Over Long Language Sequences
W13: Apr 19	Language Grounding to Vision and Robotics Guest lecture by Dr. Yonatan Bisk (CMU) [recording will not be available]	Bisk, Y., Holtzman, A., Thomason, J., Andreas, J., Bengio, Y., Chai, J., ... & Turian, J. (2020). Experience Grounds Language Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., ... & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision and Ramesh, A., Pavlov, M., Goh, G., Gray, S., Voss, C., Radford, A., ... & Sutskever, I. (2021). Zero-Shot Text-to-Image Generation Padmakumar, A., Thomason, J., Shrivastava, A., Lange, P., Narayan-Chen, A., Gella, S., ... & Hakkani-Tur, D. (2021). TEACh: Task-driven Embodied Agents that Chat and Grauman, K., Westbury, A., Byrne, E., Chavis, Z., Furnari, A., Girdhar, R., ... & Malik, J. (2021). Ego4D: Around the World in 3,000 Hours of Egocentric Video
W13: Apr 21	Final project presentation (A)	Narrative Transportation Proposal (Kelsey Neis and Yu Fang) African American English (AAE) Proposal for NLP (Naome Etori, London Lowmanstonem, and Michael Lucke) Multilingual Sarcasm Detection in English and Arabic (Thomas Nguyen, Jeffrey Jia, and Josh Spitzer-Resnick)
W14: April 26	Final project presentation (B)	Constant Memory Gated Transformers for Long Sequences (James Mooney) Exploring Episodic Memory through Cross-modal representations (Abhiraj Mohan, Emily Mulhall and Jayant Sharma) Evaluating the impact of noisy inputs on textual style transfer (David Ma and Andrew Walker)
W14: April 28	Final project presentation (C) HW4 Due, May 5, 11:59pm Project report Due, May 5 11:59pm	Knowledge Representation With Semantic Hypergraphs For Text Classification (Carl Molnar and Prithvi Raj Botcha) Contrastive Learning for Rating Prediction with BERT and GraphSAGE (Parker Erickson and Petros Karypis) Improving Unsupervised Machine Translation with Contrastive Learning On Augmented Language Pair (Haoji Hu)

Important but non-covered topics:

	Interpretability	Madsen, A., Reddy, S., & Chandar, S. (2021). Post-hoc Interpretability for Neural NLP: A Survey Wallace, E., Tuyls, J., Wang, J., Subramanian, S., Gardner, M., & Singh, S. (2019). AllenNLP Interpret: AFramework for Explaining Predictions of NLP Models and Demo
	Human-in-the-loop and Interactive NLP	Kiela, D., Bartolo, M., Nie, Y., Kaushik, D., Geiger, A., Wu, Z., ... & Williams, A. (2021). Dynabench: Rethinking Benchmarking in NLP and Nie, Y., Williams, A., Dinan, E., Bansal, M., Weston, J., & Kiela, D. (2019).Adversarial NLI: A New Benchmark for Natural Language Understanding Liu, A., Swayamdipta, S., Smith, N., Choi, Y. WANLI:Worker and AI Collaboration for Natural Language Inference Dataset Creation

Grading and Late Policy

Grading

40% Homeworks (total four homeworks)
50% Final Project
10% (potential bonus) Class Participation

Active participation in class discussions and project presentations

Late policy for deliverables

Each student will be granted 3 late days to use for homeworks over the duration of the semester. After all free late days are used up, penalty is 25% for each additional late day. However, projects submitted late after all late days have been used will receive no credit.

Homework Details (40%)

HW1: Paper Presentation (10%)

Please check the list of papers in the Readings tab in the schedule and place your name on two papers this sheet. Presenters are limited to two per class, so do not assign yourself if there are already two presenters except for Jan 27 (first two papers on Jan 27 will be presented on Jan 25). .

You are responsible for presenting the papers in class and leading the discussion. During every class, two students present for 20 minutes each, including QA and discussion. First, make an overview of the paper (10 minutes) and prepare three points for discussion (10 minutes), such as limitations of the proposed method, future directions, links to other similar papers, etc.

Please upload your slides here before the class. It is possible to borrow slides from authors, but you must have a deep understanding of the work and provide potential discussion points. The filename of your slides should be 0120_{Paper Title}_{Your Name}_{first,second}.{pptx,pdf} Sometimes there are more than two comparative or incremental papers assigned in one bullet point; then, you have to make a comparison between them and get a bonus point (2%). In some classes like Jan 25, there are no specific papers to read so we discuss papers from next lecture's topics.

HW2: Paper Replication (10%)

Due: Feb 10 11:59pm

You will get a taste of NLP leaderboard culture in this homework. You need to choose one of the following NLP tasks and replicate/reimplement the model. I strongly recommend you use existing code written by authors that appear in Papers-with-Code leaderboard or to use some basic Transformer models implemented in HuggingFace libraries. Do not spend too much time replicating the code. Instead, run an existing code on your target dataset, ensure you use the same evaluation metrics as the paper, and compare your results to the paper's.

Note that you will be treated as cheating if you do not correctly cite any tool or paper you consulted. The homework would serve as the foundation for your homework 3 and 4, and possibly your project.

Choose one of the following models and datasets. If you would like to choose other tasks and datasets, please talk to DK by Jan 28.

Tasks	Datasets
Sentiment classification	SST2 (leaderboard, paper)
Natural Language Inference	SNLI (leaderboard, paper) MultiNLI (leaderboard, paper) MRPC (leaderboard, paper)
Commonsense Reasoning	Winograd Challenge (leaderboard, paper) CommonsenseQA (leaderboard, paper)
Dialogue, Summarization, and Style Transfer	PersonaChat (leaderboard, paper) XSum (leaderboard, paper) CNN / Daily Mail (leaderboard, paper) GYAFC (leaderboard, paper)
QA and Visual QA	HotpotQA (leaderboard, paper) SQuAD 2.0 (leaderboard, paper) GQA (leaderboard, paper) VQA 2.0 (leaderboard, paper)
Semantic Evaluation	SemEval 2020 SemEval 2021 SemEval 2022

Please upload your code and report to Canvas by Feb 10 11:59pm.

Code: a zipped file containing your training/inference scripts.
Report: 2 - 3 pages, including model description with references, link to the original codes you referred to, evaluation metrics, performance comparison with other models in the leaderboard, training/inference time, sweeping of hyperparameters (e.g., learning parameter, dropout rate), and other details of the experiment.

Resources:

Papers-with-Code for leaderboards and state-of-the-art models
Google Colab for model training
ParlAI and ConvKit for dialogue
HuggingFace Datasets and HuggingFace Models for Transformer models

HW3: Error Analysis (10%)

Due: Mar 11 11:59pm

You will now analyze the errors of your model implemented in the previous homework. HW3 consists of four steps where each step has a bonus point so please be creative and try other analysis techniques.

Step #1: collect and featurize errors

Input text
Ground-truth label
Predicted label with confidence score (i.e., softmax output from your classifier to the ground-truth label)
(bonus points) other metadata or linguistic features using Spacy or other tools
E.g., length of the sentence, POS tags, named entities, sentiment score, etc

Step #2: label error types and fixes

Types/Causes of errors, e.g., incorrect annotation and over-generalization
Potential solutions to fix the cause, e.g., more training samples and some rules

Step #3: visualize errors

Take vector representations of correct and incorrect samples from the classifier’s output (HuggingFace's model output class)
Project them in reduced dimensions using PCA or t-SNE (paper, code) (i.e., 768 dimension -> 2 dimension)

Project the samples in dataset map space (paper1 and paper2)
where x-axis is confidence scores to ground-truth label and y-axis is variances of classifier’s prediction over the epochs of training

Step #4: analyze

e.g., apply movie review to sentiment classifier trained on SST2
e.g., apply medical text to entailment classifier trained on MNLI

Please upload your annotated spreadsheet and report to Canvas by March 3 11:59pm.

Spreadsheet: maximum 500 error samples with annotated errors, extracted features, and labeled types/fixes.
Report: maximm 4 pages, including distribution of features/types/fixes, visualizations, and in-depth analysis and discussion.

Resources:

Find some examples from the lecture slides: S2-14

HW4: Data Annotation (10%)

Due: May 5, 11:59pm
n this assignment, you will learn how data annotation works in NLP research and how important they are in NLP model development. Your group will form a group of three or four people, collect 300 adversarial samples on a target task that can fool the system you built in homework 2, annotate them by each of your team members, calculate inter-annotator agreement (IAA), and write down a short report on your experience.
Please read this description carefully.

Project Details (50%)

The class project is meant for a group of students (2~3) to taste a full pipeline of NLP research, from data annotation to model development to experiment and error analysis to visualization to discussion on limitations and ethical issues. Please read the project description slides.

A course project would be one of the following types:

New research results judged suitable for acceptance to a top NLP or ML conference like ACL/EMNLP/NeurIPS/ICLR,
Evaluation and critical analysis of existing work on a new dataset,
An in-depth literature survey, or
New open-source repository or dataset with a high impact on the community

Your project will be evaluated in the following criteria:

Proposal and literature review (10%), Due: Feb 17, 11:50pm

Maximum 3 pages

Midterm presentation (10%), Mar 15 and 17

10-min presnetation and 5-min QA
Check out the presentation schedule
Upload your slides here before the class
Expected content to be presented:
- Specific feedback you like to get from audience
- Motivation
- Problem definition
- Novel contribution compared to prior work
- Proposed methods
- Initial results
- Plan for the second half of the semester

Final presentation (10%), April 21, 26, and 28

15-min presnetation and 10-min QA
Check out the final presentation schedule
Upload your slides here before your presentation
Expected content to be presented:
- Motivation, problem definition, and novel contribution compared to prior work
- Proposed methods with "motivational examples"
- Experimental setups and final results
- Discussion on limitations, ethical issues, etc
- Conclusion and future directions

Final report and code (20%), Due: May 5, 11:50pm

Maximum 8 pages
Rubrick for evaluation

Every group member should submit their report, link to code, and presentation slides on Canvas before the deadline. For both proposal and final reports, please use official ACL style templates (Overleaf or links). Note that your report and slides would be publicly shared on this page.

Prerequisites

CSCI 5521 Introduction to Machine Learning or any other course that covers fundamental machine learning algorithms.

Furthermore, this course assumes:

Good coding ability, corresponding to at least a third or fourth-year undergraduate CS major. Assignments will be in Python.
Background in basic probability, linear algebra, and calculus.

Notes to students

Academic Integrity

Assignments and project reports for the class must represent individual effort unless group work is explicitly allowed. Verbal collaboration on your assignments or class projects with your classmates and instructor is acceptable. But, everything you turn in must be your own work, and you must note the names of anyone you collaborated with on each problem and cite resources that you used to learn about the problem. If you have any doubts about whether a particular action may be construed as cheating, ask the instructor for clarification before you do it. Cheating in this course will result in a grade of F for course and the University policies will be followed.

Students with Disabilities

If you have a disability for which you are or may be requesting an accommodation, you are encouraged to contact both your instructor and Disability Resources Center (DRC).

COVID-19

All students are expected to abide by campus policies regarding COVID-19 including masking and vaccination requirements. This is an in-person class with daily in-person activities, but we may consider a hybrid or online option. If you're feeling sick, stay at home and catch up with the course materials instead of coming to class!

Book

Textbook is not required but the following books are primarily referred:

Jurafsky and Martin, Speech and Language Processing, 3rd edition [online]
Jacob Eisenstein. Natural Language Processing

The course materials are inspired by the slides of Dan Jurafsky at Stanford, David Bamman at UC Berkeley, and Noah Smith at University of Washington.

Resources

From Languages to Information, Dan Jurafsky, Stanford
Natural Language Understanding, Christopher Potts, Stanford, Spring 2021
Natural Language Processing, David Bamman, UC Berkeley
Algorithms for NLP , Yulia Tsvetkov and David Mortensen, CMU