In the last five years, there has been remarkable progress in the ability of computers to process and generate language. This course is about learning and applying Deep Learning approaches in Natural Language Processing (NLP), from word vectors to Transformers, including GPT-3, GPT-4 and ChatGPT. It is a project-based course that teaches the fundamentals of neural-network-based NLP and gives students the opportunity to pursue a unique project.
The course lecture material begins with the basics of word-level embeddings – their properties and training. These form the basis of neural-network-based classifiers used for sentiment classification, named entity recognition and many other language tasks. A significant part of the course is about the Transformer architecture – its structure and training. This will include the use of the Transformer as a classifier, but also in generative mode, in which language is produced in response to input language. Much of the learning will be applied in three or four hands-on programming assignments. Students will also do a major project of their own choosing to make use of these capabilities.
Instructor:
Jonathan Rose (Jonathan.Rose@ece.utoronto.ca) - Department of Electrical and Computer Engineering. Office: Engineering Annex, Room 319.
Teaching Assistants:
Andrew Brown (andrewm.brown@mail.utoronto.ca)
Zining Zhu (zining@cs.toronto.edu)
Text
Speech and Language Processing, 3rd Edition Draft (January 2022 version)
https://web.stanford.edu/~jurafsky/slp3/ed3book_jan122022.pdf
This text is missing a first chapter, but you can grab a reasonable version of the first chapter from the second edition, here: https://github.com/rain1024/slp2-pdf
Lectures
Lecture 0 - Introduction to the Course and Pre-requisites, Course Structure Notes Video
Lecture 1 - Word Embedding/Vector Properties and Meanings Notes Video
Lecture 2 - Training of Word Embeddings Notes Video
Lecture 3 - Classification of Language using Word Embeddings Notes Video
Lecture 4 - Introduction to Transformers and Project Structure Notes Project Slides Video
Lecture 5 - The Core Mechanism of Transformers: Attention Notes Video
Lecture 6 - Language Generation Using Transformers and Project Ideation Notes Project Slides Video
Lecture 7 - Understanding Transformers & Tokenization Notes Video
Lecture 8 - Proposal Presentations
Lecture 9 - How GPT-3 is Trained to Respond to Human Intent Notes Video
Lecture 10/11 - Project Consultations Notes
Lecture 12 - Project Presentations Notes
Assignments
# | Date Assigned | Assignment | Due
1 | 13-Sep | Word Embeddings – Properties, Meaning and Training | 26-Sep
2 | 27-Sep | Classification of Subjective/Objective Text | 10-Oct
3 | 11-Oct | Understanding, Training and Using Transformers for Classification | 25-Oct
4 | 25-Oct | Generation of Text Using Transformers: Question Answering | 15-Nov
Projects
A.I. Meet Yu-Gi-Oh! - Generate Card Text for the Yu-Gi-Oh! Game
Aladdin Recommender - Movie Recommendation System based on Other Movies
Antiqu-ator - Generate Shakespearian Language from Regular English
Argument Gate - Rate how convincing an argument is
Artistic GENREator - Generate Song Lyrics in a Specific Song Genre
Caption Me - Generate Captions of Pictures with Dogs
Coding Challenge Generator - Generate Software Coding Challenge Problems
DeCo - Classify Businesses into Multiple Categories for Investment Purposes
Destructive Language Rater - Identify Different Levels of Toxic Language
Emojimotion - Generate Emojis for a given text and specified emotion
Eng2Py - Generate Sorting Algorithms with Natural Language Queries
EZPaperSearch - Search for Papers related to a Given Paper
Fairytale - Generate Fairytales that Illustrate Specific Values and at Specific Grade Level
Fake News Detection - that’s what it does
GOTalk - A Choose-your-own Chat with Game of Thrones Characters
Grammar Error Correction - Correcting Errors in Writing
Hive Mind Investor - Stock Market Prediction Based on Social Media Stock Sentiment
IELTS composition score predictor - Predicting the score of an English language Essay
Newsify - Generate a News Article
Noffence - Recognizing Hate Speech
PyOverflow - Answer Questions about Python Using the Information from the Python Manual
Remy - Generate Recipes for Cake and Cocktails
SafeChat - Identify Chat Text that has Suicidal Ideation
Sensify - Measure the Emotion in a Text
Song Genre Classifier - Classify Song Genre based on the Lyric Text
SoulsGen - Generate Cards in the Dark Souls card game
Sum(Text) - Automatic Text Summarization of News Articles
The Survey Insider - Automating Training Survey Response Classification
TrainAssist - Automating Training Survey Response Classification