The project in this course is an opportunity to develop an NLP application in an area of your own choosing.  It also provides the chance to do a full engineering project that is closer to real-world engineering and research than typical course assignments.  While this project has some structure, you will be required to deal with the ambiguity and significant decision making that are part of real projects.


This kind of experience offers the following learning outcomes:

1.    Experience with an open-ended project in the area of NLP, and practice with the decisions and ambiguities that such projects involve.

2.    Practice in oral and written communication.

3.    Opportunity to learn and showcase engineering project skills.


Project Rules


1.    Projects must be done in groups of 2. You may select your own partner.  A Piazza ‘find project partner’ post has been set up to help you do this.  Groups of 3 or 1 will not be allowed, with one exception: if the number of students does not divide evenly into pairs, a single group of 3 or 1 will be permitted.

2.    The project topic is of your own choosing.  It must make use of neural-network-based Natural Language Processing, as taught in this course, and the training/fine-tuning, validation and testing of an NLP-based neural-network system should form an important part of the project.  You have significant latitude in your choice of project topic, but it should typically be an application of NLP.  If you wish to do research on new methods for neural-network-based NLP, you should have a longer conversation with the instructor at the uniqueness-approval stage.

3.    Novelty of the project:

a.    The project must be unique within the class - no two projects can be on the same topic, as determined by a ‘uniqueness approval’ step in the project.

b.    The project is not required to be novel/new in the world.  Indeed, we suggest looking around the internet (see pointers below) to see the types of things others have done in courses like this.  However, you cannot copy code from another project and present it, or something like it, as your own; that is plagiarism and a violation of the UofT academic code of conduct.  We encourage you to contemplate your own ideas first, before looking elsewhere, and to consider the set of projects that we are seeking from outside sources at the University and elsewhere.  Further novelty will come from your work to collect and label data.

4.    Data collection and/or labelling should make up a meaningful part of the training process.  You will not be allowed to simply use someone else’s data set and run a few models on it.  This is a tricky requirement that may require discussion with the instructor: it is also possible to spend years collecting and labelling data, so a proper balance has to be struck.

5.    University of Toronto rules on plagiarism apply.  We are aware that there are many machine learning projects already posted on the internet, and submitted projects will be checked against them for plagiarism.

6.    All teams must store their source code in the GitHub repository provided to them, and must use it throughout the term.  Note that the University does not claim any ownership of the software produced.