Deep Learning in NLP (Winter 2019/2020)
General Information
Instructors: | Christian Wurm, Jakub Waszczuk |
Theoretical sessions: | Monday, 14:30 – 16:00, room 24.21.05.61 |
Practical sessions: | Tuesday, 14:30 – 16:00, room 24.21.03.62-64 |
Course web page: | https://user.phil.hhu.de/~waszczuk/teaching/hhu-dl-wi19/ (this web page, which will be updated throughout the course) |
Office hours: | by appointment |
Languages: | German and English |
Course Description
The aim of this course is to understand state-of-the-art neural network techniques and to apply them in practice, in particular to natural language processing problems.
Monday sessions will typically be dedicated to theory, Tuesday sessions to programming. During the practical sessions, we will mostly use the PyTorch framework to implement our networks. Instructions on how to install all the necessary tools on Ubuntu are here.
Script
The theoretical content can be found in the script (caution, frequent updates!). Last updated: January 20, 2020.
Requirements
- BN: Complete the theoretical and the programming homework exercises. The homework exercises will be published on this web page as we go.
- AP: Term paper: 4-5 pages for undergraduate students, 7-10 pages for master students. Please use the ACL 2020 stylesheet (LaTeX, Word). You can pick a topic of your choice; it does not necessarily have to be NLP-related. Documented code and running/installation instructions are part of the deliverables.
- UPDATE 14.02: The code should be documented in the standard way: provide docstrings/comments which explain what the code does and why, especially for more complicated chunks.
Schedule
Preliminary schedule of the theoretical and practical sessions:
8 Oct | Introduction and Overview |
15 Oct | Python refresher: Recall how to program in Python and get familiar with the development environment (VSCode, IPython). Python refresher and some hints on using VSCode and IPython. Homework exercise (updated on 15/10), the partial solution (unpack it first), and the dataset with person names. General feedback and solution to the first homework. |
21 Oct | Vectors and matrices |
22 Oct | Basic end-to-end example: Apply a neural network to a simple classification task. We will implement a feed-forward network using the basic PyTorch primitives (see the feed-forward sketch after the schedule). Homework and the corresponding code (unpack it first). Additional material on using tensors in PyTorch. General feedback and solution to the second homework. |
28 Oct | Theoretical homework and the corresponding solution. |
29 Oct | Language classification (I): Application design. Tackle the simple NLP task of classifying person names according to their language (English, German, French, …). Implement a couple of higher-level modules/classes on top of the basic primitives provided by the PyTorch framework, which will allow us to build more complex deep learning models (see the module-based classifier sketch after the schedule). Practical session, the corresponding code (original), as well as the version we worked on during the class. (Link to the theoretical homework moved up, see 28 Oct.) |
4 Nov | Linear separability |
5 Nov | Application design continued: We continue working on the practical session. Download the partial solution (zip) as we left it last week. The additional explanations on github may also be helpful. Homework (updated on 11/11), based on the practical session. The solution can be found on github. |
11 Nov | The simple neuron and deep architectures. Theoretical homework and the corresponding solution. |
12 Nov | Stochastic gradient descent: Implement stochastic gradient descent, learn about PyTorch optimizers (e.g. Adam), and use the full train/dev/test split (see the training-loop sketch after the schedule). With SGD, we should be able to train on the entire training set. We have already seen the dev part (dev80.csv + dev20.csv). You should avoid doing any experiments with test.csv yet. |
19 Nov | Batching: Batching is the technique of specifying neural computations over batches (i.e., sets) of dataset elements. It allows for better parallelization and, hence, faster computation (see the batching sketch after the schedule). Homework, the corresponding code, and the solution. Additionally, Ex. 1 solution notes. |
25 Nov | Backpropagation (theory) |
26 Nov | Backpropagation (practical aspects) |
3 Dec | POS tagging (I): Embedding (see the embedding sketch after the schedule). Exercises and the corresponding code (also includes the UD dataset sample). The code (without the dataset) is also on github. UPDATE: version of the code with modifications implemented during the session. |
9 Dec | LSTM (theory) |
10 Dec | POS tagging (II): Scoring and Training. Exercises and the corresponding code (also includes the UD dataset sample). UPDATE: version of the code with modifications implemented during the session. |
17 Dec | POS tagging (III): Training and LSTMs (see the LSTM tagger sketch after the schedule). Exercises and the corresponding code (also includes the UD dataset sample). UPDATE: version of the code with modifications implemented during the session. UPDATE 31.12.2019: optimized version of the code. See also the description of the optimization steps. |
7 Jan | POS tagging (IV): Pre-trained word embeddings + dropout (see the corresponding sketch after the schedule). Finalize the implementation of the POS tagger. Exercises and the corresponding code (also includes the UD dataset sample). Note that the code contains certain optimizations implemented during the break; these speed up training without changing the underlying model (the accuracy should be roughly the same). English fastText word vectors: the 10^5 most frequent words and the words present in the English UD treebank (both files are based on wiki-news-300d-1M-subword.vec.zip). UPDATE: version of the code after the session (with the fastText vectors included). |
14 Jan | Dependency parsing (I): Task description (updated on 21.01.2020) and the corresponding code (includes the UD dataset sample and the fastText vectors). Version of the code after the session (WARNING: this is the last time the code is published on this web page, make sure to keep track of your own code from now on!) |
21 Jan | Dependency parsing (II) |
27 Jan | Project proposal presentations |
28 Jan | Dependency parsing (III): Dependency-aware loss function (click raw at the top of the page to copy and paste the code; see also the head-selection loss sketch after the schedule). |
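For the basic end-to-end example (22 Oct), here is a minimal sketch of a feed-forward classifier built directly from basic PyTorch primitives (only the loss comes from torch.nn). The toy data, layer sizes, and learning rate are made up for illustration and are not taken from the course materials.

```python
import torch

torch.manual_seed(0)

# Toy data (hypothetical): 100 points with 2 features, 2 classes.
x = torch.randn(100, 2)
y = (x[:, 0] + x[:, 1] > 0).long()

# Parameters of a one-hidden-layer network, created by hand.
w1 = torch.randn(2, 10, requires_grad=True)
b1 = torch.zeros(10, requires_grad=True)
w2 = torch.randn(10, 2, requires_grad=True)
b2 = torch.zeros(2, requires_grad=True)

def forward(inp):
    """Feed-forward pass: linear -> sigmoid -> linear."""
    hidden = torch.sigmoid(inp @ w1 + b1)
    return hidden @ w2 + b2

loss_fn = torch.nn.CrossEntropyLoss()
for epoch in range(100):
    loss = loss_fn(forward(x), y)
    loss.backward()                    # compute gradients
    with torch.no_grad():              # plain gradient-descent update
        for p in (w1, b1, w2, b2):
            p -= 0.1 * p.grad
            p.grad.zero_()

accuracy = (forward(x).argmax(dim=1) == y).float().mean()
print(f"training accuracy: {accuracy:.2f}")
```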
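For the application-design sessions (29 Oct / 5 Nov), a sketch of how higher-level classes can be layered on top of the PyTorch primitives. The class names (NameEncoder, LangClassifier), the bag-of-characters encoding, and all sizes are illustrative assumptions, not the course's actual code.

```python
import torch
import torch.nn as nn

class NameEncoder(nn.Module):
    """Encodes a person name (a sequence of character indices) as a single
    vector by averaging character embeddings (a simple bag of characters)."""
    def __init__(self, char_vocab_size: int, emb_size: int):
        super().__init__()
        self.emb = nn.Embedding(char_vocab_size, emb_size)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(char_ids).mean(dim=0)

class LangClassifier(nn.Module):
    """Scores the encoded name against the set of candidate languages."""
    def __init__(self, char_vocab_size: int, emb_size: int, num_langs: int):
        super().__init__()
        self.encoder = NameEncoder(char_vocab_size, emb_size)
        self.ffn = nn.Sequential(
            nn.Linear(emb_size, 50), nn.ReLU(), nn.Linear(50, num_langs)
        )

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        return self.ffn(self.encoder(char_ids))

# Usage: score one name (as character indices) against 3 hypothetical languages.
model = LangClassifier(char_vocab_size=60, emb_size=20, num_langs=3)
name = torch.tensor([3, 17, 25, 8])
print(model(name))   # unnormalized scores, shape (3,)
```

Splitting the model into small nn.Module classes is what makes it easy to swap parts (e.g. replacing the averaging encoder with an LSTM later) without touching the rest of the code.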
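For the stochastic gradient descent session (12 Nov), a minimal training loop using a PyTorch optimizer. The model, data, and hyperparameters below are placeholders.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 2))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# Alternatively: optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Toy training set (hypothetical): 200 points with 2 features and a binary class.
data = [(torch.randn(2), torch.randint(0, 2, (1,))) for _ in range(200)]

for epoch in range(10):
    # "Stochastic": visit the training examples in a random order.
    for i in torch.randperm(len(data)).tolist():
        x, y = data[i]
        scores = model(x.unsqueeze(0))   # add a batch dimension: (1, 2)
        loss = loss_fn(scores, y)        # y has shape (1,)
        optimizer.zero_grad()            # forget gradients from the previous step
        loss.backward()                  # compute gradients of the loss
        optimizer.step()                 # update the parameters
```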
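For the batching session (19 Nov), a sketch of padding variable-length sequences so that a whole batch can be processed in one tensor operation; the character indices and sizes are made up.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence

# Index 0 is reserved for padding.
emb = nn.Embedding(num_embeddings=60, embedding_dim=20, padding_idx=0)

# Three names of different lengths, as character indices (made up).
names = [torch.tensor([3, 17, 25, 8]),
         torch.tensor([5, 9]),
         torch.tensor([12, 4, 7])]

# Stack them into one (batch_size, max_len) tensor.
batch = pad_sequence(names, batch_first=True, padding_value=0)
print(batch.shape)     # torch.Size([3, 4])

# One embedding lookup for the entire batch instead of a Python loop:
# result has shape (batch_size, max_len, emb_size).
vectors = emb(batch)
print(vectors.shape)   # torch.Size([3, 4, 20])
```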
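For the embedding session (3 Dec), a sketch of the embedding step of a POS tagger: word indices are mapped to dense vectors with nn.Embedding. The tiny vocabulary and sentence are stand-ins for the UD data.

```python
import torch
import torch.nn as nn

# Toy word -> index mapping (in the course this would come from the UD sample).
word_to_ix = {"the": 0, "cat": 1, "sleeps": 2}
emb = nn.Embedding(num_embeddings=len(word_to_ix), embedding_dim=5)

sentence = ["the", "cat", "sleeps"]
ids = torch.tensor([word_to_ix[w] for w in sentence])

# One vector per word: shape (sentence_length, embedding_dim).
word_vectors = emb(ids)
print(word_vectors.shape)   # torch.Size([3, 5])
```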
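For the LSTM-based tagging sessions (10/17 Dec), a sketch of scoring POS tags with a bidirectional LSTM over word embeddings; the architecture details and sizes are illustrative assumptions, not the published course code.

```python
import torch
import torch.nn as nn

class Tagger(nn.Module):
    def __init__(self, vocab_size, emb_size, hidden_size, num_tags):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_size)
        # A bidirectional LSTM gives each word a context-sensitive representation.
        self.lstm = nn.LSTM(emb_size, hidden_size,
                            bidirectional=True, batch_first=True)
        self.score = nn.Linear(2 * hidden_size, num_tags)

    def forward(self, word_ids):
        # word_ids: (batch_size, sent_len)
        embedded = self.emb(word_ids)          # (batch, len, emb)
        contextual, _ = self.lstm(embedded)    # (batch, len, 2*hidden)
        return self.score(contextual)          # (batch, len, num_tags)

model = Tagger(vocab_size=100, emb_size=20, hidden_size=30, num_tags=17)
sentence = torch.tensor([[5, 12, 7, 3]])       # one toy sentence, batch of 1
scores = model(sentence)
print(scores.shape)                            # torch.Size([1, 4, 17])

# Training uses cross-entropy between the per-word scores and the gold tags.
gold = torch.tensor([[2, 0, 5, 1]])
loss = nn.CrossEntropyLoss()(scores.reshape(-1, 17), gold.reshape(-1))
```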
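For the pre-trained embeddings + dropout session (7 Jan), a sketch of plugging pre-trained vectors into nn.Embedding.from_pretrained and applying dropout; the tiny random matrix below stands in for the actual fastText vectors read from the .vec file.

```python
import torch
import torch.nn as nn

# Pretend these rows were read from the fastText .vec file (here: 4 words, dim 6).
pretrained = torch.randn(4, 6)

# freeze=True keeps the pre-trained vectors fixed during training.
emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

# Dropout randomly zeroes components during training, a simple regularizer.
dropout = nn.Dropout(p=0.5)

word_ids = torch.tensor([0, 2, 3, 1])
vectors = dropout(emb(word_ids))   # dropout is active in training mode
print(vectors.shape)               # torch.Size([4, 6])

# At evaluation time, switch dropout off by calling .eval() on the enclosing module.
```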
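For the dependency-aware loss (28 Jan), a hedged sketch of one possible formulation: head selection as per-word classification, i.e. cross-entropy over candidate head positions, with position 0 standing for the dummy root. The actual loss function used in the course is in the linked code and may well differ from this illustration; the scores and gold heads below are made up.

```python
import torch
import torch.nn as nn

sent_len = 4
# Hypothetical head scores: one row per word, one column per candidate head
# (0 = root, 1..sent_len = the other words). Normally produced by the parser.
head_scores = torch.randn(sent_len, sent_len + 1)

# Gold head position of each word (made-up tree).
gold_heads = torch.tensor([2, 2, 0, 2])

loss = nn.CrossEntropyLoss()(head_scores, gold_heads)
print(loss)
```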
Some topics we may consider later on:
- Self-attention
- Structured prediction
- Regularization (dropout)
- "Recursive" (tree-structured) networks
- Language modeling with neural networks
- Unsupervised learning of word embeddings
- Multi-task learning
- Neural machine translation