Deep Learning in NLP (Winter 2019/2020)

General Information

Instructors: Christian Wurm
Jakub Waszczuk
Theoretical sessions: Monday, 14:30 – 16:00, room
Practical sessions: Tuesday, 14:30 – 16:00, room
Course web page:
(This web page, which will be updated throughout the course.)
Office hours: by appointment
Languages: German and English

Course Description

The aim of this course is to understand state-of-the-art neural network techniques and to apply them in practice, in particular to natural language processing problems.

Monday sessions will typically be dedicated to theory, Tuesday sessions to programming. During the practical sessions, we will mostly use the PyTorch framework to implement our networks. Instructions on how to install all the necessary tools on Ubuntu are here.


The theoretical content can be found in the script (caution, frequent updates!). Last updated: January 20, 2020.


  • BN: Complete the theoretical and the programming homework exercises. The homework assignments will be published on this web page as we go.
  • AP: Term paper: 4-5 pages for undergraduate students, 7-10 pages for master's students. Please use the ACL 2020 stylesheet (LaTeX, Word). You can pick a topic of your choice; it does not necessarily have to be NLP-related. Documented code and running/installation instructions are part of the deliverables.
    • UPDATE 14.02: The code should be documented in the standard way: provide docstrings/comments that explain what the code does and why, especially for the more complicated chunks.


Preliminary schedule of the sessions:

8 Oct

Introduction and Overview

15 Oct

Python refresher

Recall how to program in Python and get familiar with the development environment (VSCode, IPython).

Python refresher and some hints on using VSCode and IPython.

Homework exercise (updated on 15/10), the partial solution (unpack it first), and the dataset with person names.

General feedback and solution to the first homework.

21 Oct Vectors and Matrices
22 Oct

Basic end-to-end example

An end-to-end example of applying a neural network to a simple classification task. We will implement a feed-forward network using the basic PyTorch primitives.

Homework and the corresponding code (unpack it first).

Additional material on using tensors in PyTorch.

General feedback and solution to the second homework.
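As a rough illustration of what such a network looks like, here is a minimal sketch of a feed-forward classifier built from basic PyTorch building blocks; all sizes and data are illustrative assumptions, not the course's actual code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A tiny feed-forward network: linear layer -> non-linearity -> linear layer.
model = nn.Sequential(
    nn.Linear(4, 8),   # 4 input features -> 8 hidden units
    nn.ReLU(),
    nn.Linear(8, 3),   # 8 hidden units -> 3 output classes
)

x = torch.randn(10, 4)   # a batch of 10 random example vectors
scores = model(x)        # raw, unnormalized class scores (logits)
```

The output `scores` has one row per example and one column per class; a softmax (usually folded into the loss function) would turn each row into a probability distribution.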

28 Oct

Linear Regression

Theoretical homework and the corresponding solution.
29 Oct

Language classification (I): Application design

Tackle the simple NLP task of classifying person names according to their language (English, German, French, …). Implement a couple of higher-level modules/classes on top of the basic primitives provided by the PyTorch framework, which will allow us to build more complex deep learning models.

Practical session, the corresponding code (original), as well as the version we worked on during the class.

(Link to the theoretical homework moved up, see 28 Oct)
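The higher-level modules mentioned above can be pictured roughly as follows; this is a hypothetical sketch of an `nn.Module` subclass for the name classification task, with a made-up input encoding (a bag of character counts) and illustrative sizes, not the code from the session.

```python
import torch
import torch.nn as nn

class NameClassifier(nn.Module):
    """Hypothetical classifier mapping a character-count vector to language scores."""

    def __init__(self, char_vocab_size: int, hidden_size: int, n_langs: int):
        super().__init__()
        self.ffn = nn.Sequential(
            nn.Linear(char_vocab_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, n_langs),
        )

    def forward(self, char_counts: torch.Tensor) -> torch.Tensor:
        # char_counts: (batch, char_vocab_size) -> (batch, n_langs)
        return self.ffn(char_counts)

model = NameClassifier(char_vocab_size=26, hidden_size=16, n_langs=3)
x = torch.zeros(1, 26)   # one (empty) name encoded as character counts
out = model(x)
```

Wrapping the layers in a module like this is what lets PyTorch track the parameters and lets us compose such components into larger models later.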

4 Nov Linear Separability
5 Nov

Application design continued

We continue working on the practical session. Download the partial solution (zip) as we left it last week. The additional explanations on GitHub may also be helpful.

Homework (updated on 11/11), based on the practical session. The solution can be found on GitHub.

11 Nov

The Simple Neuron and Deep Architectures

Theoretical homework and the corresponding solution.
12 Nov

Stochastic gradient descent

Implement stochastic gradient descent and learn about PyTorch optimizers (e.g., Adam).

GitHub material on SGD.

Full train/dev/test split. With SGD, we should be able to train on the entire training set. We have already seen the dev part (dev80.csv + dev20.csv). You should avoid doing any experiments with test.csv yet.
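The optimizer loop follows a fixed pattern: zero the gradients, compute the loss, backpropagate, take a step. Here is a minimal sketch on a made-up regression problem (the toy target and all hyperparameters are illustrative assumptions); switching between `torch.optim.SGD` and `torch.optim.Adam` is a one-line change.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(2, 1)
opt = torch.optim.Adam(model.parameters(), lr=0.1)  # or torch.optim.SGD(...)
loss_fn = nn.MSELoss()

x = torch.randn(32, 2)
y = x.sum(dim=1, keepdim=True)   # toy target: the sum of the two inputs

losses = []
for step in range(100):
    opt.zero_grad()              # clear gradients from the previous step
    loss = loss_fn(model(x), y)  # forward pass + loss
    loss.backward()              # backpropagate
    opt.step()                   # update the parameters
    losses.append(loss.item())
```

After training, the recorded losses should have decreased substantially from the first to the last step.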

19 Nov

Batching

Batching is the technique of specifying neural computations over batches (i.e., sets) of dataset elements. It allows for better parallelization and, hence, faster computations.

Homework, the corresponding code, and the solution. Additionally, Ex. 1 solution notes.

Ideas for further improving the language prediction model.
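For sequence data such as names, batching requires padding the sequences to a common length so that a single tensor can hold the whole batch. A minimal sketch with made-up index sequences, using PyTorch's `pad_sequence` helper:

```python
import torch
import torch.nn as nn

# Three "names" of different lengths, encoded as character indices
# (the indices are arbitrary stand-ins; 0 is reserved for padding).
seqs = [torch.tensor([1, 2, 3]), torch.tensor([4, 5]), torch.tensor([6])]

# Pad to the length of the longest sequence -> one (3, 3) tensor.
batch = nn.utils.rnn.pad_sequence(seqs, batch_first=True, padding_value=0)
```

The network can then process all three sequences in one forward pass instead of one at a time, which is where the speed-up comes from.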

25 Nov

Backpropagation (theory)

26 Nov

Backpropagation (practical aspects)

Manually specifying backpropagation procedures.
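One way to specify a backward procedure by hand in PyTorch is to subclass `torch.autograd.Function`. A minimal sketch for the squaring function, whose derivative we write out ourselves instead of relying on autograd:

```python
import torch

class Square(torch.autograd.Function):
    """x -> x**2 with a manually specified backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)   # keep x for use in backward
        return x * x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * x    # chain rule: d(x^2)/dx = 2x

x = torch.tensor(3.0, requires_grad=True)
y = Square.apply(x)
y.backward()                       # x.grad is now 2 * 3 = 6
```

This reproduces exactly what autograd would compute for `x * x`, which makes it a convenient way to check a hand-derived gradient.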

3 Dec

POS tagging (I): Embedding

Exercises and the corresponding code (also includes the UD dataset sample).

The code (without the dataset) is also on github.

UPDATE: version of the code with modifications implemented during the session.
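The embedding step can be pictured as a lookup table mapping word indices to dense vectors, which is what `nn.Embedding` provides. A minimal sketch with an illustrative vocabulary size and embedding dimension (not the session's actual settings):

```python
import torch
import torch.nn as nn

# A trainable lookup table: 100 word indices -> 8-dimensional vectors.
emb = nn.Embedding(num_embeddings=100, embedding_dim=8)

word_ids = torch.tensor([5, 17, 42])   # a 3-word "sentence" as indices
vectors = emb(word_ids)                # (3, 8): one vector per word
```

These vectors are ordinary parameters, so they get updated by backpropagation like the rest of the tagger.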

9 Dec

LSTM (theory)

Theoretical homework

10 Dec

POS tagging (II): Scoring and Training

Exercises and the corresponding code (also includes the UD dataset sample).

UPDATE: version of the code with modifications implemented during the session.

17 Dec

POS tagging (III): Training and LSTMs

Exercises and the corresponding code (also includes the UD dataset sample).

UPDATE: version of the code with modifications implemented during the session.

UPDATE 31.12.2019: optimized version of the code. See also the description of the optimization steps.
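The LSTM layer's role in the tagger can be sketched as follows: it turns a sequence of word vectors into contextualized vectors, which a linear layer then maps to tag scores. All sizes here are illustrative assumptions, not the course's actual configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Bidirectional LSTM: each output vector concatenates the forward and
# backward hidden states, hence the 2 * hidden_size output dimension.
lstm = nn.LSTM(input_size=8, hidden_size=16,
               batch_first=True, bidirectional=True)
score = nn.Linear(2 * 16, 10)   # 10 hypothetical POS tags

sent = torch.randn(1, 5, 8)     # batch of 1 sentence, 5 words, 8-dim embeddings
ctx, _ = lstm(sent)             # (1, 5, 32): contextualized word vectors
tag_scores = score(ctx)         # (1, 5, 10): one score vector per word
```

The highest-scoring tag per position is then the tagger's prediction; during training, these scores feed into a cross-entropy loss against the gold tags.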

7 Jan

POS tagging (IV): pre-trained word embeddings + dropout

Finalize the implementation of the POS tagger.

Exercises and the corresponding code (also includes the UD dataset sample).

Note that the code contains certain optimizations implemented during the break. These optimizations speed up training without changing the underlying model (the accuracy should be roughly the same).

English fastText word vectors: the 10^5 most frequent words and the words present in the English UD treebank (both files are based on …)

UPDATE: version of the code after the session (with the fastText vectors included).
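Plugging pre-trained vectors into the model, with dropout on top, can be sketched as follows; the random matrix here is a stand-in for the real fastText vectors, and the sizes are illustrative.

```python
import torch
import torch.nn as nn

# Stand-in for a loaded fastText matrix: 100 words, 8-dimensional vectors.
pretrained = torch.randn(100, 8)

# freeze=True keeps the pre-trained vectors fixed during training.
emb = nn.Embedding.from_pretrained(pretrained, freeze=True)
drop = nn.Dropout(p=0.5)   # randomly zeroes components during training

emb.eval(); drop.eval()    # in eval mode, dropout is a no-op
word_ids = torch.tensor([1, 2, 3])
out = drop(emb(word_ids))  # in eval mode: exactly the pre-trained rows
```

During training (`model.train()`), the dropout layer would instead zero out half of the embedding components at random, a regularization that discourages the tagger from relying on any single dimension.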

14 Jan

Dependency parsing (I)

Task description (updated on 21.01.2020) and the corresponding code (includes the UD dataset sample and the fastText vectors).

Version of the code after the session. (WARNING: this is the last time the code is published on this web page; make sure to keep track of your own code from now on!)
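One common way to cast dependency parsing as a prediction problem is head selection: for each word, score every candidate head (including an artificial root position) and train with cross-entropy over head positions. The bilinear scoring and all numbers below are a hypothetical sketch of this idea, not the parser from the session.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_words, dim = 5, 8

# Contextualized vectors for the sentence; position 0 stands for the root.
vecs = torch.randn(n_words + 1, dim)

# Score head j for dependent i as a (hypothetical) bilinear product.
W = torch.randn(dim, dim)
scores = vecs[1:] @ W @ vecs.t()   # (n_words, n_words + 1)

# Made-up gold head index for each word (0 = root).
gold_heads = torch.tensor([0, 1, 1, 3, 3])
loss = nn.functional.cross_entropy(scores, gold_heads)
```

Each row of `scores` is a distribution over possible heads for one word, so the parsing loss decomposes into one classification problem per word.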

21 Jan

Dependency parsing (II)

27 Jan

Project proposal presentations

28 Jan

Dependency parsing (III)

Dependency-aware loss function (click raw at the top of the page to copy-and-paste the code)

Some topics we may consider later on:

  • Self-attention
  • Structured prediction
  • Regularization (dropout)
  • "Recursive" (tree-structured) networks
  • Language modeling with neural networks
  • Unsupervised learning of word embeddings
  • Multi-task learning
  • Neural machine translation