Deep Learning in NLP (Winter 2019/2020)

General Information

Instructors: Christian Wurm, Jakub Waszczuk
Theoretical sessions: Monday, 14:30 – 16:00, room
Practical sessions: Tuesday, 14:30 – 16:00, room
Course web page:
(This web page, which will be updated throughout the course.)
Office hours: by appointment
Languages: German and English

Course Description

The aim of this course is to understand state-of-the-art neural network techniques and to apply them in practice, in particular to natural language processing problems.

Monday sessions will typically be dedicated to theory and Tuesday sessions to programming. During the practical sessions, we will mostly use the PyTorch framework to implement our networks. Instructions on how to install all the necessary tools on Ubuntu are here.


The theoretical content can be found in the lecture notes (caution: frequent updates!). Last updated: November 12, 2019.


  • BN: Complete the theoretical and the programming homework exercises. The homework assignments will be published on this web page as we go.
  • AP: Term paper (4-5 pages for undergraduate students, 7-10 pages for master's students) or a final exam.


Preliminary schedule of the practical sessions:

8 Oct

Introduction and Overview

15 Oct

Python refresher

Recall how to program in Python and get familiar with the development environment (VSCode, IPython).

Python refresher and some hints on using VSCode and IPython.

Homework exercise (updated on 15/10), the partial solution (unpack it first), and the dataset with person names.

General feedback and solution to the first homework.

21 Oct Vectors and matrices
22 Oct

Basic end-to-end example

An end-to-end example of applying a neural network to a simple classification task. We will implement a feed-forward network using the basic PyTorch primitives.
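
To make the structure of such a network concrete, here is a minimal sketch of the forward pass of a feed-forward classifier with one hidden layer. It is written in plain Python rather than PyTorch so that it runs without dependencies; the names `linear`, `softmax`, `feed_forward` and `init_params`, the layer sizes, and the tanh activation are illustrative assumptions, not the code used in class.

```python
import math
import random

def linear(weights, bias, x):
    """Affine map: one output value per row of the weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def softmax(scores):
    """Turn arbitrary scores into a probability distribution."""
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(n_in, n_hidden, n_out, seed=0):
    """Small random weights and zero biases (hypothetical initialisation)."""
    rng = random.Random(seed)
    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return {"W1": matrix(n_hidden, n_in), "b1": [0.0] * n_hidden,
            "W2": matrix(n_out, n_hidden), "b2": [0.0] * n_out}

def feed_forward(params, x):
    """Input -> hidden layer with tanh -> output layer with softmax."""
    hidden = [math.tanh(h) for h in linear(params["W1"], params["b1"], x)]
    return softmax(linear(params["W2"], params["b2"], hidden))

params = init_params(n_in=4, n_hidden=3, n_out=2)
probs = feed_forward(params, [1.0, 0.0, 0.5, -1.0])
print(probs)  # two class probabilities that sum to 1
```

In PyTorch, the lists above would be tensors and the gradients needed for training would be computed automatically.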

Homework and the corresponding code (unpack it first).

Additional material on using tensors in PyTorch.

General feedback and solution to the second homework.

28 Oct

Linear regression

Theoretical homework and the corresponding solution.
29 Oct

Language classification (I): Application design

Tackle the simple NLP task of classifying person names according to their language (English, German, French, …). Implement a couple of higher-level modules/classes on top of the basic primitives provided by the PyTorch framework, which will allow us to build more complex deep learning models.

Practical session, the corresponding code (original), as well as the version we worked on during the class.

(Link to the theoretical homework moved up, see 28 Oct)

4 Nov Linear separability
5 Nov

Application design continued

We continue working on the practical session. Download the partial solution (zip) as we left it last week. The additional explanations on GitHub may also be helpful.

Homework (updated on 11/11), based on the practical session. The solution can be found on GitHub.

11 Nov

The simple neuron and deep architectures

Theoretical homework.
12 Nov

Stochastic gradient descent

Implement stochastic gradient descent, learn about PyTorch optimizers (e.g. Adam).
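
As a rough illustration of the idea, the sketch below runs stochastic gradient descent on a one-dimensional linear regression problem: the parameters are updated after every single example instead of after a pass over the whole dataset. The function name `sgd_linear_regression`, the toy data, and the hyperparameters are assumptions made for this example; plain Python is used so that it runs without PyTorch.

```python
import random

def sgd_linear_regression(data, lr=0.05, epochs=200, seed=0):
    """Fit y = w * x + b by updating (w, b) after each training example."""
    rng = random.Random(seed)
    data = list(data)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)          # visit the examples in a new random order
        for x, y in data:
            err = (w * x + b) - y  # derivative of 0.5 * err**2 w.r.t. the prediction
            w -= lr * err * x      # gradient step for w
            b -= lr * err          # gradient step for b
    return w, b

# Toy, noise-free data generated from y = 2x + 1.
points = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0]]
w, b = sgd_linear_regression(points)
print(round(w, 3), round(b, 3))
```

In PyTorch, the two update lines are usually delegated to an optimizer object such as `torch.optim.SGD` or `torch.optim.Adam`.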

GitHub material on SGD.

Full train/dev/test split. With SGD, we should be able to train on the entire training set. We have already seen the dev part (dev80.csv + dev20.csv). Please do not run any experiments with test.csv for now.

19 Nov

Batching

Batching is the technique of specifying neural computations over batches (i.e., sets) of dataset elements. It allows for better parallelization and, hence, faster computations.
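
The sketch below shows the mechanics only: the dataset is cut into batches, and one function call processes a whole batch at a time. The helper names (`batches`, `linear_batch`) and the toy data are assumptions for this example. Plain Python gains no speed from this; the benefit appears when the batch is a single tensor and the framework can process all of its rows in parallel.

```python
def batches(items, batch_size):
    """Yield consecutive slices containing at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def linear(weights, bias, x):
    """Affine map applied to a single input vector."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def linear_batch(weights, bias, xs):
    """The same affine map applied to every vector in a batch."""
    return [linear(weights, bias, x) for x in xs]

dataset = [[float(i), float(i % 3)] for i in range(10)]
weights, bias = [[1.0, -1.0], [0.5, 2.0]], [0.0, 1.0]

outputs = []
for batch in batches(dataset, batch_size=4):
    outputs.extend(linear_batch(weights, bias, batch))

print([len(b) for b in batches(dataset, 4)])  # the last batch may be smaller
```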

26 Nov

Backpropagation

Manually specifying backpropagation procedures.
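
As a small worked example of what "manually" means here, the sketch below writes out the backward pass of a single sigmoid neuron with a squared-error loss using the chain rule, and checks the result against a finite-difference approximation. The choice of model and loss and all function names are assumptions made for illustration.

```python
import math

def forward(w, b, x, t):
    """Single neuron: y = sigmoid(w * x + b), loss = 0.5 * (y - t)**2."""
    z = w * x + b
    y = 1.0 / (1.0 + math.exp(-z))
    return y, 0.5 * (y - t) ** 2

def backward(w, b, x, t):
    """Gradients of the loss w.r.t. w and b, derived by hand."""
    y, _ = forward(w, b, x, t)
    dloss_dy = y - t                 # derivative of the squared error
    dy_dz = y * (1.0 - y)            # derivative of the sigmoid
    dloss_dz = dloss_dy * dy_dz      # chain rule
    return dloss_dz * x, dloss_dz    # dz/dw = x, dz/db = 1

def numeric_grad(f, v, eps=1e-6):
    """Central finite difference, used only to check the manual gradients."""
    return (f(v + eps) - f(v - eps)) / (2 * eps)

w, b, x, t = 0.3, -0.1, 2.0, 1.0
grad_w, grad_b = backward(w, b, x, t)
num_w = numeric_grad(lambda v: forward(v, b, x, t)[1], w)
num_b = numeric_grad(lambda v: forward(w, v, x, t)[1], b)
print(grad_w, num_w)
print(grad_b, num_b)
```

Comparing against a numerical gradient like this is a common way to catch mistakes in a hand-written backward pass.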

3 Dec

Convolutional networks

Implement and apply a convolutional neural network (CNN) to the language classification task (TODO: or POS tagging).

A CNN is designed to identify indicative local features in a large input structure and to combine them into a fixed-size vector representation. It can thus serve as an "n-gram detector" and should improve the language classification performance over the simple continuous bag-of-words (CBOW) input representation.
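
The "n-gram detector" view can be sketched with a one-dimensional convolution over character vectors followed by max pooling. In this toy version the characters are one-hot encoded and the two filters are set by hand to respond to the bigrams "ab" and "bc"; in a trained network both the embeddings and the filters would be learned. All names and the three-letter alphabet are assumptions made for this example.

```python
def one_hot(char, alphabet):
    """One-hot vector for a character (stand-in for a learned embedding)."""
    return [1.0 if char == a else 0.0 for a in alphabet]

def conv_max_pool(vectors, filters, width):
    """Slide each filter over all windows of `width` vectors, keep the best score."""
    if len(vectors) < width:
        raise ValueError("input is shorter than the filter width")
    windows = [
        [v for vec in vectors[i:i + width] for v in vec]  # concatenated window
        for i in range(len(vectors) - width + 1)
    ]
    return [max(sum(f * v for f, v in zip(filt, win)) for win in windows)
            for filt in filters]

alphabet = "abc"
filters = [
    one_hot("a", alphabet) + one_hot("b", alphabet),  # fires on the bigram "ab"
    one_hot("b", alphabet) + one_hot("c", alphabet),  # fires on the bigram "bc"
]

def encode(word):
    """Fixed-size representation: one value per filter, whatever the word length."""
    return conv_max_pool([one_hot(c, alphabet) for c in word], filters, width=2)

print(encode("cab"))    # [2.0, 0.0]: contains "ab" but not "bc"
print(encode("abcab"))  # [2.0, 2.0]: contains both bigrams
```

Because of the max pooling, the output length depends only on the number of filters, not on the length of the input word.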

10 Dec

Recurrent networks (I)

Implement recurrent neural networks (RNNs) and apply them to the task of language classification.

One of the main properties of RNNs is that they can be applied to sequential input, which makes them prevalent in NLP (both written and spoken language are sequential in nature, at least on the surface). In contrast to CNNs, which only capture local input patterns, RNNs are in principle able to handle long-distance dependencies. Since the output of an RNN is a sequence, it can also be used as a component in sequence labeling tasks (such as part-of-speech tagging, named entity recognition, etc.).
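
A minimal sketch of a simple (Elman-style) recurrent network illustrates these properties: the same update is applied at every position, each hidden state depends on the previous one, and the result is one state per input element. The weights and inputs below are arbitrary toy values, and the function name `rnn` is an assumption for this example.

```python
import math

def rnn(inputs, W_x, W_h, b, h0):
    """Return the list of hidden states, one per input vector."""
    h = h0
    states = []
    for x in inputs:
        h = [
            math.tanh(
                sum(w * xi for w, xi in zip(W_x[j], x))    # contribution of the input
                + sum(w * hi for w, hi in zip(W_h[j], h))  # contribution of the past
                + b[j]
            )
            for j in range(len(b))
        ]
        states.append(h)
    return states

W_x = [[0.5, -0.3], [0.2, 0.8]]
W_h = [[0.1, 0.4], [-0.2, 0.3]]
b = [0.0, 0.0]
h0 = [0.0, 0.0]

seq_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
seq_b = [[0.0, 1.0], [0.0, 1.0], [1.0, 1.0]]  # differs from seq_a only at position 0

states_a = rnn(seq_a, W_x, W_h, b, h0)
states_b = rnn(seq_b, W_x, W_h, b, h0)
print(states_a[-1])
print(states_b[-1])  # differs: the first input still influences the last state
```

For classification, the last state can be fed into an output layer; for sequence labeling, every state gets its own output.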

17 Dec

Recurrent networks (II): LSTM

Introduce the LSTM variant of a recurrent network.
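
For orientation, here is a sketch of a single LSTM step with scalar input and scalar hidden state, following the standard formulation with input, forget and output gates. The parameter dictionary, its key names, and the values used below are hypothetical and chosen only to show how the gates control the memory cell.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step (scalar version); returns the new (h, c)."""
    i = sigmoid(p["w_i"] * x + p["u_i"] * h_prev + p["b_i"])    # input gate
    f = sigmoid(p["w_f"] * x + p["u_f"] * h_prev + p["b_f"])    # forget gate
    o = sigmoid(p["w_o"] * x + p["u_o"] * h_prev + p["b_o"])    # output gate
    g = math.tanh(p["w_g"] * x + p["u_g"] * h_prev + p["b_g"])  # candidate value
    c = f * c_prev + i * g   # keep part of the old memory, add part of the new
    h = o * math.tanh(c)     # expose part of the memory as the hidden state
    return h, c

# Hypothetical parameters: forget gate almost fully open, input gate almost closed.
params = {key: 0.0 for key in
          ["w_i", "u_i", "b_i", "w_f", "u_f", "b_f",
           "w_o", "u_o", "b_o", "w_g", "u_g", "b_g"]}
params["b_f"] = 20.0
params["b_i"] = -20.0

h, c = lstm_step(x=0.5, h_prev=0.0, c_prev=0.7, p=params)
print(c)  # stays very close to 0.7: the cell carries its memory forward
```

With these gate settings the cell state passes through a step almost unchanged, which is the mechanism that helps an LSTM retain information over long distances.
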
7 Jan

Pre-trained word embeddings

Using pre-trained word embeddings in NLP tasks, with POS tagging as an example.
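
As a sketch of the first step in such a setup, the code below reads word vectors from a text source and looks up the vectors for a tokenized sentence. The file format (one word per line followed by its vector components), the lowercasing, and the zero vector for unknown words are all assumptions made for this example; the toy "file" is an in-memory string.

```python
import io

def load_embeddings(handle):
    """Read lines of the assumed form 'word v1 v2 ...' into a dict."""
    table = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip empty or malformed lines
        table[parts[0]] = [float(v) for v in parts[1:]]
    return table

def embed(tokens, table, dim):
    """Map tokens to vectors; unknown words get a zero vector (an assumption)."""
    unknown = [0.0] * dim
    return [table.get(tok.lower(), unknown) for tok in tokens]

toy_file = io.StringIO("the 0.1 0.2\ncat 0.3 -0.4\nsleeps 0.0 0.5\n")
table = load_embeddings(toy_file)
vectors = embed(["The", "cat", "purrs"], table, dim=2)
print(vectors)  # [[0.1, 0.2], [0.3, -0.4], [0.0, 0.0]]
```

A tagger would then take these vectors as its input representation, either keeping them fixed or fine-tuning them during training.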

Some topics we may consider later on:

  • Self-attention
  • Structured prediction
  • Regularization (dropout)
  • "Recursive" (tree-structured) networks
  • Language modeling with neural networks
  • Unsupervised learning of word embeddings
  • Multi-task learning
  • Neural machine translation