Deep Learning in NLP (Winter 2019/2020)
|Theoretical sessions:||Monday, 14:30 – 16:00, room 24.21.05.61|
|Practical sessions:||Tuesday, 14:30 – 16:00, room 24.21.03.62-64|
|Course web page:||https://user.phil.hhu.de/~waszczuk/teaching/hhu-dl-wi19/ (this web page, which will be updated throughout the course)|
|Office hours:||by appointment|
|Languages:||German and English|
The aim of this course is to understand state-of-the-art neural network techniques and to apply them in practice, in particular to natural language processing problems.
Monday sessions will typically be dedicated to theory, Tuesday sessions to programming. During the practical sessions, we will mostly use the PyTorch framework to implement our networks. Instructions on how to install all the necessary tools on Ubuntu are here.
The theoretical content can be found in the script (caution, frequent updates!). Last updated: November 12, 2019.
- BN: Complete the theoretical and the programming homework exercises. The exercises will be published on this web page as we go.
- AP: Term paper (4-5 pages for undergraduate students, 7-10 pages for master's students), or a final exam.
Preliminary schedule of the practical sessions:
Introduction and Overview
Recall how to program in Python and get familiar with the development environment (VSCode, IPython).
General feedback and solution to the first homework.
|21 Oct||Vectors and matrices|
Basic end-to-end example
An end-to-end example of applying a neural network to a simple classification task. We will implement a feed-forward network using the basic PyTorch primitives.
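For a rough impression of what such an end-to-end example can look like, here is a minimal sketch of a feed-forward network built directly from tensors and autograd; the toy task and all the sizes are made up for illustration, not the actual course exercise:

```python
import torch

torch.manual_seed(0)

# Toy data: 100 points in 2D, labeled by which side of the line
# x1 + x2 = 0 they fall on (a stand-in for a real dataset)
X = torch.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).long()

# Parameters of a one-hidden-layer network, created by hand
W1 = torch.randn(2, 10, requires_grad=True)
b1 = torch.zeros(10, requires_grad=True)
W2 = torch.randn(10, 2, requires_grad=True)
b2 = torch.zeros(2, requires_grad=True)

for step in range(100):
    # Forward pass: linear -> non-linearity -> linear
    hidden = torch.sigmoid(X @ W1 + b1)
    scores = hidden @ W2 + b2
    loss = torch.nn.functional.cross_entropy(scores, y)

    # Backward pass: autograd fills in .grad for each parameter
    loss.backward()

    # Plain gradient-descent update, performed manually
    with torch.no_grad():
        for p in (W1, b1, W2, b2):
            p -= 0.1 * p.grad
            p.grad.zero_()

print(f"final loss: {loss.item():.3f}")
```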
Additional material on using tensors in PyTorch.
General feedback and solution to the second homework.
|28 Oct||Theoretical homework and the corresponding solution.|
Language classification (I): Application design
Tackle the simple NLP task of classifying person names according to their language (English, German, French, …). Implement a couple of higher-level modules/classes on top of the basic primitives provided by the PyTorch framework, which will allow us to build more complex deep learning models.
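A sketch of what such a higher-level module might look like: a character-level CBOW classifier wrapped in an `nn.Module`. The design and all sizes below are illustrative assumptions, not the exact course code:

```python
import torch
import torch.nn as nn

class NameClassifier(nn.Module):
    """Classify a person name (a sequence of character IDs) by language."""

    def __init__(self, alphabet_size: int, emb_size: int, num_langs: int):
        super().__init__()
        self.emb = nn.Embedding(alphabet_size, emb_size)
        self.ffn = nn.Sequential(
            nn.Linear(emb_size, emb_size),
            nn.ReLU(),
            nn.Linear(emb_size, num_langs),
        )

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        # CBOW: embed each character and average over the name
        cbow = self.emb(char_ids).mean(dim=0)
        return self.ffn(cbow)

model = NameClassifier(alphabet_size=26, emb_size=16, num_langs=3)
scores = model(torch.tensor([7, 4, 11, 11, 14]))  # an encoded name
print(scores.shape)  # torch.Size([3]): one score per language
```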
(Link to the theoretical homework moved up, see 28 Oct)
|4 Nov||Linear separability|
Application design continued
|11 Nov||Theoretical homework.|
Stochastic gradient descent
Implement stochastic gradient descent, learn about PyTorch optimizers (e.g. Adam).
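For illustration, a minimal training loop with a PyTorch optimizer might look as follows; the model and data are placeholders, and the learning rate is an arbitrary choice:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # stand-in for a real model
optim = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Placeholder dataset: 32 random (input, label) pairs
data = [(torch.randn(4), torch.randint(2, ()).item()) for _ in range(32)]

for epoch in range(5):
    for x, y in data:            # "stochastic": one example at a time
        optim.zero_grad()        # clear gradients from the last step
        loss = loss_fn(model(x).unsqueeze(0), torch.tensor([y]))
        loss.backward()          # compute gradients
        optim.step()             # apply the optimizer's update rule
```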
Full train/dev/test split. With SGD, we should be able to train on the entire training set. We have already seen the dev part (dev80.csv + dev20.csv). You should avoid doing any experiments with test.csv yet.
Batching is the technique of specifying neural computations over batches (i.e., sets) of dataset elements. It allows for better parallelization and, hence, faster computations.
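For instance, variable-length names can be padded to a common length and then embedded in a single tensor operation; the padding scheme and sizes below are illustrative:

```python
import torch
import torch.nn as nn

PAD = 0
emb = nn.Embedding(27, 8, padding_idx=PAD)  # 26 letters + padding symbol

names = [torch.tensor([3, 1, 2]), torch.tensor([5, 4])]  # two encoded names
batch = nn.utils.rnn.pad_sequence(names, batch_first=True, padding_value=PAD)
# batch has shape (2, 3): the shorter name is padded with PAD

embedded = emb(batch)  # one call embeds the whole batch: (2, 3, 8)
print(embedded.shape)
```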
Manually specifying backpropagation procedures.
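One way to do this in PyTorch is via `torch.autograd.Function`, which lets us write the backward pass ourselves. A sketch using the sigmoid as an example:

```python
import torch

class MySigmoid(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        y = 1 / (1 + torch.exp(-x))
        ctx.save_for_backward(y)  # stash what backward will need
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (y,) = ctx.saved_tensors
        # d sigmoid(x) / dx = sigmoid(x) * (1 - sigmoid(x))
        return grad_out * y * (1 - y)

x = torch.randn(3, requires_grad=True)
MySigmoid.apply(x).sum().backward()
# Sanity check against PyTorch's built-in sigmoid gradient
print(torch.allclose(x.grad, torch.sigmoid(x) * (1 - torch.sigmoid(x))))
```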
Implement a convolutional neural network (CNN) and apply it to the language classification task (TODO: or POS tagging).
A CNN is designed to identify indicative local features in a large input structure and to combine them into a fixed-size vector representation. It can thus serve as an "n-gram detector" and should improve the language classification performance over the simple continuous bag-of-words (CBOW) input representation.
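A sketch of the n-gram-detector idea in PyTorch (all sizes are illustrative): a 1D convolution with `kernel_size=3` scores each trigram window of character embeddings, and max-pooling combines the scores into a fixed-size vector:

```python
import torch
import torch.nn as nn

emb_size, num_filters = 8, 16
conv = nn.Conv1d(in_channels=emb_size, out_channels=num_filters, kernel_size=3)

# A batch of 1 name, 10 characters, each with an 8-dim embedding;
# Conv1d expects (batch, channels, length), hence the transpose.
chars = torch.randn(1, 10, emb_size)
features = conv(chars.transpose(1, 2))  # (1, 16, 8): one score per window
pooled, _ = features.max(dim=2)         # max-pooling -> fixed-size (1, 16)
print(pooled.shape)
```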
Recurrent networks (I)
Implement recurrent neural networks (RNNs) and apply them to the task of language classification.
One of the main properties of RNNs is that they can be applied to sequential input, which makes them prevalent in NLP (both written and spoken language is sequential in nature, at least on the surface). In contrast to CNNs, which only capture local input patterns, RNNs are in principle able to handle long-distance dependencies. Since the output of an RNN is itself a sequence, it can also be used as a component in sequence labeling tasks (such as part-of-speech tagging, named entity recognition, etc.).
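For a first impression, PyTorch's built-in (simple, Elman-style) `nn.RNN` already exposes both use cases; the sizes below are illustrative:

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16, batch_first=True)

seq = torch.randn(1, 5, 8)  # batch of 1 sequence, 5 steps, 8-dim inputs
outputs, h_n = rnn(seq)

# `outputs` holds one hidden state per position -> usable for tagging
print(outputs.shape)        # torch.Size([1, 5, 16])
# `h_n` is the final hidden state -> usable for classifying the sequence
print(h_n.shape)            # torch.Size([1, 1, 16])
```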
Recurrent networks (II): LSTM
Introduce the LSTM variant of a recurrent network.
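The built-in `nn.LSTM` has nearly the same interface as `nn.RNN`, except that it additionally threads a cell state through the sequence, which helps with long-distance dependencies (sizes again illustrative):

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
outputs, (h_n, c_n) = lstm(torch.randn(1, 5, 8))
print(outputs.shape, h_n.shape, c_n.shape)
```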
Pre-trained word embeddings
Using pre-trained word embeddings in NLP tasks, with POS tagging as the example.
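A sketch of how pre-trained vectors can be plugged into a PyTorch model via `nn.Embedding.from_pretrained`; the random matrix below stands in for vectors loaded from an actual embedding file (e.g., word2vec or fastText, an assumption here):

```python
import torch
import torch.nn as nn

pretrained = torch.randn(3, 4)  # 3 words, 4-dim vectors (placeholder)
emb = nn.Embedding.from_pretrained(pretrained, freeze=True)

word_ids = torch.tensor([0, 2, 1])  # a 3-word sentence, encoded
print(emb(word_ids).shape)          # torch.Size([3, 4])
```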
Some topics we may consider later on:
- Structured prediction
- Regularization (dropout)
- "Recursive" (tree-structured) networks
- Language modeling with neural networks
- Unsupervised learning of word embeddings
- Multi-task learning
- Neural machine translation