Introduction to Formal Language Theory (NASSLLI 2014)

The course provides an introduction to the theory of formal languages, grammars and automatons for participants with no or little prior knowledge of formal systems. It concentrates on the question of "How complex are natural languages?" and introduces formal languages from a linguistic point of view. The following topics will be covered by the course: modeling natural languages as formal languages, the Chomsky hierarchy and the properties of its language classes, grammars and automatons for language generation and acceptance, and decision problems and the notion of reducibility. At the end of the course the participants will know the fundamental concepts of formal language theory, the central results and the basic proof techniques. They will gain the necessary methods to dive deeper into the subject by self-study. The course offers supplementary learning modules, such as short quizzes, take-home exercises and audio-visual material.


Wiebke Petersen

is associate professor in mathematical linguistics. She has taught several courses on the theory of formal languages in German at the University of Düsseldorf, and two at the University of Riga and at the Latvian National Library in English.


Kata Balogh

is postdoctoral fellow in computational linguistics with special interests in formal and computational semantics, formal languages and other formal approaches of natural language processing. She is currently teaching regular BA and MA courses at the University of Düsseldorf. She was also teaching at the ILLC / University of Amsterdam.


Institute of Linguistics and Information
Department of Computational Linguistics
Heinrich-Heine-Universität Düsseldorf
Universitütsstr. 1, 40225 Düsseldorf


Day 0

notions of basic set theory

Day 1

Introduction: Languages, Grammars and the Chomsky-hierarchy
Hilbert's Hotel

Day 2

Regular languages

Day 3

Context-free languages

Day 4

Context-sensitive and Mildly context-sensitive languages, Turing-machine
SLIDES animated version
(for displaying the animated Turing machines correctly)
SLIDES print version
(to print; without animated Turing machines)

Day 5

Decision problems
SLIDES animated version
(for displaying the animated PCPs correctly)
SLIDES print version
(to print; without animated PCPs)


Exorciser: exorciser.jar
to exercises finite-state machines, CYK parsing etc.

Turing machine simulator
example inputs for the simulator: TM examples
(copy one of the TMs into the simulator to run)

a "real" Turing Machine by Mike Davey


day 1: cross-word puzzle

day 2: exercise FSAs with Exorciser; you can create your own exercises for constructing an FSA for a given language, converting regular expression to FSA, converting FSA to regular expression, removing epsilon-transitions and converting a non-deterministic FSA to a deterministic FSA

day 3: exercise minimalizing deterministic finite-state-machines with Exorciser; give a context-free grammar for language L2 and L3 on slide 19 and transform one of these grammars into Chomsky normal form;

day 4: using the Pumping lemma for CF languages prove that anbncn is not context-free; create a Turing machine for recognizing anbncn and one for generating anbncn (use the simulator and the example TMs)


Chomsky, Noam (1956): Three models for the description of language. IRE Transactions on Information Theory (2): 113-124. [LINK]

Hopcroft, John E. & Jeffrey D. Ullman (1979): Introduction to Automata Theory, Languages, and Computation (1st ed.). Addison-Wesley.

Hopcroft, John E. & Rajeev Motwani & Jeffrey D. Ullman (2013): Introduction to Automata Theory, Languages, and Computation (3rd ed.). Pearson.

Partee, Barbara H. & Alice ter Meulen & Robert E. Wall (1990): Mathematical Methods in Linguistics. Kluwer Academic Publishers. chapters 16-21.