Linguistic Resources (Summer 2020)

General Information

Instructors: Regina Stodden, Tatiana Bladier
Sessions: Wednesday, 10:30 – 12:00, Room 24.21.03.61
Office hours: By appointment
Email: stodden[at]phil.hhu.de, bladier[at]phil.hhu.de

Course Description

The course focuses on linguistic resources such as web corpora, language models, lexical and syntactic databases, and treebanks.
It addresses the following questions:

  • How can suitable data sources be found?
  • What types of data sources are available?
  • In which formats are the data provided, and how can they be handled?
  • How, and for what purposes, can these data sets be used?
  • How can you create your own data sets?
  • How can the data be analyzed?

Each topic is first introduced theoretically and then addressed in practice. Prior knowledge of Python is therefore desirable but not mandatory. The goal is to help students select, create, and process suitable linguistic resources and analysis methods for quantitative research questions, for example in term papers, theses, or research articles.

Requirements (BN-Scheine)

According to the module handbook, this course offers only a certificate of active participation (BN); there is no exam.
To obtain the BN, students must complete the regular homework assignments.


Schedule

08 Apr Introduction and Overview
15 Apr CELEX and POS Tagging

22 Apr Types of Corpora

29 Apr Practical Session: spaCy for Small Annotations

06 May CoNLL and Universal Dependencies (UD)

13 May XML Annotations

20 May FrameNet

27 May Treebanks and NLTK Tools

03 Jun Web Corpora and HTML Annotation

10 Jun CoWeb

17 Jun Sketch Engine

24 Jun Annotation Tools

01 Jul Language Models

08 Jul Corpus Standards, Copyright, and Licenses

15 Jul Final Session