Linguistic Resources (Summer 2020)
General Information
Instructors: | Regina Stodden, Tatiana Bladier |
Sessions: | Wednesday, 10:30 – 12:00, Room 24.21.03.61 |
Office hours: | By appointment |
Email: | stodden[at]phil.hhu.de, bladier[at]phil.hhu.de |
Course Description
The course focuses on linguistic resources such as web corpora, language models, lexical and syntactic databases, and treebanks. It addresses the following questions:
- How can suitable data sources be found?
- What types of data sources are available?
- In which formats are the data provided, and how can they be handled?
- How and for what can these data sets be used?
- How can one create one's own data sets?
- How can the data be analyzed?
Each topic is first introduced theoretically and then addressed in practice. Prior knowledge of Python is therefore desirable but not mandatory. The goal is to help students select, create, and process suitable linguistic resources and analysis methods for quantitative research questions, e.g., in term papers, theses, or research papers.
Requirements (BN certificates)
According to the module handbook, this course offers only a proof of active participation (BN); there is no exam. To obtain the BN, students must complete regular homework assignments.
Schedule
08 Apr | Introduction and Overview |
15 Apr | CELEX and POS tagging |
22 Apr | Corpora Types |
29 Apr | Practical Session: spaCy for Small Annotations |
06 May | CoNLL and Universal Dependencies |
13 May | XML Annotations |
20 May | FrameNet |
27 May | Treebanks and NLTK Tools |
03 Jun | Web Corpora and HTML Annotation |
10 Jun | CoWeb |
17 Jun | Sketch Engine |
24 Jun | Annotation Tools |
01 Jul | Language Models |
08 Jul | Corpus Standards, Copyright, and Licenses |
15 Jul | Final Session |