Corpus Linguistics

Part I. Corpus linguistics basics & SFB 991 corpora
WS 2012/13 Tuesdays (8.1.2013, 22.1.2013, 29.1.2013, 5.2.2013)
10:30 - 12:00, room 46.21.04.13 (Kruppstr. 108)


Lecture Slides & Materials

Course book:

  • Tony McEnery, Richard Xiao and Yukio Tono (2006). Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.

  • 8.1.2013
    Course overview.
    Corpus linguistics basics:

  • definitions of a corpus;
  • corpus design;
  • taxonomies of corpora;
  • corpus linguistics as a methodology or as a theory;
  • corpus-based vs. corpus-driven approach;
  • main fields of application of corpus linguistics;
  • data-intensive linguistics.

  • Corpus Linguistics Basics I
  • Chris Brew and Marc Moens. Data-Intensive Linguistics
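Corpus-driven and data-intensive work starts from counting. A minimal sketch of a word frequency list and a type/token ratio, assuming a naive tokenizer (a token is a lowercased run of letters or apostrophes); real corpora need proper tokenization:

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: a token is a lowercased run of letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(text: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` most frequent word types with their token counts."""
    return Counter(tokenize(text)).most_common(top)


def type_token_ratio(text: str) -> float:
    """Number of distinct types divided by number of tokens (0.0 for empty text)."""
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0


sample = "The cat sat on the mat. The dog sat too."
print(frequency_list(sample, 2))  # [('the', 3), ('sat', 2)]
print(type_token_ratio(sample))   # 0.7
```

The type/token ratio depends strongly on text length, so it is only comparable across samples of similar size.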

  • 22.1.2013
    Corpus linguistics basics (cont.):

  • representativeness;
  • balance & sampling;
  • mark-up & annotation;
  • The British National Corpus (BNC);
  • case study: forensic linguistics.
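Balance is often operationalized as target proportions per text category, with texts then drawn at random within each category (stratified sampling). A minimal sketch, assuming hypothetical genre weights (not the BNC's actual design) and the largest-remainder method for turning proportions into whole-text quotas:

```python
import math
import random


def allocate(total: int, weights: dict[str, float]) -> dict[str, int]:
    """Split `total` texts across strata in proportion to `weights`.

    Largest-remainder method: take the integer part of each quota, then give
    the leftover slots to the strata with the largest fractional parts.
    """
    weight_sum = sum(weights.values())
    quotas = {k: total * w / weight_sum for k, w in weights.items()}
    counts = {k: math.floor(q) for k, q in quotas.items()}
    leftover = total - sum(counts.values())
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def stratified_sample(texts_by_genre: dict[str, list[str]], total: int,
                      weights: dict[str, float], seed: int = 0) -> dict[str, list[str]]:
    """Randomly draw each stratum's quota of texts, without replacement."""
    rng = random.Random(seed)
    counts = allocate(total, weights)
    return {g: rng.sample(texts_by_genre[g], n) for g, n in counts.items()}


# Hypothetical target weights (percent) for a toy corpus.
weights = {"fiction": 25, "news": 40, "academic": 35}
print(allocate(7, weights))  # {'fiction': 2, 'news': 3, 'academic': 2}
```

Fixing the quotas before sampling keeps each category at its intended share; sampling at random within a category then guards against the compiler's own selection bias.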

  • Corpus Linguistics Basics II
  • BNC User Reference Guide
  • "... and then ... Language description and author attribution" by Malcolm Coulthard
  • statement of: DEREK WILLIAM BENTLEY, aged 19, 1 Fairview Road, London Road
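In the XML edition of the BNC, each word sits in a `w` element whose attributes carry its annotation (a CLAWS C5 part-of-speech tag in `c5`, a headword in `hw`), and punctuation sits in `c` elements; the BNC User Reference Guide is the authoritative description. A minimal sketch that reads this kind of annotation with Python's standard `xml.etree.ElementTree`, assuming the simplified hand-made fragment below rather than a real BNC file (which has a full TEI header and further attributes):

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hand-made fragment in the style of the BNC XML Edition (simplified assumption).
SAMPLE = (
    '<s n="1">'
    '<w c5="AT0" hw="the" pos="ART">The </w>'
    '<w c5="NN1" hw="corpus" pos="SUBST">corpus </w>'
    '<w c5="VBZ" hw="be" pos="VERB">is </w>'
    '<w c5="AJ0" hw="large" pos="ADJ">large</w>'
    '<c c5="PUN">.</c>'
    '</s>'
)


def tagged_tokens(xml_text: str) -> list[tuple[str, str, Optional[str]]]:
    """Return (word form, C5 tag, headword) triples in document order."""
    root = ET.fromstring(xml_text)
    tokens = []
    for el in root.iter():
        if el.tag == "w":    # word: form, tag and headword
            tokens.append(((el.text or "").strip(), el.get("c5"), el.get("hw")))
        elif el.tag == "c":  # punctuation: no headword
            tokens.append(((el.text or "").strip(), el.get("c5"), None))
    return tokens


def forms_of(tokens, headword: str) -> list[str]:
    """All word forms annotated with the given headword (e.g. 'is', 'was' for 'be')."""
    return [form for form, _, hw in tokens if hw == headword]


toks = tagged_tokens(SAMPLE)
print(toks[2])               # ('is', 'VBZ', 'be')
print(forms_of(toks, "be"))  # ['is']
```

Querying by headword rather than by surface form is what the annotation layer buys: one search for `be` finds every inflected form.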

  • 29.1.2013
    Multilingual corpora in SFB 991:

  • JRC-Acquis (EC documents in the languages of all member states: Bulgarian, Czech, Danish, German, Greek, English, Spanish, Estonian, Finnish, French, Hungarian, Italian, Lithuanian, Latvian, Maltese, Dutch, Polish, Portuguese, Romanian, Slovak, Slovene and Swedish)
  • LCC (mostly newspaper texts in Catalan, Danish, Dutch, English, Estonian, Finnish, French, German, Icelandic, Italian, Japanese, Korean, Norwegian, Serbian, Sorbian, Spanish, Swedish, Turkish, etc.)
  • MultextEast (Orwell’s “1984” in Bulgarian, Czech, English, Estonian, Hungarian, Macedonian, Persian, Polish, Romanian, Serbian, Slovak and Slovene, aligned at sentence level)
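Sentence alignment, as in the MultextEast "1984" texts, pairs up sentences across translations. A classic family of methods (Gale & Church's length-based aligner) exploits the fact that sentence lengths in characters tend to correspond across languages. A simplified dynamic-programming sketch in that spirit; the cost function (normalized length difference plus fixed penalties) and the penalty values are hypothetical stand-ins for Gale & Church's probabilistic cost:

```python
def align(src: list[str], tgt: list[str],
          skip_penalty: float = 10.0, merge_penalty: float = 1.0):
    """Length-based sentence alignment by dynamic programming (simplified sketch).

    Allowed beads: 1-1, 1-0, 0-1, 2-1 and 1-2 sentences. Penalty values are
    hypothetical; Gale & Church use a probabilistic cost instead.
    """
    beads = [(1, 1), (1, 0), (0, 1), (2, 1), (1, 2)]
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees every predecessor cell is final before use.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                if di == 0 or dj == 0:
                    step = skip_penalty  # a sentence left unaligned
                else:
                    ls = sum(len(s) for s in src[i:ni])
                    lt = sum(len(t) for t in tgt[j:nj])
                    step = 10 * abs(ls - lt) / max(ls + lt, 1)
                    if di + dj > 2:
                        step += merge_penalty  # 2-1 or 1-2 bead
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Trace back from the end to recover the chosen beads.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


for s, t in align(["Hello world.", "How are you today?"],
                  ["Hallo Welt.", "Wie geht es dir heute?"]):
    print(s, "<->", t)
```

Allowing 2-1 and 1-2 beads lets the aligner cope with translators who split or merge sentences, at a small extra cost so that 1-1 pairs are preferred when lengths fit.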

  • Monolingual corpora in SFB 991:
  • Chinese
  • Polish
  • Russian
  • Bulgarian
  • Macedonian

  • AntConc - a freeware concordance program
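AntConc's Concordance tool shows every hit of a search term as a KWIC (Key Word In Context) line, with the search term aligned in a central column. A minimal sketch of what such a concordancer does, assuming whole-word, case-insensitive matching and a fixed character window; the function `kwic` and its parameters are illustrative, not AntConc's internals:

```python
import re


def kwic(text: str, node: str, width: int = 30) -> list[str]:
    """Key Word In Context lines for each whole-word, case-insensitive match of `node`."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()].rstrip()
        right = text[m.end():m.end() + width].lstrip()
        # Right-align the left context so the node word lines up in one column.
        lines.append(f"{left:>{width}} [{m.group()}] {right:<{width}}")
    return lines


sample = "I then went home. Then I saw him and then I left."
for line in kwic(sample, "then", width=12):
    print(line)
```

Sorting such lines by the words immediately left or right of the node is what makes recurring patterns (collocations, fixed phrases) visible at a glance.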


  • Corpus Linguistics Basics III

  • 5.2.2013
    Monolingual corpora in SFB 991 (cont.):

  • German corpora: Mannheimer Korpus 1 & 2, Bonner Zeitungskorpus, LCC (Leipzig Corpora Collection) German corpus, political speeches (president & government); Negra & TiGer
  • English corpora: BNC (British National Corpus), Penn Treebank, Penn Discourse Treebank, OntoNotes English Corpus, PARC 700 Dependency Bank
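Treebanks such as the Penn Treebank store each parsed sentence as a labeled bracketing, e.g. `(NP (DT The) (NN corpus))`. A minimal sketch of a reader for this bracketed format, assuming one well-formed tree per string; it skips `-NONE-` empty elements when extracting tagged words. For real work, an established reader such as NLTK's `Tree.fromstring` is the safer choice:

```python
import re


def parse_tree(s: str):
    """Parse a bracketed tree into nested (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):  # the outermost bracket may be unlabeled
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return node()


def tagged_words(tree) -> list[tuple[str, str]]:
    """Collect (word, POS tag) pairs from preterminal nodes, left to right."""
    label, children = tree
    if label == "-NONE-":  # traces and other empty elements
        return []
    if len(children) == 1 and isinstance(children[0], str):
        return [(children[0], label)]
    words = []
    for child in children:
        if isinstance(child, tuple):
            words.extend(tagged_words(child))
    return words


ptb = "( (S (NP-SBJ (DT The) (NN corpus)) (VP (VBZ is) (ADJP (JJ large))) (. .)) )"
print(tagged_words(parse_tree(ptb)))
```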

  • BootCaT: Simple Utilities to Bootstrap Corpora And Terms from the Web
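BootCaT's basic procedure starts from a handful of seed terms, combines them into random tuples, sends each tuple to a search engine as a query, and collects the returned pages as corpus texts. A minimal sketch of the tuple-building step only; the function `make_queries`, the tuple size and the number of tuples are illustrative assumptions, and the search, download and cleaning steps are not shown:

```python
import itertools
import random
from typing import Optional


def make_queries(terms: list[str], tuple_size: int = 3, n_tuples: int = 10,
                 seed: Optional[int] = None) -> list[str]:
    """Draw distinct random combinations of seed terms, formatted as search queries."""
    # Sorting makes the candidate list, and thus a seeded draw, reproducible.
    combos = list(itertools.combinations(sorted(set(terms)), tuple_size))
    rng = random.Random(seed)
    chosen = rng.sample(combos, min(n_tuples, len(combos)))
    # Quote multi-word terms so a search engine treats them as phrases.
    return [" ".join(f'"{t}"' if " " in t else t for t in combo) for combo in chosen]


seeds = ["corpus", "concordance", "collocation", "annotation", "corpus linguistics"]
for query in make_queries(seeds, tuple_size=3, n_tuples=4, seed=1):
    print(query)
```

Combining several seed terms per query biases the search engine toward pages that are genuinely about the domain rather than pages that mention one term in passing.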


  • Corpus Linguistics Basics IV


  • Part II. Basic programming skills
    WS 2012/13 Monday & Tuesday (25.2.2013, 26.2.2013)
    9:00 - 12:00 (on campus)





    Last updated on 13.2.2013