Corpus of Historical Low German

Introduction to the corpus

The language

Middle Low German is a cover term for a group of related dialects spoken in northern Germany between 1250 and 1600. Among other things, it served as an international lingua franca around the North Sea and the Baltic in the 14th and 15th centuries in connection with the Hanseatic League; during this period, Low German enjoyed much more prestige than it does today. Its written forms show a partial standardization that incorporates features of different dialects (regionale Schreibsprachen). For sociopolitical reasons, it was replaced as the written language by (Early New) High German between 1550 and 1650, though Low German continues to exist in spoken dialects.

Historical Low German syntax is an under-researched field. In contrast to, for example, older English, French, Icelandic, Portuguese and the classical languages, no tagged and parsed corpora are available yet. A few texts are available in TITUS, but these can be searched only for word forms and are not syntactically parsed. The forthcoming Atlas spätmittelalterlicher Schreibsprachen des niederdeutschen Altlandes und angrenzender Gebiete (ASnA) (Peters et al., to appear 2013) will be an invaluable and long-needed resource, but its focus is on delineating scribal dialects, i.e. on phonological and morphological variation; by its creators' own admission, it is not designed for the investigation of syntax.

The CHLG

Our plan is to build a modern Corpus of Historical Low German (CHLG), covering two discrete periods:

  • Old Low German (OLG)/Old Saxon (OS) c. 800–1050
  • Middle Low German (MLG) c. 1250–1600

Text selection

Our three key criteria for text selection are that the texts be a) in prose, b) not translated and c) clearly dated and localized. Such a corpus would be impossible to create for many historically attested languages (e.g. Old English), but for Middle Low German we are fortunate in that texts meeting these criteria belong to key text types of the language. We will include texts of three types:

  • Charters (Urkundenbücher)
  • Laws (Stadtrechte)
  • Chronicles

In this methodology we follow corpora of historical Dutch (cf. Reenen and Mulder 2000; Coupé and van Kemenade 2009).
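
As a rough illustration, the three selection criteria and the three text types could be expressed as a filter over catalogue metadata. This is only a sketch under stated assumptions, not the project's actual tooling: the record fields, the type labels and the sample entries are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# The three text types listed above; the label strings are hypothetical.
TEXT_TYPES = {"charter", "law", "chronicle"}


@dataclass
class CandidateText:
    """Hypothetical catalogue record for a text considered for inclusion."""
    title: str
    text_type: str
    is_prose: bool
    is_translation: bool
    year: Optional[int] = None    # None means the text is not clearly dated
    place: Optional[str] = None   # None means the text is not clearly localized


def meets_criteria(text: CandidateText) -> bool:
    """Apply criteria a) prose, b) not translated, c) dated and localized."""
    return (
        text.text_type in TEXT_TYPES
        and text.is_prose
        and not text.is_translation
        and text.year is not None
        and text.place is not None
    )


# Invented sample records, for illustration only.
candidates = [
    CandidateText("Sample charter", "charter", True, False, 1350, "Oldenburg"),
    CandidateText("Sample verse chronicle", "chronicle", False, False, 1400, "Lübeck"),
    CandidateText("Sample undated law", "law", True, False, None, "Münster"),
]

selected = [t.title for t in candidates if meets_criteria(t)]
print(selected)
```

Only the first sample record passes: the second is not in prose and the third is not clearly dated.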

As regards the geography, we will be taking texts from five data points:

  • Westphalian (Münster)
  • Eastphalian Altland (Braunschweig)
  • Eastphalian Neuland (Magdeburg)
  • North Low Saxon Altland (Oldenburg)
  • North Low Saxon Neuland (Lübeck)

[Figure: Map showing the locations of the five data points]

Work so far has focused on the Oldenburg charters as a pilot for the larger project.