Corpus of Historical Low German

HeliPaD: Morphological annotation

Where the HeliPaD follows other Penn historical corpora, the text will be marked like this.

Where the HeliPaD does its own thing, the text will be marked like this.

There's also a page with a summary of these differences.

See also the List of tags and empty categories.


Verbs

All finite verbs have person and number marked as attributes. See Additional attributes.

The ambiguity tags VBP and VBD etc., for formally ambiguous indicative/subjunctive/imperative verbs, are not used. (Verb form classification follows Köbler.)

The tag AX*, for auxiliary verbs, is not used.

Participles are divided into inflected (VGI, VNI, etc.) and uninflected (VG, VN, etc.) tags. The inflected tags take nominal attributes.

Modal verbs (MD, etc.)

  • MD: infinitive
  • MDI: imperative
  • MDPI: present indicative
  • MDPS: present subjunctive
  • MDDI: past indicative
  • MDDS: past subjunctive
  • MG: present participle, uninflected
  • MGI: present participle, inflected (not used)
  • MN: past participle, uninflected (not used)
  • MNI: past participle, inflected (not used)

Modals are a closed class consisting of all and only the following lemmas: kunnan, motan, mugan, skulan, thurvan, willian.

Wita (UTP)

Wita, which is used to introduce hortative clauses ("let us..."), is tagged UTP.

Have, be and become (HV, BE, RD, etc.)

  • HV: infinitive
  • HVI: imperative
  • HVPI: present indicative
  • HVPS: present subjunctive
  • HVDI: past indicative
  • HVDS: past subjunctive
  • HG: present participle, uninflected (not used)
  • HGI: present participle, inflected (not used)
  • HN: past participle, uninflected (not used)
  • HNI: past participle, inflected (not used)
  • BE: infinitive
  • BEI: imperative
  • BEPI: present indicative
  • BEPS: present subjunctive
  • BEDI: past indicative
  • BEDS: past subjunctive
  • BG: present participle, uninflected (not used)
  • BGI: present participle, inflected (not used)
  • BN: past participle, uninflected (not used)
  • BNI: past participle, inflected (not used)
  • RD: infinitive
  • RDI: imperative (not used)
  • RDPI: present indicative
  • RDPS: present subjunctive
  • RDDI: past indicative
  • RDDS: past subjunctive
  • RG: present participle, uninflected (not used)
  • RGI: present participle, inflected (not used)
  • RN: past participle, uninflected
  • RNI: past participle, inflected (not used)

There is a one-to-one mapping between the lemma hebbian and the tags HV*/HG*/HN*, between wesan and BE*/BG*/BN*, and between werthan and RD*/RG*/RN*.

Forms of werthan are tagged RD*/RG*/RN*, as in the IcePaHC and ENHG Parsed Corpus.

Lexical verbs (VB, etc.)

  • VB: infinitive
  • VBI: imperative
  • VBPI: present indicative
  • VBPS: present subjunctive
  • VBDI: past indicative
  • VBDS: past subjunctive
  • VG: present participle, uninflected
  • VGI: present participle, inflected
  • VN: past participle, uninflected
  • VNI: past participle, inflected

All remaining verbs are labelled VB*/VG*/VN*.
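The verb tag inventory above is regular enough to decompose mechanically. The following Python sketch is illustrative only and is not part of any HeliPaD tooling; the function and variable names are hypothetical. It assumes bare tags with no attribute suffix attached (the attribute notation is described under Additional attributes), and it accepts every tag in the lists above, including those marked "not used".

```python
import re

# Two-letter prefixes for infinitives and finite forms, as listed above.
BASE_PREFIXES = {"MD": "modal", "HV": "have", "BE": "be", "RD": "become", "VB": "lexical"}
# Participle tags use a one-letter class code followed by G (present) or N (past).
PARTICIPLE_CLASSES = {"M": "modal", "H": "have", "B": "be", "R": "become", "V": "lexical"}
FORM_SUFFIXES = {
    "": "infinitive",
    "I": "imperative",
    "PI": "present indicative",
    "PS": "present subjunctive",
    "DI": "past indicative",
    "DS": "past subjunctive",
}
# Closed classes named in this manual.
MODAL_LEMMAS = {"kunnan", "motan", "mugan", "skulan", "thurvan", "willian"}
LEMMA_CLASSES = {"hebbian": "have", "wesan": "be", "werthan": "become"}


def describe_verb_tag(tag):
    """Return (verb class, form) for a bare verb tag, or None if it is not a verb tag."""
    m = re.fullmatch(r"(MD|HV|BE|RD|VB)(I|PI|PS|DI|DS)?", tag)
    if m:
        return BASE_PREFIXES[m.group(1)], FORM_SUFFIXES[m.group(2) or ""]
    m = re.fullmatch(r"([MHBRV])([GN])(I)?", tag)
    if m:
        tense = "present" if m.group(2) == "G" else "past"
        inflection = "inflected" if m.group(3) else "uninflected"
        return PARTICIPLE_CLASSES[m.group(1)], f"{tense} participle, {inflection}"
    return None


def verb_class_for_lemma(lemma):
    """Return the verb class for a lemma already known to be a verb."""
    if lemma in MODAL_LEMMAS:
        return "modal"
    # All remaining verbs are lexical (VB*/VG*/VN*).
    return LEMMA_CLASSES.get(lemma, "lexical")
```

For instance, describe_verb_tag("VBPS") gives the class "lexical" and the form "present subjunctive", while a tag outside the inventory, such as "AX", gives None.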

Particles, prefixes, clitics (RP, GE, NEG)

RP is a closed-class tag, used for the particles an, to, up and ut. Unlike in the YCOE, it does not occur prefixed to verbs.

The tag GE has a one-to-one mapping with the prefix gi-. It never occurs independently, but always prefixed/cliticized to a verbal form. Nominal gi- is not tagged in this way, except in the context of gihwilik and gihwe.

The negative particle ne (NEG) can occur either independently or prefixed/cliticized to a verbal, adverbial, or nominal form, or to a conjunction.

Clitic forms (separated by a plus sign) are not reflected in the lemmatization.
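When querying the corpus it can be useful to take composite tags apart. The sketch below is a minimal Python illustration, assuming that composite tags are written with a plus sign between their parts, as in NEG+Q or GE+WADJ elsewhere in this manual. The helper names are hypothetical and do not belong to any HeliPaD or CorpusSearch tool.

```python
def split_composite_tag(tag):
    """Split a composite tag such as 'NEG+Q' into its component tags."""
    return tag.split("+")


def has_component(tag, component):
    """Check whether a (possibly composite) tag contains the given component."""
    return component in split_composite_tag(tag)
```

For example, has_component("NEG+CONJ", "NEG") is true, while has_component("WADJ", "GE") is false.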

To-infinitives (TO)

The tag TO is used for forms of to when co-occurring with inflected infinitives.

Inflected infinitives are not given special treatment, unlike in the YCOE. They can always be retrieved due to their co-occurrence with TO within an IP-INF.


Nominal words

All nominal words have case and number marked as attributes. Pronouns also have person marked as an attribute. See Additional attributes. Attribute annotation follows Köbler. Where in doubt, nominative has been preferred over accusative over dative over genitive over instrumental.
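The case preference stated above amounts to a fixed ranking. As a rough illustration, the following Python sketch picks the preferred case from a set of formally possible ones. It is an assumption-laden simplification: the function name is hypothetical, and it covers only the residual "where in doubt" situation, not agreement with other elements of the phrase.

```python
# Preference order as stated above, most preferred first.
CASE_PREFERENCE = ["nominative", "accusative", "dative", "genitive", "instrumental"]


def preferred_case(candidates):
    """Return the most preferred case among the formally possible candidates."""
    for case in CASE_PREFERENCE:
        if case in candidates:
            return case
    raise ValueError("no known case among candidates")
```

A form that could be either dative or instrumental would thus come out as dative, and instrumental is chosen only when it is the sole candidate.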

NP-internal agreement is forced wherever possible, thus allowing ambiguous elements to be tagged (the HeliPaD's approach to attributes is "maximalist"). The main exception to this is with instrumental elements, which often co-occur with formally dative elements. This is not treated as a case clash, and instrumental is only preferred where unambiguous.

Nouns (N, NPR)

All singular, plural, collective, and compound nouns are tagged as N. See the syntactic manual on Noun Phrases for details of compounding.

Proper nouns are tagged as NPR (not NR as in the YCOE).

Pronouns (PRO, PRO$, MAN)

MAN is used for singular, unmodified man subjects.

PRO is used for personal pronouns. They can be used as reflexives in the HeliPaD, but this is indicated at phrase level. PRO is a closed class consisting of the following lemmas: ik, wit, we, thu, git, gi, he, siu, it.

Subject pronouns enclitic to verbs are always separated out for the purposes of parsing. This is done using the dollar sign ($) and without adding or removing segments.

PRO$ is used for possessive "pronouns", and is also a closed class. Like other nominal categories, it is annotated for case and number, and also for person. The third person forms is and iru are never inflected, but receive attributes anyway, in agreement with other elements in their phrase. (When there are no such other elements these are treated as genitive pronoun forms.) The other forms - min, unka, usa, thin, inka, iuwa, and the reflexive sin - are formally inflected and this is reflected in the annotation for attributes.

Adjectives (ADJ, ADJR, ADJS)

ADJR and ADJS are used for comparative and superlative adjectives respectively. All other adjectives are ADJ.

Where a weak adjective is used nominally (i.e. without a noun head), it normally retains its adjectival tag, unlike in the YCOE.

Ordinal numbers, including othar, are also tagged ADJ (and not NUM). erist is treated as superlative, and may also be tagged ADVS when it is a temporal adverb.

The adjectives mikil and luttil are tagged as adjectives, even when they are clearly quantifiers. Cognates in the YCOE and other Penn corpora are treated in the exact opposite way.

Self and sulik are tagged as ADJ.

Annotation as an inflected participle (VGI, VNI, etc.) is always preferred to annotation as an adjective, if possible.

Quantifiers (Q, QR, QS)

Quantifiers are a closed class, and include all and only the following: al, bethia, enhwilik, enig, filo, manag, sum. These elements have a tendency not to inflect (and filo never does). When they do inflect, they do so in a similar way to adjectives, but are never tagged as such. They always bear case and number attributes in the annotation. al can also be an adverb on occasion.

Wh-indefinites such as hwilik are always tagged as W* and not as Q, regardless of their syntactic role, which is disambiguated at phrasal level.

nigen is treated as NEG+Q, and neowiht as NEG+N.

Some apparently quantificational elements such as wiht, eowiht etc. are treated as nouns in the HeliPaD, whereas the corresponding items in the YCOE are treated as quantifiers.

Numerals (NUM)

Numerals, when cardinal, are tagged NUM; this is in principle a closed class. half is treated as a numeral. en is treated as a numeral, even when it seems to be an indefinite determiner, means "alone" and/or has a focus reading.

Determiners (D)

Determiners are a closed class which includes the (in many forms), the distal demonstrative/article, and these, the proximal demonstrative.


Other words

The FP (focus particle) and XX (problematic word) tags are not used in the HeliPaD, mainly since there is no call for them in the current material.

Adverbs (ADV, ADVR, ADVS, ALSO)

Adverbs do not bear extended tags ^T, ^L and ^D for temporal, locative and directional, as they do in the YCOE. This information is retrievable from the phrasal extended label and from the lemma.

The following adverbs always head ADVP-TMP: aftar, eft, eo, er, erist, forn, get, ju, hald, hindag, hiudu, hwanna, lang, lango, noh, nu, oft, san, sana, simbla, simblon, sith, sithor, sniumo, tho, und.

The following adverbs always head ADVP-DIR: angegin, ellior, fer, ferran, forana, herod, herodwardes, hinan, in, nithana, nithar, north, ostana, ostar, tegegnes, thanan, tharod, thurh, towardes, westan, westar, witharwardes.

The following adverbs always head ADVP-LOC: bihindan, biovan, foran, her, innan, nithara, sundar, thar, uppan, up, uta, wido.

The following adverbs may head either ADVP-TMP or ADVP-DIR: forth, forthward, furthor.

The following adverbs may head either ADVP-DIR or ADVP-LOC: hoho, nah, ostan, ovana.

biforan may be temporal or locative. than may be temporal or atemporal. neo (lemma eo) is tagged NEG+ADV.
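Since the semantic class of an adverb is recoverable from its lemma, the lists above can be turned into a simple lookup. The Python sketch below is illustrative only, and the names in it are hypothetical. It covers just the five lists given above; biforan, than and neo are deliberately left out, and any lemma not in those lists yields an empty set.

```python
# Lemma lists copied from the paragraphs above.
TMP = set("""aftar eft eo er erist forn get ju hald hindag hiudu hwanna lang lango
noh nu oft san sana simbla simblon sith sithor sniumo tho und""".split())
DIR = set("""angegin ellior fer ferran forana herod herodwardes hinan in nithana
nithar north ostana ostar tegegnes thanan tharod thurh towardes westan westar
witharwardes""".split())
LOC = set("bihindan biovan foran her innan nithara sundar thar uppan up uta wido".split())
TMP_OR_DIR = set("forth forthward furthor".split())
DIR_OR_LOC = set("hoho nah ostan ovana".split())


def possible_advp_labels(lemma):
    """Return the set of phrasal labels that an adverb lemma may head."""
    if lemma in TMP:
        return {"ADVP-TMP"}
    if lemma in DIR:
        return {"ADVP-DIR"}
    if lemma in LOC:
        return {"ADVP-LOC"}
    if lemma in TMP_OR_DIR:
        return {"ADVP-TMP", "ADVP-DIR"}
    if lemma in DIR_OR_LOC:
        return {"ADVP-DIR", "ADVP-LOC"}
    return set()  # lemma not covered by the lists above
```

For example, possible_advp_labels("hinan") gives only ADVP-DIR, while possible_advp_labels("forth") gives both ADVP-TMP and ADVP-DIR.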

The word ok is tagged ALSO (the cognate is ADV in the YCOE). ALSO does not head a phrase, may modify adjectives, and often co-occurs with conjunctions within a CONJP.

Some words function as both adverbs and prepositions. An analysis as a (stranded) preposition is always preferred when a possible complement is present.

Prepositions (P)

The P tag is always and only used for prepositions with a complement. Otherwise, these elements are labelled ADV or RP.

Subordinators are not tagged as P in the HeliPaD. They are treated as either complementizers or adverbs.

The following words may be tagged P: af, aftar, an, and, angegin, ano, at, bi, biforan, butan, er, fan, farutar, for, fram, furi, in, innan, inne, mid, newa, newan, ovar, te, thurh, to, twisk, um, umbi, und, undar, up, uppa, uppan, uta, with, withar.

butan, newa and newan can take a that-clause complement, though newa is more usually C.

Interjections (INTJ)

INTJ is a small closed class only used when no other analysis is available. The HeliPaD contains the following interjections: ja, nen, sinu, wela, wola. "Interjectional" hwat (lemma hwe) is WADV and dominated by an INTJP.

Complementizers and conjunctions (C, CONJ)

The following may be tagged as co-ordinating conjunctions: ak, eftha, endi, ge, ja, jak, noh, the.

ne and nek, when used as conjunctions, are tagged NEG+CONJ.

The word ok is tagged ALSO (the cognate is ADV in the YCOE). ALSO does not head a phrase, may modify adjectives, and often co-occurs with conjunctions within a CONJP.

Subordinating conjunctions are treated as adverbs in adverb phrases if they are homophonous with adverbs, as they usually are, and never as prepositions.

The genuine complementizer tag (C) is limited to ef, hwand, newa, so, than in comparatives, that, the, and untat. In addition, ne when used as a complementizer is tagged NEG+C. Most of the time, in the HeliPaD, C is null.

Foreign words (FW)

Unintegrated foreign words, always Latin in the HeliPaD, are labelled FW.

Wh-words

All wh-words are closed class. The following tags exist: WADJ, WADV, WPRO, WPRO$ (not used), WQ.

Elements that are morphologically wh-words are tagged using the W* tags even when they are not part of an extraction structure. Since Old Saxon is particularly flexible in using wh-words as indefinites, this is quite important.

WADJ is used for hwilik, which may head a (W)ADJP, or may be part of a (W)NP (when it is not the head). WADJ takes the attributes of an adjective. In non-extraction structures, gihwilik occurs and is tagged GE+WADJ. The tag WADJ is also used for hwethar when it means "which of two".

WADV is used for bihwi, hwan (projecting (W)ADVP-TMP), hwanan (projecting (W)ADVP-DIR), hwar (projecting (W)ADVP-LOC), hwarod (projecting (W)ADVP-DIR), and hwo. It is also used for "interjectional" hwat (lemma hwe), which is dominated by INTJP.

The instrumental form hwi (lemma hwe) meaning "why", when used alone to form a question, is treated as WADV as in the YCOE. When it occurs with a preposition (e.g. te hwi), it is treated as WPRO. The fixed combination bihwi is always WADV.

WPRO behaves like other pronouns in terms of its morphology and attributes, though it does not take person or number. It is used for the generic wh-element hwe, except for "interjectional" hwat. In non-extraction structures, gihwe occurs and is tagged GE+WPRO.

WPRO$ is the wh-counterpart of PRO$, and could in principle be used for non-head instances of the genitive hwes. No instances have been found in the HeliPaD.

When it introduces a yes-no question, hwethar receives the tag WQ, which takes no attributes.


Non-words

Punctuation (, . ' ")

Editorial speech marks, either single or double, are tagged as themselves (' or ").

All other punctuation is either tagged as . (if it is token-final, modulo speech marks and CODE elements) or as , (otherwise). Remember that CorpusSearch ignores punctuation by default.

Metalinguistic information (CODE)

The tag CODE is used for the following things (which, when they occur, occur in the following order):

  • Sievers edition page: e.g. P_7
  • Manuscript page: e.g. MS_5a
  • Fitt: e.g. F_1
  • Line: e.g. R_1
  • Caesura (half-line break): C
  • Other comments (mostly omissions): e.g. COM:OMISSION
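
The CODE values listed above follow a small number of patterns, so they can be classified with regular expressions. The Python sketch below is a hedged illustration based only on the examples given in the list; the exact shape of the values after each prefix is an assumption, and the names are hypothetical.

```python
import re

# One pattern per kind of CODE value, in the order given in the list above.
CODE_PATTERNS = [
    ("sievers_page", re.compile(r"P_(\S+)")),
    ("manuscript_page", re.compile(r"MS_(\S+)")),
    ("fitt", re.compile(r"F_(\S+)")),
    ("line", re.compile(r"R_(\S+)")),
    ("caesura", re.compile(r"C")),
    ("comment", re.compile(r"COM:(\S+)")),
]


def classify_code(value):
    """Return (kind, payload) for a CODE value, or None if no pattern matches."""
    for kind, pattern in CODE_PATTERNS:
        m = pattern.fullmatch(value)
        if m:
            # The caesura pattern has no capture group, so it carries no payload.
            return kind, (m.group(1) if pattern.groups else None)
    return None
```

For example, classify_code("MS_5a") gives the kind "manuscript_page" with the payload "5a", and classify_code("C") gives "caesura" with no payload.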