Brief project description
The ultimate goal of this project is to create thorough, corpus-based descriptions of syntactic patterns in Pite Saami, a highly endangered Uralic language spoken in Swedish Lapland. The corpus will consist of spoken-language Pite Saami texts representing more than a century of language use, from the late 19th to the early 21st century. Accordingly, the age of the source data will be treated as a potential factor in explaining variation in attested patterns, allowing structural change over time to be investigated. Specifically, the project will use quantitative methods to address the following main research questions:
1. Which constituents are possible and/or required in the various Pite Saami phrase and clause types? Is there a preference for certain structures?
2. What effect does information structure have on constituent structure?
3. Does the corpus provide evidence for diachronic changes in syntactic patterns? If so, which patterns are affected?
To carry out the investigation, an annotated corpus must first be created. To do this efficiently, existing language technology tools will be refined to automatically tag Pite Saami texts for lexeme, morphological categories and part of speech. The results of the project will be threefold: 1. a book-length description of attested syntactic patterns; 2. a thoroughly annotated, digital spoken-language corpus spanning more than a century of texts for an endangered Saami language, to be made available for further research; and 3. a model for using language technology tools to automatically annotate a spoken-language corpus for an endangered language. The planned syntactic description will provide new data on a hitherto under-described language. These data will be of interest not only to scholars of Uralic languages, particularly for historical-comparative studies, but also to linguists doing synchronic comparative and theoretical work in both formal and functional frameworks.