About – A parsed corpus of Southern Dutch dialects

The parsed corpus of Southern Dutch Dialects (GCND) is the first corpus of spoken Dutch dialects. The project makes accessible a unique collection of dialect recordings from 768 places in Belgium, the north of France and the south of the Netherlands. The speakers are generally non-mobile, rural and unschooled, and were born around 1900. For this project, the collection Voices from the Past was supplemented with 30 new recordings from Brussels, Flemish Brabant and Limburg and with 73 existing Meertens Institute recordings from the south of the Netherlands.

The recordings were transcribed according to a newly developed transcription protocol, a task made highly urgent by rapidly progressing dialect loss. The transcriptions were then linguistically enriched using existing tools, which added information on the part of speech of each individual word (‘pos-tags’) and on the syntactic functions of the word groups and their interrelationships (‘parsing’).

Compared to existing data collections on Dutch dialects, the GCND is unique in that it contains only spontaneous speech. The dialect recordings represent a historical stage of the language (in the case of French Flemish, even the last testimonies of a now almost extinct language variety), and they are now efficiently searchable. The GCND therefore makes it possible to (i) map language change processes geographically, (ii) quantitatively investigate the functionality of dialect features, and (iii) detect new structures that previously went unnoticed and were thus never elicited. Audio, transcriptions and annotations are freely available and searchable online. The GCND thus constitutes an unprecedented historical dialect corpus.

A follow-up project (GCND+) has started in cooperation with the Instituut voor de Nederlandse Taal (INT) and LT3 – Language and Translation Technology Team (UGent) to extend the collection further north.

Using the GCND

The corpus can be accessed at https://gcnd.ivdnt.org

It can only be accessed with a username and password. Users employed by a university, a college or a research institute can log in with their organisation’s username and password. Users not associated with an academic institution can also access the corpus, but they must first apply for an account at www.clarin.eu.

Detailed information on the Spoken Corpus of Southern Dutch Dialects (GCND) can be found here.

Funding

2024-2028: FWO medium-size research infrastructure grant I.0.021.24N (GCND+)

2020-2024: FWO medium-size research infrastructure grant I.0.101.20N (GCND)

2018-2020: FWO small research grant 1.5.310.18N to A. Breitbarth (pilot project)

2018-2021: FWO postdoctoral mandate junior 1.2.P79.19N to M. Farasyn (French Flemish recordings)

2021-2024: FWO postdoctoral mandate senior 1.2.P79.22N to M. Farasyn (French Flemish recordings)

2019-2021: Subsidies from the provinces of Zeeland, West Flanders and East Flanders (pilot project)