A syllable frequency list for Dutch
Willem Zuidema

Abstract:
The Corpus Gesproken Nederlands (CGN) is a large corpus of spoken
Dutch, partly annotated with syntactic and phonological information
(see http://lands.let.kun.nl/cgn/). Although it contains files with
syllabified words and word frequency counts, there is no direct way to
extract from it a list of syllable frequencies. This document describes
some simple scripts to combine the relevant information from various CGN
files (using version 6 of the corpus and the Linux utilities grep, sed,
sort, uniq, awk, cut and paste), and gives a complete list of syllable frequencies
obtained by running the scripts. The list is made available in the hope
that it might be helpful, for instance for experimental studies where
one must control for syllable frequency.  Depending on the intended use
or required level of accuracy, the scripts might have to be adapted and
the frequency counts changed accordingly.
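
The general idea of combining the two kinds of information can be
illustrated with a minimal sketch. It is not the set of scripts described
in this document, and it does not use the real CGN file layout: the file
names syllabified.tsv and wordfreq.tsv, the tab-separated columns, the
"-" syllable separator and the toy data are all assumptions made for
illustration only. Each word's frequency is added to the total of every
syllable that the word contains.

```shell
#!/bin/sh
# Sketch only. Hypothetical inputs, NOT the actual CGN file formats:
#   syllabified.tsv : word <TAB> syllables separated by "-"
#   wordfreq.tsv    : word <TAB> token count
set -eu
workdir=$(mktemp -d)

# Toy data standing in for the corpus files.
printf 'water\twa-ter\nwaterval\twa-ter-val\nval\tval\n' > "$workdir/syllabified.tsv"
printf 'water\t10\nwaterval\t3\nval\t5\n' > "$workdir/wordfreq.tsv"

# First file read (wordfreq.tsv): remember the count of each word.
# Second file read (syllabified.tsv): split the syllabified form and add
# the word's count to every syllable it contains.
# Finally sort by descending frequency, then alphabetically by syllable.
syllable_freq=$(awk -F '\t' '
    NR == FNR { count[$1] = $2; next }
    ($1 in count) {
        n = split($2, syl, "-")
        for (i = 1; i <= n; i++) total[syl[i]] += count[$1]
    }
    END { for (s in total) printf "%s\t%d\n", s, total[s] }
' "$workdir/wordfreq.tsv" "$workdir/syllabified.tsv" |
    sort -t "$(printf '\t')" -k2,2nr -k1,1)

printf '%s\n' "$syllable_freq"
rm -r "$workdir"
```

On the toy data this lists "ter" and "wa" with a count of 13 each and
"val" with a count of 8. Words that lack a frequency count are skipped
silently here; whether that is appropriate depends on the intended use.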