Bayesian Model Merging for Unsupervised Constituent Labeling and Grammar Induction
Gideon Borensztajn, Willem Zuidema

Abstract:
Recent research on unsupervised grammar induction has focused on
inducing accurate bracketing of sentences. Here we present an
efficient Bayesian algorithm for the unsupervised induction of
syntactic categories from such bracketed text. Using gold-standard
bracketing, our model gives state-of-the-art results on this task: it
outperforms the recent semi-supervised approach of Haghighi & Klein
(2006), obtaining an F_1 of 76.8% (when appropriately
relabeled). Our algorithm assigns unseen text a likelihood comparable
to that of the treebank PCFG. Finally, we discuss the metrics used and
the linguistic relevance of the results.