Is the End of Supervised Parsing in Sight?
Rens Bod

Abstract:
How far can we get with unsupervised parsing if we make our training
corpus several orders of magnitude larger than has hitherto been
attempted? We present a new algorithm for unsupervised parsing using
an all-subtrees model, termed U-DOP*, which parses directly with
packed forests of all binary trees. We train both on Penn's WSJ data
and on the (much larger) NANC corpus, showing that U-DOP* outperforms
a treebank-PCFG on the standard WSJ test set. While U-DOP* performs
worse than state-of-the-art supervised parsers on hand-annotated
sentences, we show that the model outperforms supervised parsers when
evaluated as a language model in syntax-based machine translation on
Europarl. We argue that supervised parsers miss the fluidity between
constituents and non-constituents, and that for syntax-based language
modeling the end of supervised parsing is in sight.
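
To make the "packed forests of all binary trees" idea concrete, here is a
minimal illustrative sketch (not the paper's U-DOP* implementation): a
CKY-style chart whose cell (i, j) implicitly encodes every binary
bracketing of the span i..j by recording only the possible split points,
so the forest stays polynomial in size even though the number of distinct
trees grows with the Catalan numbers. All function and variable names here
are hypothetical.

    def packed_forest(words):
        """Chart: cell (i, j) packs all binary trees over words[i:j]
        as the list of split points k, with i < k < j."""
        n = len(words)
        chart = {}
        for span in range(1, n + 1):
            for i in range(n - span + 1):
                j = i + span
                chart[(i, j)] = [] if span == 1 else list(range(i + 1, j))
        return chart

    def count_trees(chart, i, j):
        """Number of binary trees packed in cell (i, j): Catalan(j-i-1)."""
        if j - i == 1:
            return 1
        return sum(count_trees(chart, i, k) * count_trees(chart, k, j)
                   for k in chart[(i, j)])

    words = "watch the dog bark".split()
    chart = packed_forest(words)
    print(count_trees(chart, 0, len(words)))  # 5 = Catalan(3) binary trees

The chart itself has only O(n^2) cells with O(n) split points each, which
is what makes training on all binary trees of a large corpus tractable at
all, even though the 4-word example above already packs 5 distinct trees.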