Evaluating Long Range Reordering with Permutation-Forests
Milos Stanojevic, Khalil Sima'an

Abstract:
Automatically evaluating the quality of word order of MT systems is
challenging yet crucial for MT evaluation. Existing approaches employ
string-based metrics, which are computed over the permutations
of word positions in system output relative to a reference translation.
We introduce a new metric computed over Permutation Forests
(PEFs), tree-based representations that decompose permutations
recursively. Relative to string-based metrics, PEFs offer advantages
for evaluating long range reordering. We compare the PEFs
metric against five known reordering metrics on WMT13 data for
ten language pairs. The PEFs metric shows better correlation with
human ranking than the other metrics on almost all language pairs.
None of the other metrics exhibits as stable behavior across language
pairs.
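
To make the notion of a string-based metric over permutations concrete,
the sketch below scores a permutation of word positions by the fraction
of position pairs that keep their reference order, a Kendall's tau style
statistic. This is an illustration only: the abstract does not name the
five baseline metrics, so the choice of statistic, the function name
kendall_tau_score, and the input convention (permutation[i] is the
reference position of the i-th word of the system output) are all
assumptions made here.

```python
from itertools import combinations


def kendall_tau_score(permutation):
    """Fraction of position pairs in the same relative order as the reference.

    Assumed convention (not from the paper): permutation[i] is the
    reference position of the i-th word in the system output.
    Returns 1.0 for monotone order and 0.0 for fully inverted order.
    """
    n = len(permutation)
    if n < 2:
        # With fewer than two words there is nothing to reorder.
        return 1.0
    total_pairs = n * (n - 1) // 2
    concordant = sum(
        1
        for i, j in combinations(range(n), 2)
        if permutation[i] < permutation[j]
    )
    return concordant / total_pairs


if __name__ == "__main__":
    print(kendall_tau_score([0, 1, 2, 3]))  # monotone order
    print(kendall_tau_score([1, 0, 2, 3]))  # one local swap
    print(kendall_tau_score([2, 3, 0, 1]))  # two blocks exchanged
```

Under this pairwise count, exchanging two blocks as in [2, 3, 0, 1]
leaves only 2 of 6 pairs in order, because every cross-block pair is
counted as a separate error even though the change is a single block
move. A flat statistic of this kind has no view of such block structure,
whereas a recursive decomposition of the permutation could expose it.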