In this paper we show how labelled dependencies produced by a Lexical-Functional Grammar parser can be used in Machine Translation evaluation. In contrast to most popular evaluation metrics based on surface string comparison, our dependency-based method does not unfairly penalize perfectly valid syntactic variations in the translation, shows less bias towards statistical models, and the addition of WordNet provides a way to accommodate lexical differences. In comparison with other metrics on a Chinese-English newswire text, our method obtains high correlation with human scores, both on a segment and system level. © 2008 Springer Science+Business Media B.V.