Peer-Reviewed Journal Details
Mandatory Fields
Cahill A.; Burke M.; Forst M.; O'Donovan R.; Rohrer C.; Van Genabith J.; Way A.
2005
July
Research on Language and Computation
Treebank-based acquisition of multilingual unification grammar resources
Published
Optional Fields
3
2
247
279
Deep unification- (constraint-)based grammars are usually hand-crafted. Scaling such grammars from fragments to unrestricted text is time-consuming and expensive. This problem can be exacerbated in multilingual broad-coverage grammar development scenarios. Cahill et al. (2002, 2004) and O'Donovan et al. (2004) present an automatic f-structure annotation-based methodology to acquire broad-coverage, deep, Lexical-Functional Grammar (LFG) resources for English from the Penn-II Treebank. In this paper we show how this model can be adapted to a multilingual grammar development scenario to induce robust, wide-coverage, PCFG-based LFG approximations for German from the TIGER Treebank. We show how the architecture of LFG, in particular the distinction between c-structure and f-structure representations, facilitates multilingual, treebank-based unification grammar induction, allowing us to cross-linguistically reuse the lexical extraction and parsing modules from O'Donovan et al. (2004) and Cahill et al. (2004), respectively. We evaluate our grammars against the PARC 700 Dependency Bank (King et al., 2003), against dependency structures for 2000 held-out sentences from the TIGER Corpus as well as against a hand-crafted dependency gold standard for 100 TIGER trees. Currently, our resources achieve 81.79% f-score against the PARC 700, a 2.19% improvement over the best result reported for a hand-crafted grammar in Kaplan et al. (2004), 74.6% against the 2000 held-out TIGER dependency structures and 71.08% against the 100-sentence TIGER gold standard, with substantially improved coverage compared to hand-crafted resources. We have since applied our methodology to induce wide-coverage LFG resources for Chinese (Burke et al., 2004b) from the Penn Chinese Treebank (Xue et al., 2002) and for Spanish from the CAST3LB Treebank (Civit, 2003). © Springer 2005.
1570-7075
10.1007/s11168-005-1296-y
Grant Details