ACL
2001

Low-cost, High-Performance Translation Retrieval: Dumber is Better

In this paper, we compare the relative effects of segment order, segmentation and segment contiguity on the retrieval performance of a translation memory system. We take a selection of both bag-of-words and segment order-sensitive string comparison methods, and run each over both character- and word-segmented data, in combination with a range of local segment contiguity models (in the form of N-grams). Over two distinct datasets, we find that indexing according to simple character bigrams produces a retrieval accuracy superior to any of the tested word N-gram models. Further, in their optimum configuration, bag-of-words methods are shown to be equivalent to segment order-sensitive methods in terms of retrieval accuracy, but much faster. We also provide evidence that our findings are scalable.
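To make the idea of character-bigram, bag-of-words retrieval concrete, here is a minimal sketch. It is not the paper's implementation: the abstract does not name the similarity measures used, so the Dice coefficient below is an assumed stand-in, and the function names (char_bigrams, dice, retrieve) are hypothetical.

```python
from collections import Counter


def char_bigrams(text: str) -> Counter:
    """Return the bag (multiset) of character bigrams in `text`."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice(a: Counter, b: Counter) -> float:
    """Dice coefficient over two bags.

    Assumption: chosen for illustration only; the abstract does not
    say which bag-of-words measure the paper uses.
    """
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((a & b).values())
    return 2.0 * overlap / total


def retrieve(query: str, memory: list[str]) -> tuple[str, float]:
    """Return the stored string most similar to `query`, with its score.

    Segment order is ignored: only bigram counts are compared.
    """
    q = char_bigrams(query)
    scored = [(s, dice(q, char_bigrams(s))) for s in memory]
    return max(scored, key=lambda pair: pair[1])
```

Calling retrieve("open the file", ["open the file menu", "close the window", "save the file"]) selects "open the file menu", since every bigram of the query also occurs in that entry. Because the comparison needs no word segmentation and no alignment, it is cheap to compute, which is the kind of trade-off the abstract describes.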
Timothy Baldwin
Added 31 Oct 2010
Updated 31 Oct 2010
Type Conference
Year 2001
Where ACL
Authors Timothy Baldwin