Sciweavers

34 search results - page 2 / 7
» Mining the Web to Create Minority Language Corpora
SIGIR
2004
ACM
13 years 10 months ago
Translating unknown queries with web corpora for cross-language information retrieval
It is crucial for cross-language information retrieval (CLIR) systems to deal with the translation of unknown queries due to that real queries might be short. The purpose of this...
Pu-Jen Cheng, Jei-Wen Teng, Ruey-Cheng Chen, Jenq-...
EACL
2006
ACL Anthology
13 years 6 months ago
Web Text Corpus for Natural Language Processing
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
Vinci Liu, James R. Curran
AMTA
1998
Springer
13 years 9 months ago
Parallel Strands: A Preliminary Investigation into Mining the Web for Bilingual Text
Abstract. Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genre- and domain-specificity, licensing restri...
Philip Resnik
LREC
2008
13 years 6 months ago
A Lightweight and Efficient Tool for Cleaning Web Pages
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Stefan Evert
OTM
2005
Springer
13 years 11 months ago
Creating Ontologies for Content Representation-The OntoSeed Suite
Abstract. Due to the inherent difficulties associated with manual ontology building, knowledge acquisition and reuse are often seen as methods that can make this tedious process ea...
Elena Paslaru Bontas, David Schlangen, Thomas Schr...