Sciweavers

63 search results - page 1 / 13
EACL
2006
ACL Anthology
13 years 6 months ago
Large Linguistically-Processed Web Corpora for Multiple Languages
The Web contains vast amounts of linguistic data. One key issue for linguists and language technologists is how to access it. Commercial search engines give highly compromised acc...
Marco Baroni, Adam Kilgarriff
LREC
2010
13 years 6 months ago
Building a Web Corpus of Czech
Large corpora are essential to modern methods of computational linguistics and natural language processing. In this paper, we describe an ongoing project whose aim is to build a l...
Drahomíra "johanka" Spoustová, Miros...
LREC
2010
13 years 6 months ago
A Corpus Factory for Many Languages
For many languages there are no large, general-language corpora available. Until the web, all but the richest institutions could do little but shake their heads in dismay as corpu...
Adam Kilgarriff, Siva Reddy, Jan Pomikálek,...
WWW
2006
ACM
14 years 5 months ago
WebKhoj: Indian language IR from multiple character encodings
Today web search engines provide the easiest way to reach information on the web. In this scenario, more than 95% of Indian language content on the web is not searchable due to mu...
Prasad Pingali, Jagadeesh Jagarlamudi, Vasudeva Va...
CIKM
2001
Springer
13 years 9 months ago
Mining the Web to Create Minority Language Corpora
The Web is a valuable source of language specific resources but the process of collecting, organizing and utilizing these resources is difficult. We describe CorpusBuilder, an approa...
Rayid Ghani, Rosie Jones, Dunja Mladenic