Sciweavers

23 search results - page 2 / 5
» Focused web crawling in the acquisition of comparable corpor...
ICDM
2008
IEEE
186 views, Data Mining
13 years 11 months ago
xCrawl: A High-Recall Crawling Method for Web Mining
Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...
Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...
ADMA
2009
Springer
142 views, Data Mining
13 years 11 months ago
Crawling Deep Web Using a New Set Covering Algorithm
Abstract. Crawling the deep web often requires the selection of an appropriate set of queries so that they can cover most of the documents in the data source with low cost. This ca...
Yan Wang, Jianguo Lu, Jessica Chen
ERCIMDL
2003
Springer
106 views, Education
13 years 10 months ago
Topical Crawling for Business Intelligence
Abstract. The Web provides us with a vast resource for business intelligence. However, the large size of the Web and its dynamic nature make the task of foraging appropriate inform...
Gautam Pant, Filippo Menczer
KCAP
2005
ACM
13 years 10 months ago
Collecting paraphrase corpora from volunteer contributors
Extensive and deep paraphrase corpora are important for a variety of natural language processing and user interaction tasks. In this paper, we present an approach which i) collect...
Timothy Chklovski
COLING
2010
12 years 11 months ago
Automatic Acquisition of Lexical Formality
There has been relatively little work focused on determining the formality level of individual lexical items. This study applies information from large mixed-genre corpora, demonst...
Julian Brooke, Tong Wang, Graeme Hirst