Sciweavers

» Crawling the web for structured documents
ACL
1998
15 years 3 months ago
Automatic Text Summarization Based on the Global Document Annotation
The GDA (Global Document Annotation) project proposes a tag set which allows machines to automatically infer the underlying semantic/pragmatic structure of documents. Its objectiv...
Katashi Nagao, Kôiti Hasida
NAACL
2010
14 years 12 months ago
Extracting Parallel Sentences from Comparable Corpora using Document Level Alignment
The quality of a statistical machine translation (SMT) system is heavily dependent upon the amount of parallel sentences used in training. In recent years, there have been several...
Jason R. Smith, Chris Quirk, Kristina Toutanova
CEEMAS
2005
Springer
15 years 7 months ago
Selection in Scale-Free Small World
In this paper we compare our selection-based learning algorithm with the reinforcement learning algorithm in Web crawlers. The task of the crawlers is to find new inform...
Zsolt Palotai, Csilla Farkas, András Lö...
CIKM
2005
Springer
15 years 7 months ago
Maximal termsets as a query structuring mechanism
Search engines process queries conjunctively to restrict the size of the answer set. Further, it is not rare to observe a mismatch between the vocabulary used in the text of Web p...
Bruno Pôssas, Nivio Ziviani, Berthier A. Rib...
WWW
2003
ACM
16 years 2 months ago
Improving pseudo-relevance feedback in web information retrieval using web page segmentation
In contrast to traditional document retrieval, a web page as a whole is not a good information unit to search because it often contains multiple topics and a lot of irrelevant inf...
Shipeng Yu, Deng Cai, Ji-Rong Wen, Wei-Ying Ma