Sciweavers

21 search results - page 4 / 5
WWW
2008
ACM
14 years 6 months ago
As we may perceive: finding the boundaries of compound documents on the web
This paper considers the problem of identifying on the Web compound documents (cDocs), groups of web pages that in aggregate constitute semantically coherent information entities...
Pavel Dmitriev
TREC
2008
13 years 6 months ago
UTDallas at TREC 2008 Blog Track
This paper describes our participation in the 2008 TREC Blog track. Our system consists of 3 components: data preprocessing, topic retrieval, and opinion finding. In the topic ret...
Bin Li, Feifan Liu, Yang Liu
CIDR
2003
13 years 6 months ago
Capacity Bound-free Web Warehouse
Web cache technologies have been developed as an extension of CPU cache, by modifying LRU (Least Recently Used) algorithms. Actually in web cache systems, we can use disks and ter...
Yahiko Kambayashi, Kai Cheng
BMCBI
2007
13 years 5 months ago
The BioPrompt-box: an ontology-based clustering tool for searching in biological databases
Background: High-throughput molecular biology provides new data at an incredible rate, so that the increase in the size of biological databanks is enormous and very rapid. This sc...
Claudio Corsi, Paolo Ferragina, Roberto Marangoni
WWW
2005
ACM
14 years 6 months ago
The infocious web search engine: improving web searching through linguistic analysis
In this paper we present the Infocious Web search engine [23]. Our goal in creating Infocious is to improve the way people find information on the Web by resolving ambiguities pre...
Alexandros Ntoulas, Gerald Chao, Junghoo Cho