Sciweavers

Search results for: Data quality in web archiving
INCDM 2010, Springer
Web-Site Boundary Detection
Defining the boundaries of a web-site, for (say) archiving or information retrieval purposes, is an important but complicated task. In this paper a web-page clustering approach to...
Ayesh Alshukri, Frans Coenen, Michele Zito

SIGMOD 2000, ACM
Finding Replicated Web Collections
Many web documents (such as JAVA FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Junghoo Cho, Narayanan Shivakumar, Hector Garcia-M...

EDBT 2006, ACM
IQN Routing: Integrating Quality and Novelty in P2P Querying and Ranking
We consider a collaboration of peers autonomously crawling the Web. A pivotal issue when designing a peer-to-peer (P2P) Web search engine in this environment is query rou...
Sebastian Michel, Matthias Bender, Peter Triantafi...

ICDM 2008, IEEE
DECK: Detecting Events from Web Click-Through Data
In the past few years there has been increased research interest in detecting previously unidentified events from Web resources. Our focus in this paper is to detect events from ...
Ling Chen 0002, Yiqun Hu, Wolfgang Nejdl

BMCBI 2004
ESTIMA, a tool for EST management in a multi-project environment
Background: Single-pass, partial sequencing of complementary DNA (cDNA) libraries generates thousands of chromatograms that are processed into high quality expressed sequence tags...
Charu G. Kumar, Richard LeDuc, George Gong, Levan ...