Sciweavers

» Using the Structure of HTML Documents to Improve Retrieval

CLEF 2005, Springer
UNED at ImageCLEF 2005: Automatically Structured Queries with Named Entities over Metadata
In this paper, we present our participation in the ImageCLEF 2005 ad-hoc track. First, we describe a preliminary pool of cross-language experiments with the ImageCLEF 2004 testbed...
Víctor Peinado, Fernando López-Osten...

SIGIR 2006, ACM
Near-duplicate detection by instance-level constrained clustering
For the task of near-duplicate document detection, both traditional fingerprinting techniques used in the database community and bag-of-words comparison approaches used in information...
Hui Yang, James P. Callan

ER 2006, Springer
A Quantitative Summary of XML Structures
Statistical summaries in relational databases mainly focus on the distribution of data values and have been found useful for various applications, such as query evaluation and data...
Zi Lin, Bingsheng He, Byron Choi

WEBDB 2004, Springer
Best-Match Querying from Document-Centric XML
On the Web, there is a pervasive use of XML to give lightweight semantics to textual collections. Such document-centric XML collections require a query language that can gracefully...
Jaap Kamps, Maarten Marx, Maarten de Rijke, Bö...

DAS 2008, Springer
A Fast Preprocessing Method for Table Boundary Detection: Narrowing Down the Sparse Lines Using Solely Coordinate Information
With the rapid growth of PDF documents in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...
Ying Liu, Prasenjit Mitra, C. Lee Giles