Sciweavers

1319 search results - page 173 / 264
» Using the Structure of HTML Documents to Improve Retrieval
ACSC
2006
IEEE
15 years 9 months ago
Improvements of TLAESA nearest neighbour search algorithm and extension to approximation search
Nearest neighbour (NN) searches and k nearest neighbour (k-NN) searches are widely used in pattern recognition and image retrieval. An NN (k-NN) search finds the closest object (...
Ken Tokoro, Kazuaki Yamaguchi, Sumio Masuda
WWW
2007
ACM
16 years 4 months ago
Detecting near-duplicates for web crawling
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
Gurmeet Singh Manku, Arvind Jain, Anish Das Sarma
GRID
2006
Springer
15 years 3 months ago
A Parallel Approach to XML Parsing
A language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document p...
Wei Lu, Kenneth Chiu, Yinfei Pan
SIGIR
2008
ACM
15 years 3 months ago
A study of learning a merge model for multilingual information retrieval
This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we also present a large number of f...
Ming-Feng Tsai, Yu-Ting Wang, Hsin-Hsi Chen
ACL
2010
15 years 1 month ago
Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we ...
Valentin I. Spitkovsky, Daniel Jurafsky, Hiyan Als...