Nearest neighbour (NN) searches and k nearest neighbour (k-NN) searches are widely used in pattern recognition and image retrieval. An NN (k-NN) search finds the closest object (...
Near-duplicate web documents are abundant. Two such documents differ from each other in a very small portion that displays advertisements, for example. Such differences are irrele...
A language for semi-structured documents, XML has emerged as the core of the web services architecture, and is playing crucial roles in messaging systems, databases, and document p...
This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we also present a large number of f...
We show how web mark-up can be used to improve unsupervised dependency parsing. Starting from raw bracketings of four common HTML tags (anchors, bold, italics and underlines), we ...
Valentin I. Spitkovsky, Daniel Jurafsky, Hiyan Als...