Existing HTML mark-up is used only to indicate the structure and lay-out of documents, but not the document semantics. As a result web documents are difficult to be semantically p...
In this paper, we propose a novel dependency language modeling approach for information retrieval. The approach extends the existing language modeling approach by relaxing the ind...
Recent work on language models for information retrieval has shown that smoothing language models is crucial for achieving good retrieval performance. Many different effective smo...
We present in this paper a combination of Machine Learning based Information Retrieval (IR) techniques and stochastic language modelling in a hierarchical system that extracts sur...
In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the p...