We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
Today, news browsing and searching is one of the most important Internet activities. This paper introduces a general framework to build a News search engine by describing Velthune, ...
The process of recruiting employees has changed since the internet entered enterprises. From simply posting job ads and information on the internet to online application forms and...
Sven Laumer, Alexander von Stetten, Andreas Eckhar...
Information retrieval can contribute towards the construction of ontologies and the effective usage of ontologies. We use collocation-based keyword extraction to suggest ...
Willem Robert van Hage, Maarten de Rijke, Maarten ...
We have studied the automatic construction of a multilingual citation index by collecting PostScript and PDF files from the Internet. We propose a method to identify duplicate bibl...