We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (...
We introduce CiteSeer-API, a public API to CiteSeer-like services. CiteSeer-API is SOAP/WSDL-based and allows for easy programmatic access to all the specific functionalities off...
Yves Petinot, C. Lee Giles, Vivek Bhatnagar, Prade...
Some large-scale topical digital libraries, such as CiteSeer, harvest online academic documents by crawling open-access archives, university and author homepages, and authors’ s...
This paper presents the LIG contribution to the CLEF 2007 medical retrieval task (i.e. ImageCLEFmed). The main idea in this paper is to incorporate medical knowledge in the langua...
This paper focuses on the creation of a first-order predicate calculus-based regulation compliance-assistance system built upon an XML framework. Two areas of research that suppor...