We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closel...
User-centered consultation systems can be viewed as systems with trust in the competence of the user, whereas conventional knowledge systems are often based on some mistrust. By o...
Georg Buscher, Joachim Baumeister, Frank Puppe, Di...
Web Directories are repositories of Web pages organized in a hierarchy of topics and sub-topics. In this paper, we present DirectoryRank, a ranking framework that orders the pages...
Vlassis Krikos, Sofia Stamou, Pavlos Kokosis, Alex...
This paper presents a novel formulation and approach to the minimal document set retrieval problem. Minimal Document Set Retrieval (MDSR) is a promising information retrieval task...