This paper identifies and explores the problem of seed selection in a web-scale crawler. We argue that seed selection is a nontrivial and very important problem. Selecting proper...
Multimedia uploaded content is tagged and recommended by users of collaborative systems, resulting in informal classifications also known as folksonomies. Faceted web ranking has ...
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the...
Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Tr...
Current-day crawlers retrieve content only from the publicly indexable Web, i.e., the set of Web pages reachable purely by following hypertext links, ignoring search forms and pag...
This paper introduces a novel keyword searching paradigm in Relational Databases (DBs), where the result of a search is a ranked set of Object Summaries (OSs). An OS summarizes...