In this paper we address the problem of organizing hidden-Web databases. Given a heterogeneous set of Web forms that serve as entry points to hidden-Web databases, our goal is to ...
Data acquisition is a major concern in text classification. The excessive human effort required by conventional methods to build up a quality training collection might not always b...
In this paper we describe the Berkeley approaches to the GeoCLEF tasks for CLEF 2006. This year we used two separate systems for different tasks. Although both of the systems...
Collaborative annotation tools are in widespread use. The metadata from these systems can be mined to induce semantic relationships among Web objects (sites, pages, tags, concepts...
Research shows that comment spamming (comments that are unsolicited, unrelated, abusive, hateful, commercial advertisements, etc.) in online discussion forums has become a common p...