Abstract. Different strategies to learn user semantic queries from dissimilarity representations of video audio-visual content are presented. When dealing with large corpora of vi...
Automatically generated HTML, as produced by WYSIWYG programs, typically contains much repetitive and unnecessary markup. This paper identifies aspects of such HTML that may be al...
An information retrieval (IR) engine can rank documents based on textual proximity of keywords within each document. In this paper we apply this notion to search across an entire dat...
Roy Goldman, Narayanan Shivakumar, Suresh Venkatas...
In this paper we explore the effectiveness of three clustering methods used to perform word image indexing. The three methods are: the Self-Organizing Map (SOM), the Growing Hiera...
The main text content of an HTML document on the WWW is typically surrounded by additional contents, such as navigation menus, advertisements, link lists or design elements. Conte...