As ancient documents are being digitized, systems for retrieving documents or images can now be found in Digital Libraries. With regard to illustrations, the content-based image r...
In this paper we study the problem of collecting training samples for building enterprise taxonomies. We develop a computer-aided tool named InfoAnalyzer, which can effectively as...
Abstract. Latent Semantic Indexing(LSI) has been proved to be effective to capture the semantic structure of document collections. It is widely used in content-based text retrieval...
We propose a new webpage ranking algorithm which is personalized. Our idea is to rely on the attention time spent on a document by the user as the essential clue for producing the...
Web content is notoriously difficult to capture on a printed page due to inconsistent and undesired results. Items that users may not want to print, such as media, navigation menu...