Data storage has become an important issue in sensor networks as a large amount of collected data need to be archived for future information retrieval. This paper introduces stora...
Abstract. Since current search engines employ link-based ranking algorithms as an important tool to decide a ranking of sites, Web spammers are making a significant effort to man...
Latent Semantic Indexing (LSI) has been validated to be effective on many small scale text collections. However, little evidence has shown its effectiveness on unsampled large sca...
Sensemaking tasks require users to perform complex research behaviors to gather and comprehend information from many sources. Such tasks are common and include, for example, resea...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...