Click data captures many users’ document preferences for a query and has been shown to help significantly improve search engine ranking. However, most click data is noisy and of...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
- We present an architecture for data streams based on structures typically found in web cache hierarchies. The main idea is to build a meta level analyser from a number of levels ...
Geoffrey Holmes, Bernhard Pfahringer, Richard Kirk...
Documents in languages such as Chinese, Japanese and Korean sometimes annotate terms with their translations in English inside a pair of parentheses. We present a method to extrac...
Dekang Lin, Shaojun Zhao, Benjamin Van Durme, Mari...
RESTful services on the Web expose information through retrievable resource representations that represent self-describing descriptions of resources, and through the way how these...