Language models used in current automatic speech recognition systems are trained on general-purpose corpora and are therefore not relevant to transcribe spoken documents dealing w...
Abstract. Cross-lingual event tracking from a very large number of information sources (thousands of Web sites, for example) is an open challenge. In this paper we investigate effe...
Abstract— Recent advances in graph-based search techniques derived from Kleinberg’s work [1] have been impressive. This paper further improves the graph-based search algorithm ...
Abstract. Newistic is a web mining platform that collects and analyses documents crawled from the Internet. Although it currently processes news articles, it can be easily adapted ...
A number of web cache-related algorithms, such as replacement and prefetching policies, rely on specific characteristics present in the sequence of requests for efficient performa...
Konstantinos Psounis, An Zhu, Balaji Prabhakar, Ra...