Using finite-state automata for the text analysis component in a text-to-speech system is problematic in several respects: the rewrite rules from which the automata are compiled a...
In this paper we provide a fast, data-driven solution to the failing query problem: given a query that returns an empty answer, how can one relax the query's constraints so t...
Information extraction from HTML pages has been conventionally treated as plain text documents extended with HTML tags. However, the growing maturity and correct usage of HTML/XHT...
We present an application of the analytical inductive programming system Igor to learning sets of recursive rules from positive experience. We propose that this approach can be us...
Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...