Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
With the explosion of the Internet, the World Wide Web today has become an infinite source of information. Hence, it is important that one be able to categorize, understand and be a...
Vishal Anand, Keith Hansen, Radu Jianu, Adrian Rus...
Due to their capability for expressing semantics and relationships among data objects, semi-structured documents have become a common way of representing domain knowledge. Compari...
Henry Tan, Tharam S. Dillon, Fedja Hadzic, Elizabe...
We present a theoretical analysis of supervised ranking, providing necessary and sufficient conditions for the asymptotic consistency of algorithms based on minimizing a surrogate...
In this paper, we develop a novel Web Usage Manipulation Language (WUML) which is a declarative language for manipulating Web log data. We assume that a set of trails formed by use...