Parallel web pages are important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
Recent experiments and analysis suggest that there are about 800 million publicly-indexable Web pages. However, unlike books in a traditional library, Web pages continue to change...
An approach for reorganizing a Web site based on user access patterns is proposed. The Web server's log les and the Web pages on the site are rst preprocessed to obtain the ac...
This paper1 presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
As the amount of textual information available through the World Wide Web grows, there is a growing need for high-precision IR systems that enable a user to nd useful information ...
Mandar Mitra, Chris Buckley, Amit Singhal, Claire ...