This paper presents an empirical approach to mining parallel corpora. Conventional approaches use a readily available collection of comparable, nonparallel corpora to extract par...
We investigate three methods for defining a session on Web search engines. We examine 2,465,145 interactions from 534,507 Web searchers. We compare defining sessions using: 1) Int...
Query logs of a Web search engine have been increasingly used as a vital source for data mining. This paper presents a study on large-scale domain-independent entity extraction fro...
This paper describes an intelligent agent to facilitate bitext mining from the Web via automatic discovery of URL pairing patterns (or keys) for retrieving parallel web pages. The...
Implicit user feedback, including click-through and subsequent browsing behavior, is crucial for evaluating and improving the quality of results returned by search engines. Severa...