Abstract. The “Semantic Mirrors Method” (Dyvik, 1998) is a means for automatic derivation of thesaurus entries from a word-aligned parallel corpus. The method is based on the c...
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current release, CzEng was extended by significant amount of texts from various types of so...
Statistical Machine Translation (MT) systems have achieved impressive results in recent years, due in large part to the increasing availability of parallel text for system trainin...
Zhiyi Song, Stephanie Strassel, Gary Krug, Kazuaki...
Abstract. Internet users are becoming increasingly concerned about their personal information being collected and used by Web service providers. They want to ensure that it is stor...
Web spam research has been hampered by a lack of statistically significant collections. In this paper, we perform the first large-scale characterization of web spam using conten...