Recent email spam filtering evaluations, such as those conducted at TREC, have shown that near-perfect filtering results are attained with a variety of machine learning methods wh...
Matrix factorization is a fundamental technique in machine learning that is applicable to collaborative filtering, information retrieval and many other areas. In collaborative fil...
This paper presents experiments on classifying web pages by genre. Firstly, a corpus of 1539 manually labeled web pages was prepared. Secondly, 502 genre features were selected ba...
Identification of transliterations is aimed at enriching multilingual lexicons and improving performance in various Natural Language Processing (NLP) applications including Cross ...
Reflecting the rapid growth of science, technology, and culture, it has become common practice to consult tools on the World Wide Web for various terms. Existing search engines pr...