Accelerated by the technological advances in the domain, the size of the biomedical literature has been growing rapidly. As a result, it is not feasible for individual researchers...
In this paper, we prove for the first time that the learning complexity of Rocchio’s algorithm is O(d+d2 (log d+log n)) over the discretized vector space {0, . . . , n − 1}d ,...
Clustering short length texts is a difficult task itself, but adding the narrow domain characteristic poses an additional challenge for current clustering methods. We addressed thi...
We address the problem of minimizing the communication involved in the exchange of similar documents. We consider two users, A and B, who hold documents x and y respectively. Neit...
The problem of version detection is critical in many important application scenarios, including software clone identification, Web page ranking, plagiarism detection, and peer-to-...
Deise de Brum Saccol, Nina Edelweiss, Renata de Ma...