Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
Clustering layouts of software systems combine two important aspects: they reveal groups of related artifacts of the software system, and they produce a visualization of the resul...
The World-Wide-Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics...
Medical data is often presented as free text in the form of medical reports. Such documents contain important information about patients, disease progression and management, but ar...
Fathi H. Saad, Beatriz de la Iglesia, Duncan G. Be...
Systems based on statistical and machine learning methods have been shown to be extremely effective and scalable for the analysis of large amount of textual data. However, in the r...