We apply statistical machine translation (SMT) tools to generate novel paraphrases of input sentences in the same language. The system is trained on large volumes of sentence pair...
Sentiment detection automatically identifies emotions in textual data. The increasing amount of emotive documents available in corporate databases and on the World Wide Web calls f...
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
Source code on the web is a widely available and potentially rich learning resource for nonprogrammers. However, unfamiliar code can be daunting to end-users without programming e...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...