The presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
Ever since the boom of the World Wide Web, profiling online users' interests has become an important task for content providers. The traditional approach involves manual entry of...
Oxford Health Plans, Inc. is a managed care organization whose goal is to deliver cost-effective, high-quality health care. Oxford’s product lines include traditional health mai...
Howard Marmorstein, Jayesh Ghia, Sandeep Sathaye, ...
Web forums have become an important data resource for many web applications, but extracting structured data from unstructured web forum pages is still a challenging task due to bo...
Jiang-Ming Yang, Rui Cai, Yida Wang, Jun Zhu, Lei ...