Marking up search queries with linguistic annotations such as part-of-speech tags, capitalization, and segmentation, is an important part of query processing and understanding in ...
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
Multiple-topic and varying-length of web pages are two negative factors significantly affecting the performance of web search. In this paper, we explore the use of page segmentati...
In this paper, we propose a machine learning approach to title extraction from general documents. By general documents, we mean documents that can belong to any one of a number of...
Yunhua Hu, Hang Li, Yunbo Cao, Dmitriy Meyerzon, Q...