The meaning conveyed by documents and their markup often goes well beyond what can be inferred from the markup alone. It often depends on context, so that to interpret document ma...
In information retrieval, the cluster hypothesis states: closely related documents tend to be relevant to the same request. We exploit this hypothesis directly by adjusting queryb...
Intelligent information processing systems, such as digital libraries or search engines index web-pages according to their informative content. However, web-pages contain several n...
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Charts are common graphic representation for scientific data in technical and business papers. We present a robust system for detecting and recognizing bar charts. The system incl...