Abstract. This paper summarises our work in textual Case-Based Reasoning within jCOLIBRI. We use Information Extraction techniques to annotate web pages to facilitate semantic retr...
The seeping of ink through the pages of certain double-sided handwritten documents after long periods of storage poses a serious problem for human readers and OCR systems. This pape...
The World-Wide Web consists not only of a huge number of unstructured texts, but also of a vast amount of valuable structured data. Web tables [2] are a typical type of structured in...
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han,...
Previous work shows that a web page can be partitioned into multiple segments or blocks, and that the blocks in a page are usually not equally important. Also, it is pro...
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
Today's Web sites are intricate but not intelligent; while Web navigation is dynamic and idiosyncratic, all too often Web sites are fossils cast in HTML. In response, this pa...