Recently, the credibility of information on the Web has become an important issue. In addition to telling about content of source documents, indicating how to interpret the conten...
Abstract. Linear Discriminant (LD) techniques are typically used in pattern recognition tasks when there are many (n >> 104 ) datapoints in low-dimensional (d < 102 ) spac...
Given a noisy text page, a word recognizer can generate a set of candidates for each word image. A relaxation algorithm was proposed previously by the authors that uses word collo...
We describe research carried out as part of a text summarisation project for the legal domain for which we use a new XML corpus of judgments of the UK House of Lords. These judgmen...
Accurate web page classification often depends crucially on information gained from neighboring pages in the local web graph. Prior work has exploited the class labels of nearby p...