Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
In the third edition of the WePS campaign, we have undertaken the person name disambiguation problem, referred to as a clustering task. Our aim was to make use of intrinsic link relation...
Elena Smirnova, Konstantin Avrachenkov, Brigitte T...
Unlike conventional data or text, Web pages typically contain a large amount of information that is not part of the main contents of the pages, e.g., banner ads, navigation bars, ...
We consider the problem of improving the performance of web access by proposing a reconstruction of the internal link structure of a web site in order to match the quality of the ...
John D. Garofalakis, Panagiotis Kappos, Christos M...
In this poster we present an overview of the techniques we used to develop and evaluate a text categorisation system for the PRINCIP project which sets out to automatically classi...