This paper lies on the field of ancient patrimonial books valorization: it precisely relates to the development of suitable assistance tools for humanists and historians to help t...
We describe an HTML web page segmentation algorithm, which is applied to segment online medical journal articles (regular HTML and PDF-Converted-HTML files). The web page content ...
Tasks that take place over a long period of time or collaborative tasks where participants are required to develop an understanding of each other’s effort benefit from better co...
The Web so far has been incredibly successful at delivering information to human users. So successful actually, that there is now an urgent need to go beyond a browsing human and ...
Statistical topic models provide a general data-driven framework for automated discovery of high-level knowledge from large collections of text documents. While topic models can p...
Chaitanya Chemudugunta, Padhraic Smyth, Mark Steyv...