This paper is concerned with automatic extraction of titles from the bodies of HTML documents. Titles of HTML documents should be correctly defined in the title fields; however, i...
Almost all current anti-spam measures are reactive, filtering being the most common. But to react means always to be one step behind. Reaction requires predicting the next action ...
In this paper, we propose a new approach to discover informative contents from a set of tabular documents (or Web pages) of a Web site. Our system, InfoDiscoverer, first partition...
Text mining applies the same analytical functions of data mining to the domain of textual information, relying on sophisticated text analysis techniques that distill information from f...
We study the optimization of the expected number of bytes that must be transferred by the Web server when a user visits one of its pages. Given a Web site, we want to find an assi...
Evangelos Kranakis, Danny Krizanc, Miguel Vargas M...