Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
This work addresses the challenge of extracting structure in educational and training media based on the type of material that is presented during lectures and training sessions. ...
The Internet and the Web are increasingly used to disseminate fast-changing data such as sensor data, traffic and weather information, stock prices, sports scores, and eve...
Content-based image search on the Internet is a challenging problem, mostly due to the semantic gap between low-level visual features and high-level content, as well as the excess...
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. This is a particular problem for large, ...