It has frequently been observed that most of the world’s data lies outside database systems. The reason is that database systems focus on structured data, leaving the unstructur...
Alon Y. Halevy, Oren Etzioni, AnHai Doan, Zachary ...
We discuss Image Sense Discrimination (ISD), and apply a method based on spectral clustering, using multimodal features from the image and text of the embedding web page. We evalu...
Nicolas Loeff, Cecilia Ovesdotter Alm, David A. Fo...
Metasearch engines, comparison-shopping and Deep Web crawling applications need to extract search result records enwrapped in result pages returned from search engines in response ...
Thanks to the WebGL graphics API specification for the JavaScript programming language, the possibility of using the GPU capabilities in a web browser without the need for an ad-...
Marco Di Benedetto, Federico Ponchio, Fabio Ganove...
The PDF format is commonly used for the exchange of documents on the Web and there is a growing need to understand and extract or repurpose data held in PDF documents. Many system...