Sciweavers

2337 search results - page 267 / 468
» Extracting Sequences from the Web
ECIR
2010
Springer
14 years 11 months ago
Analyzing Information Retrieval Methods to Recover Broken Web Links
In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the p...
Juan Martinez-Romo, Lourdes Araujo
IVC
2007
14 years 10 months ago
Colour text segmentation in web images based on human perception
There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g....
Dimosthenis Karatzas, Apostolos Antonacopoulos
PKDD
2004
Springer
15 years 4 months ago
Breaking Through the Syntax Barrier: Searching with Entities and Relations
The next wave in search technology will be driven by the identification, extraction, and exploitation of real-world entities represented in unstructured textual sources. Search sy...
Soumen Chakrabarti
WWW
2003
ACM
15 years 11 months ago
Efficient URL caching for world wide web crawling
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)–(c). Howev...
Andrei Z. Broder, Marc Najork, Janet L. Wiener
INFOCOM
2010
IEEE
14 years 8 months ago
Surfing the Blogosphere: Optimal Personalized Strategies for Searching the Web
We propose a distributed mechanism for finding websurfing strategies that is inspired by the StumbleUpon recommendation engine. Each day, a websurfer visits a sequence of websites ...
Stratis Ioannidis, Laurent Massoulié