Sciweavers

Search: Unweaving a web of documents

ICDAR 2009 (IEEE)
User-Guided Wrapping of PDF Documents Using Graph Matching Techniques
There are a number of established products on the market for wrapping (semi-automatic navigation and extraction of data) from web pages. These solutions make use of the inherent...
Tamir Hassan

IEEEICCI 2002 (IEEE)
An Agent-Assisted Document Storage for Software Process Environments
Traditional software process environments store documents using either a centralized or a distributed approach. With the assistance of a web agent, this paper presents a new document st...
Jason Jen-Yen Chen, Chun-Yi Lin

DEXAW 2008 (IEEE)
Text Extraction from the Web via Text-to-Tag Ratio
We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
Tim Weninger, William H. Hsu
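The Text-to-Tag Ratio idea can be illustrated with a minimal Python sketch. It assumes the ratio is computed per line of HTML source as the number of non-tag characters divided by the number of tags on that line, falling back to the character count when a line has no tags. The mean-ratio threshold is a hypothetical stand-in for whatever smoothing and clustering the paper actually applies, and the regex tag matcher and the names `ttr_lines` and `extract_content` are illustrative, not taken from the paper.

```python
import re
from statistics import mean

# Naive tag matcher: anything between '<' and the next '>' (assumption, not a full HTML parser).
TAG_RE = re.compile(r"<[^>]*>")
# Script and style bodies are code, not content, so drop them before scoring.
SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1>", re.IGNORECASE | re.DOTALL)


def ttr_lines(html: str) -> list[tuple[str, float]]:
    """Return (visible text, text-to-tag ratio) for each line of the HTML source."""
    html = SCRIPT_RE.sub("", html)
    result = []
    for line in html.splitlines():
        n_tags = len(TAG_RE.findall(line))
        text = TAG_RE.sub("", line).strip()
        # Lines without tags score their raw text length.
        ratio = len(text) / n_tags if n_tags else float(len(text))
        result.append((text, ratio))
    return result


def extract_content(html: str) -> str:
    """Keep non-empty lines whose ratio exceeds the page's mean ratio (hypothetical threshold)."""
    lines = ttr_lines(html)
    if not lines:
        return ""
    threshold = mean(r for _, r in lines)
    return "\n".join(t for t, r in lines if t and r > threshold)


page = """<html><head><title>Demo</title></head>
<body>
<div class="nav"><a href="/">Home</a> | <a href="/about">About</a></div>
<p>Text-to-Tag Ratio favors lines that carry long runs of prose and few tags.</p>
<div class="footer"><span>(c) 2008</span></div>
</body></html>"""

print(extract_content(page))
```

In this toy page the navigation and footer lines carry many tags per character, so only the paragraph line clears the mean ratio. Real pages are messier; a per-line regex will misfire on tags that span lines or on a literal `<` in text, which is why a production extractor would tokenize with an HTML parser first.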

ICDE 2008 (IEEE)
AxPRE Summaries: Exploring the (Semi-)Structure of XML Web Collections
The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet ...
Mariano P. Consens, Flavio Rizzolo, Alejandro A. V...

CIKM 2009 (Springer)
Improving web page classification by label-propagation over click graphs
In this paper, we present a semi-supervised learning method for web page classification, leveraging click logs to augment training data by propagating class labels to unlabeled si...
Soo-Min Kim, Patrick Pantel, Lei Duan, Scott Gaffn...
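Label propagation over a click graph can be sketched in Python under stated assumptions. The graph is taken to be bipartite between queries and the URLs clicked for them, with click counts as edge weights. Scores alternate between click-weighted averages on the query side and on the URL side, and labeled seed pages are clamped to their known scores after each round. The input format, the function name `propagate_labels`, and starting unlabeled pages at 0.0 are hypothetical choices; the abstract snippet does not give the paper's exact propagation rule.

```python
from collections import defaultdict


def propagate_labels(clicks, seeds, iterations=10):
    """Propagate class scores from seed URLs through a query-URL click graph.

    clicks: {(query, url): click_count}, counts assumed positive (hypothetical format).
    seeds:  {url: score}, with 1.0 = in class and 0.0 = not in class.
    Returns {url: score} for every URL in the graph.
    """
    q_nbrs = defaultdict(dict)  # query -> {url: clicks}
    u_nbrs = defaultdict(dict)  # url -> {query: clicks}
    for (q, u), c in clicks.items():
        q_nbrs[q][u] = c
        u_nbrs[u][q] = c

    # Unlabeled URLs start at 0.0 (an assumption; 0.5 is another common prior).
    url_score = {u: seeds.get(u, 0.0) for u in u_nbrs}
    for _ in range(iterations):
        # Each query takes the click-weighted average score of the URLs it led to.
        q_score = {
            q: sum(w * url_score[u] for u, w in nb.items()) / sum(nb.values())
            for q, nb in q_nbrs.items()
        }
        # Each URL takes the click-weighted average score of the queries that reached it.
        url_score = {
            u: sum(w * q_score[q] for q, w in nb.items()) / sum(nb.values())
            for u, nb in u_nbrs.items()
        }
        # Clamp labeled pages back to their known scores.
        url_score.update({u: s for u, s in seeds.items() if u in url_score})
    return url_score


clicks = {
    ("cheap flights", "a.com"): 10,
    ("cheap flights", "b.com"): 5,
    ("pasta recipes", "c.com"): 8,
    ("pasta recipes", "d.com"): 2,
}
seeds = {"a.com": 1.0, "c.com": 0.0}  # 1.0 = travel page, 0.0 = not travel
scores = propagate_labels(clicks, seeds)
print(scores)
```

Here the unlabeled page `b.com` shares the query "cheap flights" with the positive seed `a.com`, so its score climbs toward 1.0 over the iterations, while `d.com`, reached only through a query tied to a negative seed, stays at 0.0. The newly scored pages could then serve as extra training examples for a conventional classifier.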