Sciweavers

80 search results - page 11 / 16
» Web Page Segmentation Based on Gestalt Theory
CICLING
2009
Springer
15 years 1 month ago
Language Identification on the Web: Extending the Dictionary Method
Abstract. Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
KDD
1997
ACM
169 views, Data Mining
15 years 1 month ago
Learning to Extract Text-Based Information from the World Wide Web
There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammat...
Stephen Soderland
WWW
2007
ACM
15 years 10 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
WWW
2004
ACM
15 years 10 months ago
Link fusion: a unified link analysis framework for multi-type interrelated data objects
Web link analysis has proven to be a significant enhancement for quality based web search. Most existing links can be classified into two categories: intra-type links (e.g., web h...
Wensi Xi, Benyu Zhang, Zheng Chen, Yizhou Lu, Shui...
WWW
2004
ACM
15 years 10 months ago
Continuous web: a new image-based hypermedia and scape-oriented browsing
Conventionally, Web pages have been recognized as documents described by HTML. Image data, such as photographs, logos, maps, illustrations, and decorated text, have been treated a...
Hiroya Tanaka, Katsumi Tanaka