Sciweavers

708 search results for "Identifying Content Blocks from Web Documents" (page 28 of 142)
WWW
2006
ACM
16 years 13 days ago
Robust web content extraction
We present an empirical evaluation and comparison of two content extraction methods in HTML: absolute XPath expressions and relative XPath expressions. We argue that the relative ...
Marek Kowalkiewicz, Maria E. Orlowska, Tomasz Kacz...
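The abstract above contrasts absolute and relative XPath expressions for content extraction. The sketch below is a minimal illustration of that distinction under stated assumptions, not the authors' method or evaluation. It uses Python's standard `xml.etree.ElementTree`, which supports only a subset of XPath. The page markup, the `article` class name, and the `extract` helper are all hypothetical. It compares a position-based path that walks down from the root with an attribute-anchored path that searches anywhere below it, and applies both to a page before and after a hypothetical layout change.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical page markup (well-formed XHTML so ElementTree can parse it).
PAGE_V1 = """
<html><body>
  <div class="nav">Home | About</div>
  <div class="article"><p>Main story text.</p></div>
</body></html>
"""

# Same hypothetical page after a layout change: a banner div is inserted first.
PAGE_V2 = """
<html><body>
  <div class="banner">Sale!</div>
  <div class="nav">Home | About</div>
  <div class="article"><p>Main story text.</p></div>
</body></html>
"""

# Absolute-style path: walks down from the root by element position.
# ElementTree evaluates paths relative to the element it is called on,
# so "./body/div[2]/p" from <html> plays the role of /html/body/div[2]/p.
ABSOLUTE = "./body/div[2]/p"

# Relative-style path: searches anywhere below the root for a div
# identified by its class attribute, regardless of its position.
RELATIVE = ".//div[@class='article']/p"


def extract(page: str, path: str) -> Optional[str]:
    """Return the text of the first element matching `path`, or None."""
    root = ET.fromstring(page.strip())
    node = root.find(path)
    return node.text if node is not None else None


if __name__ == "__main__":
    for name, page in (("v1", PAGE_V1), ("v2", PAGE_V2)):
        print(name, "absolute:", extract(page, ABSOLUTE))
        print(name, "relative:", extract(page, RELATIVE))
```

On the second page, the positional path now resolves to the navigation div. That div has no `<p>` child, so the function returns `None`. The attribute-anchored path still finds the story text. This toy case only shows why position-based paths can be fragile under layout changes. It makes no claim about the results reported in the paper.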
WWW
2006
ACM
16 years 13 days ago
Using graph matching techniques to wrap data from PDF documents
Wrapping is the process of navigating a data source, semiautomatically extracting data and transforming it into a form suitable for data processing applications. There are current...
Tamir Hassan, Robert Baumgartner
WSDM
2010
ACM
15 years 9 months ago
Learning Similarity Metrics for Event Identification in Social Media
Social media sites (e.g., Flickr, YouTube, and Facebook) are a popular distribution outlet for users looking to share their experiences and interests on the Web. These sites host ...
Hila Becker, Mor Naaman, Luis Gravano
WWW
2004
ACM
16 years 13 days ago
Learning block importance models for web pages
Some previous works show that a web page can be partitioned to multiple segments or blocks, and usually the importance of those blocks in a page is not equivalent. Also, it is pro...
Ruihua Song, Haifeng Liu, Ji-Rong Wen, Wei-Ying Ma
BXML
2003
15 years 1 months ago
An XML-based Component Architecture for Personalized Adaptive Web Applications
Developing personalized applications for the ubiquitous Web assumes to create content that can be automatically adapted to both different presentation platforms and user preferen...
Zoltán Fiala, Michael Hinz, Frank Wehner