Sciweavers

Extracting Structured Data from Web Pages
BMCBI
2008
Microarray data mining: A novel optimization-based approach to uncover biologically coherent structures
Background: DNA microarray technology allows for the measurement of genome-wide expression patterns. Within the resultant mass of data lies the problem of analyzing and presenting...
Meng Piao Tan, Erin N. Smith, James R. Broach, Chr...
CIKM
2008
Springer
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
WSDM
2010
ACM
Boilerplate Detection using Shallow Text Features
In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
Christian Kohlschütter, Peter Fankhauser, Wol...
WWW
2010
ACM
CETR: content extraction via tag ratios
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
Tim Weninger, William H. Hsu, Jiawei Han
MIR
2006
ACM
Combining audio-based similarity with web-based data to accelerate automatic music playlist generation
We present a technique for combining audio signal-based music similarity with web-based musical artist similarity to accelerate the task of automatic playlist generation. We demon...
Peter Knees, Tim Pohle, Markus Schedl, Gerhard Wid...