Sciweavers

26 search results - page 4 / 6
WWW
2005
ACM
14 years 6 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
WWW
2009
ACM
13 years 10 months ago
Extracting data records from the web using tag path clustering
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
Gengxin Miao, Jun'ichi Tatemura, Wang-Pin Hsiung, ...
AAAI
2007
13 years 7 months ago
Recognizing Textual Entailment Using a Subsequence Kernel Method
We present a novel approach to recognizing Textual Entailment. Structural features are constructed from abstract tree descriptions, which are automatically extracted from syntactic depend...
Rui Wang 0005, Günter Neumann
IJCNN
2006
IEEE
13 years 11 months ago
A Self-Organising Map Approach for Clustering of XML Documents
The number of XML documents produced and available on the Internet is steadily increasing. It is thus important to devise automatic procedures to extract useful information fro...
Francesca Trentini, Markus Hagenbuchner, Alessandr...
WWW
2005
ACM
14 years 6 months ago
Automatically learning document taxonomies for hierarchical classification
While several hierarchical classification methods have been applied to web content, such techniques invariably rely on a pre-defined taxonomy of documents. We propose a new techni...
Kunal Punera, Suju Rajan, Joydeep Ghosh