Sciweavers

11 search results - page 1 / 3
» Distant IE by Bootstrapping Using Lists and Document Structu...
WEBI
2004
Springer
13 years 10 months ago
Semi-Structured Complex List Extraction
The semi-structured information available in HTML and similar documents provides valuable information that can be used for information extraction applications. This information tog...
Anders Arpteg
EP
1998
Springer
13 years 9 months ago
Measuring Structural Similarity Among Web Documents: Preliminary Results
When we describe a Web page informally, we often use phrases like "it looks like a newspaper site", "there are several unordered lists" or "it's just a collection of li...
Isabel F. Cruz, Slava Borisov, Michael A. Marks, T...
KDD
2007
ACM
14 years 5 months ago
XProj: a framework for projected structural clustering of XML documents
XML has become a popular method of data representation both on the web and in databases in recent years. One of the reasons for the popularity of XML has been its ability to encod...
Charu C. Aggarwal, Na Ta, Jianyong Wang, Jianhua F...
IPM
2008
13 years 4 months ago
Towards a unified approach to document similarity search using manifold-ranking of blocks
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches...
Xiaojun Wan, Jianwu Yang, Jianguo Xiao
ESWS
2004
Springer
13 years 10 months ago
Learning to Harvest Information for the Semantic Web
Abstract. In this paper we describe a methodology for harvesting information from large distributed repositories (e.g. large Web sites) with minimum user intervention. The methodol...
Fabio Ciravegna, Sam Chapman, Alexiei Dingli, Yori...