Sciweavers

» Extracting Structured Data from Web Pages
TREC
2007
15 years 2 months ago
DUTIR at TREC 2007 Blog Track
This paper describes DUTIR at TREC 2007 Blog Track. In data preprocessing, a non-English language list created from the corpus was used to remove the non-English blogs, blog templ...
Rui Song, Qin Tang, Daming Shi 0002, Hongfei Lin, ...
ESWS
2010
Springer
15 years 6 months ago
LESS - Template-Based Syndication and Presentation of Linked Data
Recently, the publishing of structured, semantic information as linked data has gained quite some momentum. For ordinary users on the Internet, however, this information is not yet...
Sören Auer, Raphael Doehring, Sebastian Dietz...
PLDI
2010
ACM
15 years 10 months ago
A Context-free Markup Language for Semi-structured Text
An ad hoc data format is any non-standard, semi-structured data format for which robust data processing tools are not available. In this paper, we present ANNE, a new kind of mark...
Qian Xi, David Walker
WWW
2008
ACM
16 years 2 months ago
Efficient similarity joins for near duplicate detection
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
KDD
2009
ACM
16 years 1 month ago
Query result clustering for object-level search
Query result clustering has recently attracted a lot of attention to provide users with a succinct overview of relevant results. However, little work has been done on organizing t...
Jongwuk Lee, Seung-won Hwang, Zaiqing Nie, Ji-Rong...