Sciweavers

» Extracting unstructured data from template generated web doc...
ICDM
2008
IEEE
15 years 4 months ago
xCrawl: A High-Recall Crawling Method for Web Mining
Web Mining Systems exploit the redundancy of data published on the Web to automatically extract information from existing web documents. The first step in the Information Extract...
Kostyantyn M. Shchekotykhin, Dietmar Jannach, Gerh...
FCT
2001
Springer
15 years 2 months ago
Polynomial Time Algorithms for Finding Unordered Tree Patterns with Internal Variables
Many documents such as Web documents or XML files have tree structures. A term tree is an unordered tree pattern consisting of internal variables and tree structures. In order to ...
Takayoshi Shoudai, Tomoyuki Uchida, Tetsuhiro Miya...
SIGIR
2004
ACM
15 years 3 months ago
Learning to cluster web search results
Organizing Web search results into clusters facilitates users' quick browsing through search results. Traditional clustering techniques are inadequate since they don't g...
Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, ...
DOCENG
2008
ACM
14 years 11 months ago
A concise XML binding framework facilitates practical object-oriented document engineering
Semantic web researchers tend to assume that XML Schema and OWL-S are the correct means for representing the types, structure, and semantics of XML data used for documents and int...
Andruid Kerne, Zachary O. Toups, Blake Dworaczyk, ...
SIGIR
2011
ACM
14 years 12 days ago
Social context summarization
We study a novel problem of social context summarization for Web documents. Traditional summarization research has focused on extracting informative sentences from standard docume...
Zi Yang, Keke Cai, Jie Tang, Li Zhang, Zhong Su, J...