Sciweavers

WWW
2008
ACM
14 years 6 months ago
Automatically refining the Wikipedia infobox ontology
The combined efforts of human volunteers have recently extracted numerous facts from Wikipedia, storing them as machine-harvestable object-attribute-value triples in Wikipedia inf...
Fei Wu, Daniel S. Weld
AGENTS
1997
Springer
13 years 9 months ago
A Scalable Comparison-Shopping Agent for the World-Wide Web
The World-Wide-Web is less agent-friendly than we might hope. Most information on the Web is presented in loosely structured natural language text with no agent-readable semantics...
Robert B. Doorenbos, Oren Etzioni, Daniel S. Weld
SIGIR
2008
ACM
13 years 5 months ago
Multi-document summarization via sentence-level semantic analysis and symmetric matrix factorization
Multi-document summarization aims to create a compressed summary while retaining the main characteristics of the original set of documents. Many approaches use statistics and mach...
Dingding Wang, Tao Li, Shenghuo Zhu, Chris H. Q. D...
WSDM
2010
ACM
14 years 4 days ago
Learning URL patterns for webpage de-duplication
Presence of duplicate documents in the World Wide Web adversely affects crawling, indexing and relevance, which are the core building blocks of web search. In this paper, we pres...
Hema Swetha Koppula, Krishna P. Leela, Amit Agarwa...
AND
2009
13 years 3 months ago
Digital weight watching: reconstruction of scanned documents
A web portal providing access to over 250,000 scanned and OCRed cultural heritage documents is analyzed. The collection consists of the complete Dutch Hansard from 1917 to 1995. E...
Tim Gielissen, Maarten Marx