Sciweavers

38 search results - page 2 / 8
» Mining Tables from Large Scale HTML Texts
Sort
View
WSDM
2012
ACM
252views Data Mining» more  WSDM 2012»
12 years 3 days ago
WebSets: extracting sets of entities from the web using unsupervised information extraction
We describe a open-domain information extraction method for extracting concept-instance pairs from an HTML corpus. Most earlier approaches to this problem rely on combining cluste...
Bhavana Bharat Dalvi, William W. Cohen, Jamie Call...
WWW
2007
ACM
14 years 5 months ago
Towards domain-independent information extraction from web tables
Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of <table> tags. A mul...
Bernhard Krüpl, Bernhard Pollak, Marcus Herzo...
COLING
2010
12 years 11 months ago
Large Scale Parallel Document Mining for Machine Translation
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
KDD
2008
ACM
128views Data Mining» more  KDD 2008»
14 years 5 months ago
Scaling up text classification for large file systems
: We combine the speed and scalability of information retrieval with the generally superior classification accuracy offered by machine learning, yielding a two-phase text classifie...
George Forman, Shyamsundar Rajaram
OSDI
2008
ACM
14 years 4 months ago
Mining Console Logs for Large-Scale System Problem Detection
The console logs generated by an application contain messages that the application developers believed would be useful in debugging or monitoring the application. Despite the ubiq...
Wei Xu, Ling Huang, Armando Fox, David A. Patterso...