Sciweavers

Search results for "Marking Text Documents"

WWW 2009, ACM
Combining anchor text categorization and graph analysis for paid link detection
In order to artificially boost the rank of commercial pages in search engine results, search engine optimizers pay for links to these pages on other websites. Identifying paid lin...
Kirill Nikolaev, Ekaterina Zudina, Andrey Gorshkov

SAC 2009, ACM
Combining statistics and semantics via ensemble model for document clustering
Incorporating background knowledge into data mining algorithms is an important but challenging problem. Current approaches in semi-supervised learning require explicit knowledge p...
Samah Jamal Fodeh, William F. Punch, Pang-Ning Tan

CIKM 2003, Springer
Extracting unstructured data from template generated web documents
We propose a novel approach that identifies web page templates and extracts the unstructured data. Extracting only the body of the page and eliminating the template increases the ...
Ling Ma, Nazli Goharian, Abdur Chowdhury, Misun Ch...

ICIP 2000, IEEE
Hough Technique for Bar Charts Detection and Recognition in Document Images
Charts are a common graphic representation for scientific data in technical and business papers. We present a robust system for detecting and recognizing bar charts. The system incl...
Yan Ping Zhou, Chew Lim Tan

ECML 2006, Springer
Efficient Prediction-Based Validation for Document Clustering
Recently, stability-based techniques have emerged as a very promising solution to the problem of cluster validation. An inherent drawback of these approaches is the computational c...
Derek Greene, Padraig Cunningham