Sciweavers

» Random web crawls
WWW
2005
ACM
15 years 10 months ago
Adaptive query routing in peer web search
An unstructured peer network application was proposed to address the query forwarding problem of distributed search engines and scalability limitations of centralized search engin...
Le-Shin Wu, Ruj Akavipat, Filippo Menczer
WWW
2005
ACM
15 years 3 months ago
An information extraction engine for web discussion forums
In this poster, we present an information extraction engine for web-based forums. The engine analyzes the HTML files crawled from web forums, deduces the wrapper (template) of the...
Hanny Yulius Limanto, Nguyen Ngoc Giang, Vo Tan Tr...
WWW
2007
ACM
15 years 10 months ago
Efficient search in large textual collections with redundancy
Current web search engines focus on searching only the most recent snapshot of the web. In some cases, however, it would be desirable to search over collections that include many ...
Jiangong Zhang, Torsten Suel
JMLR
2008
14 years 9 months ago
Dynamic Hierarchical Markov Random Fields for Integrated Web Data Extraction
Existing template-independent web data extraction approaches adopt highly ineffective decoupled strategies--attempting to do data record detection and attribute labeling in two se...
Jun Zhu, Zaiqing Nie, Bo Zhang, Ji-Rong Wen
NDT
2010
14 years 8 months ago
Web Document Classification by Keywords Using Random Forests
A web directory hierarchy is critical to serving users’ search requests. Creating and maintaining such directories without human expert involvement requires good classification of we...
Myungsook Klassen, Nikhila Paturi