Sciweavers

48 search results - page 8 / 10
» Language Based Crawling: Crawling the Arabic Content of the ...
EMNLP
2009
13 years 4 months ago
Web-Scale Distributional Similarity and Entity Set Expansion
Computing the pairwise semantic similarity between all words on the Web is a computationally challenging task. Parallelization and optimizations are necessary. We propose a highly...
Patrick Pantel, Eric Crestan, Arkady Borkovsky, An...
ECIR
2008
Springer
13 years 7 months ago
The Importance of Link Evidence in Wikipedia
Wikipedia is one of the most popular information sources on the Web. The free encyclopedia is densely linked. The link structure in Wikipedia differs from the Web at large: interna...
Jaap Kamps, Marijn Koolen
INTERSPEECH
2010
13 years 1 months ago
Text normalization based on statistical machine translation and internet user support
In this paper, we describe and compare systems for text normalization based on statistical machine translation (SMT) methods which are constructed with the support of internet use...
Tim Schlippe, Chenfei Zhu, Jan Gebhardt, Tanja Sch...
ECIR
2006
Springer
13 years 7 months ago
Automatic Document Organization in a P2P Environment
This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble-based meta methods. We co...
Stefan Siersdorfer, Sergej Sizov
CIKM
2009
Springer
14 years 27 days ago
Vetting the links of the web
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
Na Dai, Brian D. Davison