Sciweavers

461 search results - page 1 / 93

AIRWEB 2008, Springer
Cleaning search results using term distance features
The presence of Web spam in query results is one of the critical challenges facing search engines today. While search engines try to combat the impact of spam pages on their resul...
Josh Attenberg, Torsten Suel

KDD 2005, ACM
Email data cleaning
Addressed in this paper is the issue of 'email data cleaning' for text mining. Many text mining applications need to take emails as input. Email data is usually noisy and thus i...
Jie Tang, Hang Li, Yunbo Cao, ZhaoHui Tang

DMKD 2004, ACM
Iterative record linkage for cleaning and integration
Record linkage, the problem of determining when two records refer to the same entity, has applications for both data cleaning (deduplication) and for integrating data from multipl...
Indrajit Bhattacharya, Lise Getoor
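Record linkage, as defined in the snippet for this entry, can be illustrated with a small deduplication baseline. This is a minimal sketch and not Bhattacharya and Getoor's iterative method: it assumes records are plain name strings, measures similarity with the standard library's `difflib.SequenceMatcher` ratio after a simple normalization, and uses a hypothetical threshold of 0.85. Matched pairs are merged with union-find so that linkage is transitive.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(name: str) -> str:
    # Lowercase and turn punctuation into spaces, then collapse runs of whitespace.
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    # SequenceMatcher ratio in [0, 1]; 1.0 means the normalized strings are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(records: list[str], threshold: float = 0.85) -> list[set[int]]:
    """Group record indices judged to refer to the same entity.

    Hypothetical baseline: every pair is compared independently, and pairs
    scoring at or above `threshold` are merged with union-find, so a~b and
    b~c place a, b and c in one group even if a and c score below threshold.
    """
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        parent[find(i)] = find(j)

    for i, j in combinations(range(len(records)), 2):
        if similarity(records[i], records[j]) >= threshold:
            union(i, j)

    # Collect indices by their root to form the entity groups.
    groups: dict[int, set[int]] = {}
    for i in range(len(records)):
        groups.setdefault(find(i), set()).add(i)
    return list(groups.values())


if __name__ == "__main__":
    names = ["Lise Getoor", "lise  getoor", "Indrajit Bhattacharya"]
    print(link_records(names))  # → [{0, 1}, {2}]
```

Each pairwise decision here is made in isolation; the paper's title indicates an iterative approach, which this baseline does not attempt to reproduce.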

NLPRS 2001, Springer
Automatically Harvesting Katakana-English Term Pairs from Search Engine Query Logs
This paper describes a method of extracting katakana words and phrases, along with their English counterparts from non-aligned monolingual web search engine query logs. The method...
Eric Brill, Gary Kacmarcik, Chris Brockett

PVLDB 2008
Keyword query cleaning
Unlike traditional database queries, keyword queries do not adhere to predefined syntax and are often dirty with irrelevant words from natural languages. This makes accurate and e...
Ken Q. Pu, Xiaohui Yu