Essentially all data mining algorithms assume that the datagenerating process is independent of the data miner's activities. However, in many domains, including spam detectio...
Nilesh N. Dalvi, Pedro Domingos, Mausam, Sumit K. ...
Structured peer-to-peer (P2P) overlays have been successfully employed in many applications to locate content. However, they have been less effective in handling massive amounts o...
Real-world data -- especially when generated by distributed measurement infrastructures such as sensor networks -- tends to be incomplete, imprecise, and erroneous, making it impo...
Text is ubiquitous and, not surprisingly, many important applications rely on textual data for a variety of tasks. As a notable example, information extraction applications derive...
Panagiotis G. Ipeirotis, Eugene Agichtein, Pranay ...
Database selection is an important step when searching over large numbers of distributed text databases. The database selection task relies on statistical summaries of the databas...