We discuss the problem of Web data extraction and describe an XML-based methodology whose goal extends far beyond simple "screen scraping." An ideal data extraction proc...
Real-time surveillance systems, network and telecommunication systems, and other dynamic processes often generate tremendous (potentially infinite) volume of stream data. Effectiv...
Y. Dora Cai, David Clutter, Greg Pape, Jiawei Han,...
Commercial relational databases currently store vast amounts of real-world data. The data within these relational repositories are represented by multiple relations, which are int...
Most data mining operations include an integral search component at their core. For example, the performance of similarity search or classification based on Nearest Neighbors is ...
Published data is prone to privacy attacks. Sanitization methods aim to prevent these attacks while maintaining usefulness of the data for legitimate users. Quantifying the trade-...