There is an increasing need for sharing data repositories containing personal information across multiple distributed and private databases. However, such data sharing is subject t...
As the use of Electronic Medical Records (EMRs) becomes more widespread, so does the need for effective information discovery on them. Recently proposed EMR standards are XML-based...
The detection of duplicate tuples, corresponding to the same real-world entity, is an important task in data integration and cleaning. While many techniques exist to identify such...
Real-world data, especially when generated by distributed measurement infrastructures such as sensor networks, tends to be incomplete, imprecise, and erroneous, making it impo...
Technology in the field of digital media generates huge amounts of nontextual information (audio, video, and images) along with more familiar textual information. The potential for...