Massive amounts of raw data are currently being generated by biologists while sequencing organisms. Outside of the largest, high-pro le projects such as the Human Genome Project, ...
Several pattern discovery methods proposed in the data mining literature have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not leverag...
We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions ...
Polina Kuznetsova, Vicente Ordonez, Alexander C. B...
We consider the problem of indexing high-dimensional data for answering (approximate) similarity-search queries. Similarity indexes prove to be important in a wide variety of sett...
In this paper, we consider the problem of combining link and content analysis for community detection from networked data, such as paper citation networks and Word Wide Web. Most ...