The recognition of text in everyday scenes is made difficult by viewing conditions, unusual fonts, and lack of linguistic context. Most methods integrate a priori appearance info...
David Smith, Jacqueline Feild, Eric Learned-Miller
Similarity search in time series data is an active area of research. In this paper, we introduce the novel concept of threshold-similarity queries in time series databases which r...
Alexey Pryakhin, Hans-Peter Kriegel, Johannes A&sz...
With the increasing amount of data and the need to integrate data from multiple data sources, a challenging issue is to find near duplicate records efficiently. In this paper, we ...
Chuan Xiao, Wei Wang 0011, Xuemin Lin, Jeffrey Xu ...
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
There are several pieces of information that can be utilized in order to improve the efficiency of similarity searches on high-dimensional data. The most commonly used information...