When is it safe to use synthetic data in supervised classification? Trainable classifier technologies require large representative training sets consisting of samples labeled with...
To exploit the road network in raster maps, the first step is to extract the pixels that constitute the roads and then vectorize the road pixels. Identifying colors that represent...
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
We consider the problem of building a P2P-based search engine for massive document collections. We describe a prototype system called ODISSEA (Open DIStributed Search Engine Archi...
Emerging applications such as personalized portals, enterprise search and web integration systems often require keyword search over semi-structured views. However, traditional inf...
Feng Shao, Lin Guo, Chavdar Botev, Anand Bhaskar, ...