ECIR 2007, Springer

Similarity Measures for Short Segments of Text

Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
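To make the sparseness problem concrete, here is a minimal sketch of a purely lexical measure applied to two short queries, with and without stemming. It is an illustration only, not the measures evaluated in the paper: the choice of Jaccard overlap, the function names (`tokenize`, `naive_stem`, `lexical_similarity`), and the crude suffix-stripping rule are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def naive_stem(token):
    """Crude suffix stripping (hypothetical stand-in for a real stemmer)."""
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters would remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def jaccard(a, b):
    """Jaccard overlap of two token sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def lexical_similarity(q1, q2, stem=False):
    """Token-overlap similarity between two short texts, optionally stemmed."""
    t1, t2 = tokenize(q1), tokenize(q2)
    if stem:
        t1 = {naive_stem(t) for t in t1}
        t2 = {naive_stem(t) for t in t2}
    return jaccard(t1, t2)


if __name__ == "__main__":
    print(lexical_similarity("cheap flights", "cheap flight"))
    print(lexical_similarity("cheap flights", "cheap flight", stem=True))
    print(lexical_similarity("automobile", "car", stem=True))
```

In this sketch, stemming lets "cheap flights" and "cheap flight" match exactly, while "automobile" and "car" still share no tokens and score zero. The second case is the kind of failure the abstract attributes to data sparseness and lack of context.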

Donald Metzler, Susan T. Dumais, Christopher Meek

Related Content

» Improving Similarity Measures for Short Segments of Text

» A web-based kernel function for measuring the similarity of short text snippets

» Proximity Estimation and Hardness of Short-Text Corpora

» Corpus-based and Knowledge-based Measures of Text Semantic Similarity

» Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance

» Using corpus and knowledge-based similarity measure in Maximum Marginal Relevance for meeti...

» Topic Segmentation Algorithms for Text Summarization and Passage Retrieval An Exhaustive E...

» Using Multiple Discriminant Analysis Approach for Linear Text Segmentation

» A Comparative Study of Ontology Based Term Similarity Measures on PubMed Document Clusteri...

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ECIR
Authors	Donald Metzler, Susan T. Dumais, Christopher Meek
