Presentation of search results in Web-based information retrieval (IR) systems has been dominated by a textual form of information such as the title, snippet, URL, and/or file type...
In this study, we explore the construction of a faceted vocabulary that can be used as a mechanism for organizing Web-based resources. After analyzing the manual process of facete...
Kiduk Yang, Elin K. Jacob, Aaron Loehrlein, Seungm...
In this paper we process and analyze web search engine query and click data from the perspective of the documents (URL’s) selected. We initially define possible document categor...
Previously topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recent...
Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closel...