We present a clustering method for continuous data. It defines local clusters in the (primary) data space but derives its similarity measure from the posterior distribu...
Results of queries by personal names often contain documents related to several people because of the namesake problem. In order to differentiate documents related to different...
One viewpoint of a knowledge network is a knowledge map that clusters similar knowledge sources into knowledge domains. What is needed is an automatic mapping tool that 1) takes t...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
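As a hedged illustration only: the near-duplicate snippet above is truncated, so the paper's actual document representation and similarity are not shown here. A common generic baseline for the same task compares real-valued document vectors with cosine similarity and flags a pair when the score reaches a threshold. That threshold is the kind of knob one would tune for a particular domain. The helper names `term_vector`, `cosine`, and `is_near_duplicate`, the lowercase alphanumeric tokenization, the raw term-frequency weights, and the 0.9 default threshold are all assumptions made for this sketch, not the authors' method.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> dict[str, float]:
    """Hypothetical representation: raw term frequencies as real-valued weights.

    Only nonzero entries are stored, so the dict acts as a compact vector.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {term: float(count) for term, count in Counter(tokens).items()}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two vectors stored as term -> weight dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_u * norm_v)


def is_near_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag a pair as near-duplicate when similarity reaches the assumed threshold."""
    return cosine(term_vector(a), term_vector(b)) >= threshold


if __name__ == "__main__":
    doc1 = "The quick brown fox jumps over the lazy dog"
    doc2 = "The quick brown fox jumped over the lazy dog"
    doc3 = "Stock prices fell sharply today"
    print(is_near_duplicate(doc1, doc2))  # True (cosine = 10/11)
    print(is_near_duplicate(doc1, doc3))  # False (no shared terms)
```

Raising the threshold makes the detector stricter (fewer pairs flagged) and lowering it makes it more permissive; in this baseline, per-domain tuning amounts to choosing that value on labeled pairs.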
Tasks of information retrieval depend on a good distance function for measuring similarity between data instances. The most effective distance function must be formulated in a con...