In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
Focused crawlers are considered a promising way to tackle the scalability problem of topic-oriented or personalized search engines. To design a focused crawler, the choice of s...
Readers interested in the context of an event covered in the news, such as the dismissal of a lawsuit, can benefit from easily finding out about the overall news situation, the lega...
Debugging inconsistent OWL ontologies is a time-consuming task. Debugging services included in existing ontology engineering tools are still far from providing adequate support to ...
Many real-life datasets have skewed distributions of events, where the probability of observing a few events far exceeds that of the others. In this paper, we observed that in skewed datasets...