We present a new class of problems, called resource-bounded information gathering for correlation clustering. Our goal is to perform correlation clustering under circumstances in which accuracy may be improved by augmenting the given graph with additional information. This information is obtained by querying an external source under resource constraints. The problem is to develop the most effective query selection strategy to minimize some loss function on the resulting partitioning. We motivate the problem using an entity resolution task.

1 Problem Definition

The standard correlation clustering problem on a graph with real-valued edge weights is as follows: there exists a fully connected graph $G(V, E)$ with $n$ nodes and edge weights $w_{ij} \in [-1, +1]$. The goal is to partition the vertices in $V$ by minimizing the inconsistencies with the edge weights [1]. That is, we want to find a partitioning that maximizes the objective function $F = \sum_{ij} w_{ij} f(i, j)$, where $f(i, j) = 1$ when $v_i$ and $v_j$ ...
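As a rough illustration of the objective F above, the following sketch scores a candidate partitioning of a small graph. The excerpt is cut off before it finishes defining f(i, j), so the code assumes the usual correlation clustering convention: f(i, j) = +1 when the two vertices share a cluster and -1 otherwise. That convention, the function name `clustering_objective`, the choice to sum over unordered pairs i < j, and the toy weights are all assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations


def clustering_objective(weights, labels):
    """Return F = sum over pairs i < j of w_ij * f(i, j).

    ASSUMPTION: f(i, j) is +1 if vertices i and j carry the same cluster
    label and -1 otherwise. The source text is truncated before it
    completes this definition.
    """
    total = 0.0
    for i, j in combinations(range(len(labels)), 2):
        f = 1 if labels[i] == labels[j] else -1
        total += weights[i][j] * f
    return total


# Hypothetical symmetric weights in [-1, +1] for a 3-node graph.
weights = [
    [0.0, 0.8, -0.6],
    [0.8, 0.0, -0.4],
    [-0.6, -0.4, 0.0],
]

# Vertices 0 and 1 attract each other; both repel vertex 2.
score_good = clustering_objective(weights, [0, 0, 1])
score_bad = clustering_objective(weights, [0, 1, 1])
```

Under these assumptions, the partitioning that groups vertices 0 and 1 together scores higher than the one that separates them, which matches the intuition that positive weights reward co-clustering and negative weights reward separation.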

Added
07 Jun 2010

Updated
07 Jun 2010

Type
Conference

Year
2007

Where
COLT

Authors
Pallika Kanani, Andrew McCallum

