We consider the problem of visual categorization with minimal supervision during training. We propose a part-based model that loosely captures structural information. We represent an image as a collection of parts, each characterized by an appearance codeword from a visual vocabulary and by a neighborhood context organized as an ordered set of bag-of-features representations. These bags are computed in local overlapping areas around the part. A semantic distance between images is obtained by matching parts associated with the same codeword using their context distributions. Classification is performed by an SVM whose kernel is derived from the proposed distance. Experiments show that our method outperforms all the classification methods from the PASCAL challenge on half of the VOC2006 categories and has the best average equal error rate (EER). It also outperforms the constellation model learned via boosting, as proposed by Bar-Hillel et al., on their data set, which contains more rigid objects.
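To make the matching idea concrete, the following is a minimal sketch and not the implementation evaluated in the experiments. It assumes that each part is a pair (codeword index, context), that a context is an ordered list of L1-normalized histograms, that contexts are compared with a chi-square distance, that unmatched parts receive a fixed penalty, and that the kernel has the form exp(-gamma * d). All function names, the choice of chi-square, the penalty, and the kernel form are illustrative assumptions.

```python
import math
from collections import defaultdict

def chi2(h1, h2):
    """Chi-square distance between two histograms of equal length."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total

def context_distance(ctx1, ctx2):
    """Mean chi-square distance over the ordered bags of two contexts.

    Assumes both contexts hold the same number of bags, in the same order.
    """
    return sum(chi2(b1, b2) for b1, b2 in zip(ctx1, ctx2)) / len(ctx1)

def group_by_codeword(parts):
    """Index the contexts of an image's parts by their codeword."""
    groups = defaultdict(list)
    for codeword, context in parts:
        groups[codeword].append(context)
    return groups

def directed_distance(parts_a, parts_b, penalty=1.0):
    """Average cost of matching each part of A to its best same-codeword part of B.

    Parts of A whose codeword does not occur in B cost `penalty` (assumed value;
    1.0 is the maximum chi-square distance between normalized histograms).
    Assumes parts_a is non-empty.
    """
    groups_b = group_by_codeword(parts_b)
    costs = []
    for codeword, ctx in parts_a:
        candidates = groups_b.get(codeword)
        if candidates:
            costs.append(min(context_distance(ctx, c) for c in candidates))
        else:
            costs.append(penalty)
    return sum(costs) / len(costs)

def image_distance(parts_a, parts_b):
    """Symmetrized distance between two images."""
    return 0.5 * (directed_distance(parts_a, parts_b)
                  + directed_distance(parts_b, parts_a))

def kernel(parts_a, parts_b, gamma=1.0):
    """Kernel value derived from the distance (assumed exponential form)."""
    return math.exp(-gamma * image_distance(parts_a, parts_b))
```

A Gram matrix built from `kernel` over all training images could then be passed to any SVM implementation that accepts precomputed kernels. Note that a kernel of this form is not guaranteed to be positive definite for an arbitrary distance.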