This paper argues that tracking, object detection, and model-building are all similar activities. We describe a fully automatic system that builds 2D articulated models known as pictorial structures from videos of animals. The learned model can be used to detect the animal in the original video