Object localisation via action recognition

The aim of this paper is to track objects during their use by humans. The task is difficult because these objects are small, fast-moving and often occluded by the user. We present a novel solution based on cascade action recognition, a learned mapping between body and object poses, and a hierarchical extension of importance sampling. During tracking, body pose estimates from a Kinect sensor are classified into action classes by a Support Vector Machine and converted to discriminative object pose hypotheses using a {body, object} pose mapping. These hypotheses are then mixed with generative hypotheses by the importance sampler and evaluated against the image. The approach outperforms a state-of-the-art adaptive tracker for localisation of 14 of the 15 test implements and additionally gives object classifications and 3D object pose estimates.
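The pipeline described above can be sketched as a single tracking step. This is a minimal illustration under stated assumptions, not the authors' implementation. A nearest-centroid rule (`classify_action`, with hypothetical `CENTROIDS`) stands in for the SVM. The {body, object} mapping is a hypothetical lookup table (`POSE_MAP`) of object offsets from the hand joint. Object pose is reduced to a 3D position, and the image likelihood is a caller-supplied function. The sampler is a plain two-component mixture proposal with a flat prior rather than the paper's hierarchical importance sampler, so each weight is the likelihood divided by the mixture proposal density.

```python
import math
import random

# Hypothetical {action: object offset from the hand joint} mapping, in metres.
# The paper learns this mapping; these numbers are illustrative only.
POSE_MAP = {
    "hammer": (0.00, 0.15, 0.00),
    "saw": (0.20, 0.00, 0.00),
}

# Hypothetical class centroids of a 2D body-pose feature. A nearest-centroid
# rule stands in here for the paper's SVM action classifier.
CENTROIDS = {
    "hammer": (0.0, 1.0),
    "saw": (1.0, 0.0),
}


def classify_action(feature):
    """Return the action label whose centroid is nearest to the feature."""
    return min(CENTROIDS, key=lambda a: math.dist(feature, CENTROIDS[a]))


def discriminative_pose(hand, feature):
    """Body pose -> action class -> object pose hypothesis via POSE_MAP."""
    offset = POSE_MAP[classify_action(feature)]
    return tuple(h + o for h, o in zip(hand, offset))


def gauss_pdf(x, mean, sigma):
    """Isotropic Gaussian density in len(x) dimensions."""
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return math.exp(-0.5 * sq / sigma ** 2) / (2 * math.pi * sigma ** 2) ** (len(x) / 2)


def track_step(prev_pose, disc_pose, likelihood, n=500, alpha=0.5,
               sigma_g=0.1, sigma_d=0.05, seed=0):
    """One importance-sampling step with a two-component mixture proposal.

    With probability alpha a hypothesis is drawn around the discriminative
    pose, otherwise around the previous estimate (generative motion model).
    Under a flat prior, each weight is likelihood / mixture proposal density.
    Returns the weighted mean of the hypotheses as the pose estimate.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        if rng.random() < alpha:
            centre, sigma = disc_pose, sigma_d
        else:
            centre, sigma = prev_pose, sigma_g
        samples.append(tuple(rng.gauss(c, sigma) for c in centre))

    weights = []
    for s in samples:
        q = (alpha * gauss_pdf(s, disc_pose, sigma_d)
             + (1 - alpha) * gauss_pdf(s, prev_pose, sigma_g))
        weights.append(likelihood(s) / q)  # evaluate hypothesis against the "image"
    total = sum(weights)
    weights = [w / total for w in weights]
    return tuple(sum(w * s[k] for w, s in zip(weights, samples))
                 for k in range(len(prev_pose)))


if __name__ == "__main__":
    hand = (0.5, 0.0, 0.0)
    disc = discriminative_pose(hand, feature=(0.1, 0.9))
    true_pose = (0.5, 0.15, 0.0)  # stand-in for the object's pose in the image
    est = track_step((0.0, 0.0, 0.0), disc, lambda s: gauss_pdf(s, true_pose, 0.03))
    print(disc, est)
```

Dividing by the full mixture density, rather than by the density of whichever component drew a given sample, keeps the weights valid however the hypotheses are split between the discriminative and generative sources; `alpha` only controls how many hypotheses each source contributes.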