Joint Recognition and Segmentation of Objects

Object recognition and segmentation play closely intertwined roles in human vision and perception. For instance, early visual areas are known to perform a low-level segmentation of the scene, but a full understanding of the scene requires a symbiotic interplay between segmentation and recognition. While it is not clear whether either task fully precedes the other, their interaction appears to improve the performance of each. Unfortunately, most of the existing literature addresses recognition and segmentation separately. Only very recently have algorithms started to integrate the two tasks, mostly by using recognition to help object segmentation. We believe that a unified approach, in which object recognition and segmentation are integrated within the same mathematical framework, can significantly improve the performance of both tasks.



We propose to cast this problem as a combinatorial optimization problem on a graph that models the relationships between different levels of perception of an image, such as pixels, image patches, and object categories. As we envision it, the proposed framework will give the user considerable freedom to incorporate several levels of perception and to model the complex relationships among them, without major changes to the optimization framework.
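The section does not specify the objective or the solver, so the following Python sketch is only a hypothetical illustration of what a combinatorial optimization over a graph spanning several levels of perception can look like. It assumes binary labels, a unary-plus-pairwise energy with a Potts (disagreement) penalty on each edge, and made-up node names, costs, and weights; exhaustive search stands in for a real solver.

```python
from itertools import product

# Hypothetical toy graph; all node names, costs, and weights are made up.
# Pixels p0..p3 and patches q0, q1 take labels 0 (background) or 1 (object);
# the category node c takes 1 if the object category is present, else 0.
NODES = ["p0", "p1", "p2", "p3", "q0", "q1", "c"]

# UNARY[node] = (cost of label 0, cost of label 1), e.g. from local evidence.
UNARY = {
    "p0": (0.2, 0.9), "p1": (0.4, 0.5), "p2": (0.8, 0.3), "p3": (0.9, 0.1),
    "q0": (0.3, 0.6), "q1": (0.7, 0.2),
    "c": (0.5, 0.4),
}

# Edges within and across levels: (node_a, node_b, penalty if labels differ).
EDGES = [
    ("p0", "q0", 1.0), ("p1", "q0", 1.0),   # pixel-to-patch membership
    ("p2", "q1", 1.0), ("p3", "q1", 1.0),
    ("q0", "c", 0.5), ("q1", "c", 0.5),     # patch-to-category
    ("p1", "p2", 0.3),                      # neighbouring pixels
]


def energy(labels):
    """Unary costs plus a Potts penalty on every edge whose endpoints disagree."""
    e = sum(UNARY[n][labels[n]] for n in NODES)
    e += sum(w for a, b, w in EDGES if labels[a] != labels[b])
    return e


def minimize_exhaustively():
    """Enumerate all 2^len(NODES) labelings; feasible only for toy graphs."""
    best_e, best_labels = float("inf"), None
    for assignment in product((0, 1), repeat=len(NODES)):
        labels = dict(zip(NODES, assignment))
        e = energy(labels)
        if e < best_e:
            best_e, best_labels = e, labels
    return best_e, best_labels


e, labels = minimize_exhaustively()
print(round(e, 3), labels)
```

With these costs the minimum energy is 2.7: pixels p2, p3 and patch q1 are labeled object, p0, p1 and q0 background, and the category node is on. Adding another level of perception only adds entries to NODES, UNARY, and EDGES; energy() and the solver stay unchanged. Exhaustive search is exponential in the number of nodes; for binary labels with nonnegative disagreement penalties like these the energy is submodular, so an s-t minimum cut would find the same minimum in polynomial time.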

Motion Segmentation

Motion estimation and segmentation is the problem of fitting multiple motion models to every frame of a video sequence without knowing which pixels move according to the same model. Estimating a single motion model for a set of 2-D feature points as they move through a video sequence is a problem in visual motion analysis that has been studied in great depth. However, most real-world scenes contain multiple objects undergoing different motions, and such cases have received relatively little attention. We propose algorithms for the challenging problem of segmenting dynamic scenes that contain multiple moving objects.
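The difficulty shows up even with the simplest single-model estimator. The following Python sketch (hypothetical data and function names) fits one 2-D translation to feature-point correspondences by least squares, which reduces to the mean displacement, on a scene where half the points move by (1, 0) and the other half by (0, 3).

```python
import numpy as np


def fit_single_translation(p, q):
    """Least-squares 2-D translation for correspondences p -> q (both N x 2).

    Minimizing sum_j ||q_j - (p_j + t)||^2 over t gives the mean displacement.
    """
    return (q - p).mean(axis=0)


def residuals(p, q, t):
    """Per-point distance between the observed q_j and the prediction p_j + t."""
    return np.linalg.norm(q - (p + t), axis=1)


rng = np.random.default_rng(0)
p = rng.uniform(0, 100, size=(20, 2))
# Hypothetical scene: the first 10 points move by (1, 0), the last 10 by (0, 3).
motion = np.vstack([np.tile([1.0, 0.0], (10, 1)), np.tile([0.0, 3.0], (10, 1))])
q = p + motion

t = fit_single_translation(p, q)
print(t, residuals(p, q, t).max())
```

The single fit lands at roughly (0.5, 1.5), the average of the two motions, and leaves every point with a residual of about 1.58 pixels (the square root of 2.5). Fitting each group separately gives zero residual, but only if the grouping is already known, which is exactly what segmentation has to recover.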

We have developed a closed-form solution for segmenting mixtures of 2-D translational and 2-D affine motion models directly from the image intensities. Our approach exploits the fact that the spatiotemporal image derivatives generated by a mixture of these motion models must satisfy a bi-homogeneous polynomial called the Multibody Brightness Constancy Constraint (MBCC). The degrees of the MBCC are related to the number of motion models of each kind and can be computed automatically using a one-dimensional search. We have shown that a 3x3 sub-matrix of the Hessian of the MBCC encodes the type of motion model: for instance, at image measurements generated by a 2-D translational model the matrix has rank 1, while for a 2-D affine model it has rank 3. Once the type of motion model has been identified, the motion model parameters at every image measurement are obtained from the cross products of the derivatives of the MBCC. Further, we have proved that treating a 2-D translational motion model as a 2-D affine one within this algebraic framework leads to erroneous motion estimates, which motivates our aim of explicitly accounting for different types of motion models.
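To make the rank test concrete, the following Python/NumPy sketch works through the simplest mixed case under stated assumptions. It assumes one 2-D translational model u = (u1, u2, 1) and one 2-D affine model whose flow at the homogeneous pixel x = (x1, x2, 1) is Ax, with each measurement y = (I_x, I_y, I_t) satisfying y^T u = 0 or y^T A x = 0. The MBCC is then the product (y^T u)(y^T A x), and the sketch takes the 3x3 sub-matrix to be the mixed block of second derivatives with respect to y and x; that choice, the model values, and all function names are assumptions for illustration. For clarity the MBCC is built from known models rather than fitted to data, and neither the degree search nor the cross-product step is reproduced; the flow is instead read off the normalized gradient with respect to y.

```python
import numpy as np

# Hypothetical models (values chosen for illustration only).
U = np.array([1.0, -2.0, 1.0])          # 2-D translation (1, -2) in homogeneous form
A = np.array([[1.1, 0.2, 0.5],
              [-0.1, 0.9, -0.3],
              [0.0, 0.0, 1.0]])         # 2-D affine model: flow at x is A @ x


def mbcc(y, x):
    """MBCC for one translational and one affine model: (y.u) * (y.A x)."""
    return (y @ U) * (y @ A @ x)


def mbcc_grad_y(y, x):
    """Analytic gradient of mbcc with respect to y."""
    return U * (y @ A @ x) + (A @ x) * (y @ U)


def mbcc_mixed_hessian(y, x):
    """Mixed 3x3 block d^2 mbcc / (dy dx^T), derived by hand for this product."""
    return np.outer(U, A.T @ y) + (y @ U) * A


def model_type(y, x, tol=1e-8):
    """Classify a measurement by the rank of the mixed Hessian block."""
    rank = np.linalg.matrix_rank(mbcc_mixed_hessian(y, x), tol=tol)
    return {1: "translational", 3: "affine"}.get(rank, "unknown")


def flow_at(y, x):
    """Normalize the y-gradient so its last entry is 1, giving (u1, u2, 1)."""
    g = mbcc_grad_y(y, x)
    return g / g[2]


# Synthesize one measurement per model: y must be orthogonal to that model's flow.
r = np.array([0.3, 0.7, -0.4])          # arbitrary auxiliary direction
x_t = np.array([3.0, 4.0, 1.0])
y_t = np.cross(U, r)                    # satisfies y_t . U = 0
x_a = np.array([-2.0, 5.0, 1.0])
y_a = np.cross(A @ x_a, r)              # satisfies y_a . (A @ x_a) = 0

for name, y, x in (("t", y_t, x_t), ("a", y_a, x_a)):
    print(name, model_type(y, x), flow_at(y, x))
```

The first measurement should be classified as translational with flow close to (1, -2), and the second as affine with flow close to A x_a = (-0.7, 4.4). The ranks follow from the product form: at a translational measurement the mixed block reduces to the outer product u (A^T y)^T, hence rank 1; at an affine measurement it equals u y^T A + (y^T u) A, whose determinant is 2 (y^T u)^3 det(A), so it has rank 3 whenever y^T u is nonzero and A is invertible.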