Image interpretation, which is effortless and instantaneous for human beings, is the grand challenge of computer vision. The dream is to build a "description machine" that produces a rich semantic description of the underlying scene, including the names and poses of the objects present, and that even "recognizes" other things, such as actions and context. Mathematical frameworks are advanced from time to time, but none is yet widely accepted, and none clearly points the way to closing the gap with natural vision. Researchers at CIS are pursuing several approaches to image interpretation, including approaches based on hierarchical structures and efficient coarse-to-fine search, approaches based on combinatorial optimization on graphs for simultaneous object segmentation and categorization, and approaches based on efficient maximum likelihood estimation in graphical models.
Featured Research: A 20,000 Question Game with Images and Videos
The past few decades have witnessed significant advances in object recognition, in both object identification (finding a specific instance of an object in an image) and object categorization (recognizing an object as a member of a class of objects). However, the recognition of moving nonrigid or deforming objects, such as fire, water, smoke, and steam, is not as well developed. Researchers at CIS are developing algorithms for recognizing categories of such objects from video. For example, the aim is to recognize all videos of fire (or water, smoke, steam, etc.) taken under varying viewpoint, scale, and illumination conditions as elements of the same class.
Humans have a remarkable ability to recognize a wide variety of objects and actions. We can walk into unknown environments and yet seamlessly read directions and recognize signs, streets, and buildings. We are also quite perceptive of differences between types of motion (walking vs. running), emotions (sad vs. happy), and more complex activities (playing vs. dancing). In contrast, state-of-the-art computer vision algorithms perform poorly at these tasks. Researchers at CIS are developing algorithms for automatically recognizing individual and group behaviors in videos. For individuals, the algorithms aim to interpret motions related to gait, face, and hand gestures, such as walking, smiling, or pointing. For groups of individuals, the algorithms aim to interpret various types of activities, such as dancing or playing.
Featured Research: Bio-inspired Recognition of Human Movements