Donald Geman, Professor of Mathematics
  Johns Hopkins University


Image Interpretation

Image interpretation, which is effortless and instantaneous for human beings, is the grand challenge of computer vision. The dream is to build a "description machine" which produces a rich semantic description of the underlying scene, including the names and poses of the objects that are present, even "recognizing" other things, such as actions and context. Mathematical frameworks are advanced from time to time, but none is yet widely accepted, and none clearly points the way to closing the gap with natural vision. Together with students and colleagues in the Center for Imaging Science, I am pursuing an approach to image interpretation based on hierarchical structures. We develop practical vision algorithms (e.g., for detecting faces) as well as theoretical results (e.g., on the efficiency of hierarchical search).

In our view, no advances in computers or statistical learning will overcome the "small sample dilemma" in vision; some global organizational framework is necessary. As in "twenty questions", the enabling assumption is that interpretations may be grouped into natural subsets with shared features. This suggests a coarse-to-fine modeling of both the interpretations and the computational process, uniting representation and processing using a hierarchy of classifiers. It also provides for progressive annotation and concentrates processing on ambiguous areas. Recently, we have introduced the "trace model," a graphical network based on the history of processing; using the trace as a sufficient statistic, coarse-to-fine search can be characterized as an exact implementation of a likelihood ratio test.
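The coarse-to-fine organization described above can be sketched in a few lines. This is an invented toy (the tree, the tests, and the "face" attributes are illustrative stand-ins, not our actual detectors); the point is only that a negative answer at a node rejects its entire subset of interpretations at once, so computation concentrates on ambiguous regions.

```python
# Toy sketch of coarse-to-fine search over a tree of hypothesis subsets.
# A node's binary test covers every interpretation in its subtree; a
# negative answer prunes the whole subtree in one step.

class Node:
    def __init__(self, name, test, children=()):
        self.name = name            # label for this subset of hypotheses
        self.test = test            # binary classifier for the whole subset
        self.children = list(children)

def coarse_to_fine(node, x, detections):
    """Recursively refine only where the coarse test answers 'yes'."""
    if not node.test(x):
        return                      # entire subset rejected at once
    if not node.children:
        detections.append(node.name)  # leaf = a single confirmed hypothesis
        return
    for child in node.children:
        coarse_to_fine(child, x, detections)

# Invented example: "is there a face, and at which pose?"
root = Node("any-face", lambda x: x["face_like"] > 0.5, [
    Node("left-pose",  lambda x: x["pose"] < 0),
    Node("right-pose", lambda x: x["pose"] >= 0),
])

hits = []
coarse_to_fine(root, {"face_like": 0.9, "pose": -0.3}, hits)
print(hits)  # prints ['left-pose']
```

On data that fails the root test, no child test is ever evaluated, which is the source of the efficiency analyzed in the papers below.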

  • S. Gangaputra and D. Geman, "A design principle for coarse-to-fine classification," Proceedings CVPR 2006, 2, 1877-1884, 2006. (pdf)
  • S. Gangaputra and D. Geman, "The trace model for object detection and tracking," Towards Category-Level Object Recognition, Lecture Notes in Computer Science, 4170, 401-420, 2006. (pdf)
  • H. Sahbi and D. Geman, "A hierarchy of support vector machines for pattern detection," Journal of Machine Learning Research, 7, 2087-2123, 2006. (pdf)
  • Y. Amit, D. Geman and X. Fan, "A coarse-to-fine strategy for multi-class shape detection," IEEE Trans. PAMI, 28, 1606-1621, 2004. (pdf)

Molecular Cancer Diagnosis

The mRNA counts in gene microarrays provide a global view of cellular activity by simultaneously recording the expression levels of thousands of genes. Nonetheless, there are technical and practical limitations to using these data for analyzing disease. Accurate statistical inference is difficult because the number of observations, typically tens, is small relative to the number of genes, typically thousands. Moreover, conventional methods from machine learning lead to decisions which are usually very difficult to interpret in simple or biologically meaningful terms.

Together with researchers at the Institute for Computational Medicine, I am investigating a new approach to molecular classification based entirely on pairwise mRNA (or protein) comparisons ("relative expression reversals"). This provides simple classification rules which involve very few genes and generate specific hypotheses for follow-up studies. Since rank-based methods are largely invariant to normalization, the small-sample problem can be addressed by integrating inter-study microarray data, leading to the discovery of more reliable markers. In particular, in the case of prostate cancer, cross-platform validation indicates that simply comparing the relative expression values of HPN and STAT6 achieves high sensitivity and specificity.
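The flavor of a relative-expression rule can be conveyed with a short sketch. The training data, the gene names, and the decision function below are invented for exposition; only the core idea comes from the work above: classify by asking which of two genes has the higher expression, a comparison that is invariant to monotone normalization.

```python
# Illustrative sketch of a single-pair relative-expression rule: estimate
# P(expr[a] < expr[b]) within each class, then classify a new profile by
# the observed ordering of the pair alone. Toy data, not real measurements.

def train_pair_rule(profiles, labels, gene_a, gene_b):
    """Fit the rule 'is gene_a expressed below gene_b?' on labeled profiles."""
    p = {}
    for cls in set(labels):
        rows = [x for x, y in zip(profiles, labels) if y == cls]
        p[cls] = sum(x[gene_a] < x[gene_b] for x in rows) / len(rows)

    def rule(x):
        observed = x[gene_a] < x[gene_b]
        # pick the class whose estimated ordering probability best matches
        return max(p, key=lambda cls: p[cls] if observed else 1 - p[cls])
    return rule

# Invented toy profiles: HPN tends to exceed STAT6 in "tumor", reversed in "normal".
profiles = [{"HPN": 5.0, "STAT6": 2.0}, {"HPN": 6.1, "STAT6": 1.5},
            {"HPN": 1.0, "STAT6": 4.0}, {"HPN": 0.8, "STAT6": 3.2}]
labels   = ["tumor", "tumor", "normal", "normal"]

rule = train_pair_rule(profiles, labels, "HPN", "STAT6")
print(rule({"HPN": 4.0, "STAT6": 1.0}))  # prints 'tumor'
```

Because the decision depends only on the rank order of two measurements, the same rule can be applied across platforms without renormalizing the data.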

  • L. Xu, A-C Tan, D. Naiman, D. Geman and R. Winslow, "Robust prostate cancer marker genes emerge from direct integration of inter-study microarray data," Bioinformatics 2005 21: 3905-3911. (pdf)
  • A-C Tan, D. Naiman, L. Xu, R. Winslow and D. Geman, "Simple decision rules for classifying human cancers from gene expression profiles," Bioinformatics 2005 21: 3896-3904. (pdf)
  • D. Geman, C. d'Avignon, D. Naiman and R. Winslow, "Classifying gene expression profiles from pairwise mRNA comparisons," Statist. Appl. in Genetics and Molecular Biology, 3, 2004. (pdf)

Small Sample Learning

Very high-dimensional data sets are ubiquitous in computational vision, speech and biology, and raise serious challenges in statistical learning. The difficulties are especially pronounced when the objective is to uncover a complex statistical dependency structure within a large number of variables and yet the number of samples available for learning is relatively limited. This situation frequently arises in computational biology, for instance in predicting phenotypes (e.g., diagnosing cancers) from gene expression data or reverse engineering regulatory networks; when measured against the complexity of the systems being studied, the amount of data available for modeling and inference is minuscule. Similarly, in computational vision, full-scale scene interpretation is likely to be hopeless without substantial "hardwiring" to compensate for the small number of samples relative to the large number of possible descriptions.

Simply penalizing model complexity often biases estimation towards the absence of interactions. Consequently, discovering rich structure from small samples is nearly impossible unless the search is severely restricted, in which case statistical learning may become feasible due to variance reduction. At the Institute for Computational Medicine and the Center for Imaging Science, we are pursuing several projects involving a new class of "decomposable" graphical models designed for stepwise dependency pursuit, and for which the number of parameters grows slowly with the number of variables. In the case of modeling gene networks, the embedded hierarchical structure is meant to capture the organization of gene regulation in mammals, especially cascades of transcription factors.
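To make the idea of restricted, stepwise dependency pursuit concrete, here is a generic illustration (not our models): greedily add the strongest pairwise association at each step while forbidding cycles, so the learned graph stays a tree. Trees are the simplest decomposable models, and their parameter count grows only linearly in the number of variables, which is what makes estimation feasible from small samples.

```python
# Generic stepwise dependency pursuit on binary variables: rank all pairs by
# empirical mutual information, then add edges greedily while keeping the
# graph acyclic (a tree, hence a decomposable model). Illustration only.

from itertools import combinations
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two binary samples."""
    n = len(xs)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pab = sum(1 for x, y in zip(xs, ys) if x == a and y == b) / n
            pa = sum(1 for x in xs if x == a) / n
            pb = sum(1 for y in ys if y == b) / n
            if pab > 0:
                mi += pab * log(pab / (pa * pb))
    return mi

def tree_pursuit(data):
    """data: dict variable-name -> list of binary samples; returns tree edges."""
    names = list(data)
    scores = sorted(((mutual_information(data[u], data[v]), u, v)
                     for u, v in combinations(names, 2)), reverse=True)
    parent = {v: v for v in names}      # union-find roots, to forbid cycles
    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v
    edges = []
    for s, u, v in scores:
        ru, rv = find(u), find(v)
        if ru != rv:                    # adding (u, v) keeps the graph a tree
            parent[ru] = rv
            edges.append((u, v))
    return edges

# Toy samples: A and B are perfectly coupled, C is independent of both.
data = {"A": [0, 0, 1, 1], "B": [0, 0, 1, 1], "C": [0, 1, 0, 1]}
edges = tree_pursuit(data)
print(edges)  # the strongest dependency (A, B) enters first
```

The restriction to trees is the variance-reduction device here; richer decomposable families allow larger cliques while still controlling parameter growth.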

Twenty Questions Theory

Together with Gilles Blanchard, I have explored the theoretical foundations of a "twenty questions" approach to pattern recognition. The object of analysis is the computational process itself rather than probability distributions (generative modeling) or decision boundaries (predictive learning). Our formulation is motivated by applications to scene interpretation in which there are a great many possible explanations for the data, one ("background") is statistically dominant, and it is imperative to restrict intensive computation to genuinely ambiguous regions. We consider sequential testing strategies in which decisions are made iteratively, based on past outcomes, about which test to perform next and when to stop testing.

The key structure is a hierarchy of tests, one binary test ("question") for each cell in a tree-structured, recursive partitioning of the space of hypotheses. A central role in the mathematical analysis is played by the ratio of the "cost" of a test, meaning the amount of computation necessary for execution, to the "selectivity" of the test, meaning the probability of correctly declaring background (i.e., not hallucinating). Our main result is that, under a natural model for total computation and certain statistical assumptions on the joint distribution of the tests, coarse-to-fine strategies are optimal whenever the cost-to-selectivity ratio of the test at every cell is less than the sum of the cost-to-selectivity ratios for the children of the cell. As might be expected, under mild assumptions, good designs exhibit a steady progression from broad scope coupled with low power to high power coupled with dedication to specific explanations.
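In symbols (notation introduced here for illustration: write c_A for the cost and s_A for the selectivity of the test at cell A), the optimality condition above reads:

```latex
% Coarse-to-fine is optimal whenever, for every internal cell A
% with children C(A),
\[
\frac{c_A}{s_A} \;<\; \sum_{B \in C(A)} \frac{c_B}{s_B}.
\]
```

That is, the coarse test at A is worth performing whenever it is cheaper, per unit of background correctly rejected, than interrogating its children separately.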

  • G. Blanchard and D. Geman, "Sequential testing designs for pattern recognition," Annals of Statistics, 33, 1155-1202, June, 2005. (pdf)

Mental Image Matching

The standard problem in image retrieval is "query-by-visual-example": the "query image" resides in an image database and is matched by the system with other images. At INRIA in Paris, France, I am working with researchers in the project IMEDIA on a different scenario: the query image is "external" and the matching is "mental." The query image may reside only in the mind of the user (or the user may have an actual object or photograph at hand). The user may seek versions of the same image, e.g., the same face, or images belonging to the same class, e.g., similar landscapes, and responds to a sequence of machine-generated queries designed to accelerate the search. For example, the user declares which of several displayed images is "closest" to his query. These similarity decisions are entirely subjective. Also, since the images are generally indexed by low-level, intensity-based features rather than high-level semantic content, the answers provided by the user may not cohere with the measures of proximity employed by the retrieval system.

We are developing an interactive search engine which is based on information theory and statistical inference. The display algorithm involves a Bayesian relevance feedback model and an optimality criterion based on conditional entropy. The modeling challenge is to account for psycho-visual factors and sources of variability in human decision making. Performance is measured by the expected number of iterations necessary to match the identity (target search) or the class (category search) of the query. Designing metrics and response models which are consistent with human behavior is essential for achieving practical results with large databases (art, faces, etc.).
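The feedback loop can be sketched in miniature. Everything concrete below is a toy of our own devising for exposition (images as points on a line, a fixed error rate `eps`, a hand-picked display): the system maintains a posterior over which database image is the user's mental target and updates it on each "closest" answer, driving down the entropy that the display algorithm targets.

```python
# Toy Bayesian relevance feedback: a posterior over candidate targets is
# updated from the user's choice of "closest" displayed image, under a
# simple response model (user picks the truly nearest display w.p. 1 - eps).

from math import log2

def entropy(p):
    """Shannon entropy (bits) of a distribution given as a dict."""
    return -sum(q * log2(q) for q in p.values() if q > 0)

def update(posterior, display, choice, dist, eps=0.1):
    """Bayes update: choice is the index in `display` the user clicked."""
    new = {}
    for target, prior in posterior.items():
        nearest = min(range(len(display)), key=lambda i: dist(display[i], target))
        lik = 1 - eps if nearest == choice else eps
        new[target] = prior * lik
    z = sum(new.values())
    return {t: q / z for t, q in new.items()}

# Invented database: images are points on a line, distance is |a - b|.
images = [0.0, 1.0, 2.0, 10.0]
dist = lambda a, b: abs(a - b)
posterior = {x: 1 / len(images) for x in images}   # 2 bits of uncertainty

# The user's mental target is 10.0; shown [1.0, 10.0], the user clicks 10.0.
posterior = update(posterior, display=[1.0, 10.0], choice=1, dist=dist)
print(max(posterior, key=posterior.get))  # prints 10.0
```

A full system would choose each display to minimize the expected conditional entropy of the posterior; the sketch fixes the display by hand to keep the update step in view.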

  • Y. Fang and D. Geman, "Experiments in mental face retrieval," Proceedings AVBPA 2005, Lecture Notes in Computer Science, 637-646, July 2005. (Best Student Paper Award) (pdf)