Learning – Computer Vision and Robotics Laboratory

Sound2Sight: Generating Visual Dynamics from Sound and Context

Posted on August 25, 2021August 25, 2021 by 1qs9y

Learning associations across modalities is critical for robust multimodal reasoning, especially when a modality may be missing during inference. In this paper, we study this problem in the context of audio-conditioned visual synthesis – a task that is important, for example, in occlusion reasoning. Specifically, our goal is to generate future video frames and their motion dynamics conditioned on audio and a few past frames.

Remove to Improve

Posted on August 25, 2021August 25, 2021 by 1qs9y

The workhorses of CNNs are its filters, located at different layers and tuned to different features. Their responses are combined using weights obtained via network training. Training is aimed at optimal results for the entire training data, e.g., highest average classification accuracy. In this paper, we are interested in extending the current understanding of the roles played by the filters, their mutual interactions, and their relationship to classification accuracy.

Visual Scene Graphs for Audio Source Separation

Posted on August 25, 2021August 25, 2021 by 1qs9y

State-of-the-art approaches for visually-guided audio source separation typically assume sources that have characteristic sounds, such as musical instruments. These approaches often ignore the visual context of these sound sources or avoid modeling object interactions that may be useful to characterize the sources better, especially when the same object class may produce varied sounds from distinct interactions.

A Hierarchical Variational Neural Uncertainty Model for Stochastic Video Prediction

Posted on August 25, 2021August 25, 2021 by 1qs9y

Predicting the future frames of a video is a challenging task, in part due to the underlying stochastic real-world phenomena. Prior approaches to solve this task typically estimate a latent prior characterizing this stochasticity, however do not account for the predictive uncertainty of the (deep learning) model. Such approaches often derive the training signal from the mean-squared error (MSE) between the generated frame and the ground truth, which can lead to sub-optimal training, especially when the predictive uncertainty is high.

Scene Classification

Posted on August 23, 2010July 28, 2021 by 1qs9y

We use features of segmentation for semantic classification of real images. We model the image in terms of a probability density function, a Gaussian mixture model (GMM) to be specific, of its region features. This GMM is fit to the image by adapting a universal GMM which is estimated so it fits all images.

Texture Recognition

Posted on September 29, 2009July 28, 2021 by 1qs9y

Given an arbitrary image, our goal is to segment all distinct texture subimages. This is done by discovering distinct, cohesive groups of spatially repeating patterns, called texels, in the image, where each group defines the corresponding texture. Texels occupy image regions, whose photometric, geometric, structural, and spatial-layout properties are samples from an unknown pdf.

Region Based Image Matching

Posted on December 8, 2008July 28, 2021 by 1qs9y

We propose novel approaches to region-based hierarchical image matching, where, given two images, the goal is to identify the largest part in image 1 and its match in image 2 having the maximum similarity measure deﬁned in terms of geometric and photometric properties of regions (e.g., area, boundary shape, and color), as well as region topology (e.g.,

Connected Segmentation Tree For Object Modeling

Posted on June 23, 2008July 28, 2021 by 1qs9y

We propose a new object representation, called connected segmentation tree (CST), which captures canonical characteristics of the object in terms of the photometric, geometric, and spatial adjacency and containment properties of its constituent image regions. CST is obtained by augmenting the object’s segmentation tree (ST) with inter-region neighbor links, in addition to their recursive embedding structure already present in ST.

Object Category Recognition

Posted on June 23, 2008July 28, 2021 by 1qs9y

Low level segmentation based image features are used for the problem of object categorization. In general, object categorization comprises two main research areas: (1) classification or clustering of images containing objects belonging to an object category, and (2) detection, localization, and segmentation of individual object-category instances in images. The first thrust of research is typically concerned with exemplar based methods, where the main focus is to develop an efficient distance measure between two images.

Segmentation Based Object Discovery

Posted on February 2, 2008July 28, 2021 by 1qs9y

Given a set of images, possibly containing objects from an unknown category, determine if a category is present. If a category is present, learn spatial and photometric model of the category. Given an unseen image, segment all occurrences of the category.

S. Todorovic and N. Ahuja, Extracting Subimages of an Unknown Category from a Set of Images, Proceedings IEEE Conference on Computer Vision and Pattern Recognition (CVPR), New York, NY, Vol.

Face Detection

Posted on August 7, 2002July 28, 2021 by 1qs9y

Faces and Gestures

The aforementioned work on representation and learning has contributed to two types of human computer interfaces we have developed. First, learning and classification techniques, including usual statistical classifiers, neural networks, support vector machines and artificial intelligence approaches, have been used to develop new methods for human face detection and hand gesture recognition.

Face Recognition

Posted on August 31, 2001July 28, 2021 by 1qs9y

Faces and Gestures

Learning to Recognize 3D Objects

Posted on June 26, 2000July 28, 2021 by 1qs9y

3D Object Recognition

Recognition is achieved either by explicitly coding the recognition criteria in terms of low level structure, or through learning from examples. Learning algorithms incorporate subspace projections of higher dimensional data symbolically or using neural approaches.

Learning to Recognize 3D Objects

A learning account for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability.

Learning for Object Recognition

Posted on June 15, 2000July 28, 2021 by 1qs9y

3D Object Recognition

Learning for Object Recognition

A learning algorithm accounting for the problem of object recognition is developed within the PAC (Probably Approximately Correct) model of learnability.

Learning of Low-level Spatiotemporal Structural Patterns

Posted on March 1, 1999July 28, 2021 by 1qs9y

3D Object Recognition

Learning of Low-level Spatiotemporal Structural Patterns

Given an image or a video sequence, a prespecified set of low level, spatial and/or temporal descriptors of the image/video structure, and a higher level interpretation of the structure, use computational learning methods to derive a succinct relationship between the interpretation and the low level structural description.

Gesture Recognition

Posted on December 17, 1998July 28, 2021 by 1qs9y