We study pattern recognition methods and their real-world applications. In particular, our research focuses on algorithms for recognizing and understanding multimedia data such as audio and video. For example, we are investigating deep learning techniques for learning hierarchical representations from data, and graph signal processing for analyzing time series of graphs. Applications of these methods include robust speech recognition and large-scale video retrieval. Details of each research area are given below.

Deep Learning

Deep learning offers a powerful family of methods for learning representations from data. Deep learning architectures process inputs through many layers of nonlinear units and learn a hierarchy of feature representations. These methods achieve state-of-the-art results, particularly in automatic speech recognition and computer vision. Our lab applies deep learning architectures such as Deep Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks to speech and object recognition tasks.
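To make the idea of "many layers of nonlinear units" concrete, the following is a minimal sketch of a forward pass through a small fully connected network. It is an illustration only, not our lab's implementation: the function names, the tanh nonlinearity, and the hand-picked weights are all assumptions, and no training is shown.

```python
import math

def dense_layer(inputs, weights, biases):
    """Apply one fully connected layer followed by a tanh nonlinearity."""
    outputs = []
    for row, b in zip(weights, biases):
        pre_activation = sum(w * x for w, x in zip(row, inputs)) + b
        outputs.append(math.tanh(pre_activation))
    return outputs

def forward(inputs, layers):
    """Pass the input through each layer in turn.

    Returns every intermediate representation, from the raw input
    to the output of the last layer.
    """
    representations = [inputs]
    for weights, biases in layers:
        representations.append(dense_layer(representations[-1], weights, biases))
    return representations

# Hypothetical, hand-picked parameters: 3 inputs -> 2 hidden units -> 1 output unit.
layers = [
    ([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1]),
    ([[1.0, -1.0]], [0.0]),
]
representations = forward([1.0, 2.0, 3.0], layers)
```

Each entry of `representations` is the feature vector at one level of the hierarchy. In practice the weights would be learned from data rather than fixed by hand.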

Graph Signal Processing

Graph signal processing generalizes classical signal processing techniques for audio and images to data defined on structures that can be represented by arbitrary graphs. This research group focuses on applying graph signal processing to established pattern recognition tasks, such as action recognition from depth cameras, seeking a graph signal representation that is naturally adapted to the task at hand.
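As a small illustration of how a classical operation carries over to graphs, the sketch below builds the combinatorial Laplacian L = D - W of a weighted undirected graph, measures the smoothness x^T L x of a signal on its nodes, and applies one step of a simple low-pass filter x - alpha * L x. The 4-node path graph, the signals, and the value of alpha are assumptions chosen for illustration; this is not the representation used in our action recognition work.

```python
def graph_laplacian(num_nodes, edges):
    """Build the combinatorial Laplacian L = D - W of an undirected weighted graph."""
    laplacian = [[0.0] * num_nodes for _ in range(num_nodes)]
    for i, j, weight in edges:
        laplacian[i][j] -= weight
        laplacian[j][i] -= weight
        laplacian[i][i] += weight
        laplacian[j][j] += weight
    return laplacian

def mat_vec(matrix, vector):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def smoothness(laplacian, signal):
    """Quadratic form x^T L x, equal to the sum over edges of w_ij * (x_i - x_j)^2."""
    return sum(x * y for x, y in zip(signal, mat_vec(laplacian, signal)))

def lowpass_step(laplacian, signal, alpha):
    """One smoothing step x - alpha * L x, which attenuates rapid variation across edges."""
    return [x - alpha * y for x, y in zip(signal, mat_vec(laplacian, signal))]

# Hypothetical example: a path graph 0 - 1 - 2 - 3 with unit edge weights.
L = graph_laplacian(4, [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)])
constant_signal = [1.0, 1.0, 1.0, 1.0]
rough_signal = [1.0, -1.0, 1.0, -1.0]
filtered_signal = lowpass_step(L, rough_signal, alpha=0.25)
```

A constant signal has zero smoothness penalty, while the alternating signal is penalized on every edge. After one filtering step the penalty of the alternating signal drops sharply, which is the graph analogue of low-pass filtering a time series.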

Robust Speech Recognition

Current speech recognition systems are still not robust against additive noise or channel distortions. Our research group investigates new features and methods to improve the robustness of speech recognition systems. Speaker verification using very short speech samples is also one of our research areas. Other topics in our research group include active learning for speech recognition and overlapped speech detection.

Speaker Verification

Automatic speaker verification is the process of automatically deciding whether a voice sample belongs to a claimed identity. Speaker verification is used for access control, surveillance, and forensic applications, e.g., as evidence in court or for detecting the voice of a wanted criminal in speech recordings. State-of-the-art speaker verification systems use probabilistic models to calculate the probability that the speaker in a given speech sample is the same as in a reference sample. The parameters of these models are estimated from large collections of speech data. In our research, we have developed new methods for estimating the model parameters and for selecting suitable data for the parameter estimation.
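The probabilistic scoring idea can be sketched with a deliberately simplified model. The example below fits one single-variable Gaussian to a speaker's reference sample and another to background data, then scores a test sample by its average log-likelihood ratio and compares it to a threshold. Everything here is an assumption made for illustration: real systems use far richer models and features, the scalar "features" are made-up numbers, and this is not the parameter estimation method developed in our research.

```python
import math

def fit_gaussian(samples):
    """Estimate the mean and variance of a 1-D Gaussian from data."""
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    return mean, max(variance, 1e-6)  # floor the variance to avoid division by zero

def gaussian_log_pdf(x, mean, variance):
    """Log density of a 1-D Gaussian at x."""
    return -0.5 * (math.log(2.0 * math.pi * variance) + (x - mean) ** 2 / variance)

def llr_score(test_samples, speaker_model, background_model):
    """Average log-likelihood ratio: speaker model versus background model."""
    total = 0.0
    for x in test_samples:
        total += gaussian_log_pdf(x, *speaker_model) - gaussian_log_pdf(x, *background_model)
    return total / len(test_samples)

# Hypothetical scalar features; real features are multi-dimensional.
speaker_model = fit_gaussian([1.0, 1.2, 0.8, 1.1, 0.9])          # reference sample
background_model = fit_gaussian([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])  # many other speakers

score_same = llr_score([1.05, 0.95, 1.1], speaker_model, background_model)
score_different = llr_score([-1.5, -2.0, -1.0], speaker_model, background_model)

threshold = 0.0  # assumed decision threshold; tuned on held-out data in practice
accept_same = score_same > threshold
accept_different = score_different > threshold
```

A positive score means the test sample is better explained by the claimed speaker's model than by the background model, so the claim is accepted. The threshold trades off false acceptances against false rejections.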

Speech Recognition Error Correction

In recent years, speech input interfaces have become popular in smartphone applications, but speech recognition errors are unavoidable. When high-quality transcriptions are needed, users must verify the ASR output and correct its errors. Simpler and more efficient error correction interfaces are therefore in strong demand. Our goal is to design simpler user interfaces, develop efficient error correction algorithms, and explore how to use the information generated during human-machine interaction to reduce the users' effort.


We develop new methods for human-machine interaction as well as spoken language understanding. Our human-machine interaction research aims for natural and efficient user interfaces by means of better models and algorithms. Our present focus is on multilingual dialog systems, error correction, and acoustic modeling. Our spoken language understanding research focuses on dialog act recognition.


This group focuses on research in which a series of images is processed to extract specific information in order to create intelligent systems for tasks such as automatic identity recognition of individuals based on their behavior, recognition and translation of signs and gestures, human tracking and behavior analysis, as well as general image understanding for watermarking.

Video Retrieval

Search techniques for multimedia content are now in strong demand. However, little multimedia content comes with detailed annotations useful for this purpose, since it is practically impossible to annotate all content manually. To overcome this problem, we are developing automatic indexing techniques for video data using probabilistic models.