Sensory Processing & Learning


Online visual object learning

Online learning denotes the ability of a learning system to modify its representation in direct interaction with the environment. This has many advantages over fully separated training and application phases. Direct feedback allows system errors to be corrected and thus enables rapid adaptation to an unknown or quickly changing environment. A user can also adapt the system's behavior to personal preferences during the learning process.

Our approaches use exemplar-based networks with metrical adaptation methods to allow efficient learning of categories and concepts from high-dimensional sensory input data. This requires methods for identifying relevant subspaces within the feature spaces and for choosing the metrical relations between prototypes accordingly. Remembering exemplars is an efficient approach to the stability-plasticity dilemma of online learning: the conflict between keeping acquired knowledge and flexibly adding new knowledge to the representation.
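As an illustration of these ideas, the following Python sketch combines an exemplar-based classifier with a simple adaptive metric. It is a minimal sketch under stated assumptions, not a description of our implementation: the update rules (LVQ1-style attraction of a correct winner, storing a misclassified sample as a new exemplar, and RLVQ-style per-dimension relevance weights) as well as the names OnlinePrototypeClassifier, lr_proto and lr_rel are hypothetical choices made for the example.

```python
import numpy as np


class OnlinePrototypeClassifier:
    """Hypothetical online exemplar-based classifier with an adaptive diagonal metric.

    Assumed update rules (illustrative only): LVQ1-style attraction of a correct
    winner, storage of a new exemplar on errors, and RLVQ-style relevance weights.
    """

    def __init__(self, dim, lr_proto=0.1, lr_rel=0.01):
        self.protos = []                      # stored exemplars / prototypes (np.ndarray)
        self.labels = []                      # category label of each prototype
        self.rel = np.full(dim, 1.0 / dim)    # per-dimension relevance, sums to 1
        self.lr_proto = lr_proto
        self.lr_rel = lr_rel

    def _dist(self, x):
        # Relevance-weighted squared Euclidean distance to every prototype.
        return (((np.asarray(self.protos) - x) ** 2) * self.rel).sum(axis=1)

    def predict(self, x):
        if not self.protos:
            return None
        x = np.asarray(x, dtype=float)
        return self.labels[int(np.argmin(self._dist(x)))]

    def learn(self, x, y):
        x = np.asarray(x, dtype=float)
        if y not in self.labels:
            # Unknown category: remember the sample itself as a new exemplar.
            self.protos.append(x.copy())
            self.labels.append(y)
            return
        w = int(np.argmin(self._dist(x)))
        sq_diff = (x - self.protos[w]) ** 2
        if self.labels[w] == y:
            # Correct winner: move it toward the sample and down-weight the
            # dimensions in which sample and prototype disagree.
            self.protos[w] += self.lr_proto * (x - self.protos[w])
            self.rel -= self.lr_rel * sq_diff
        else:
            # Wrong winner: leave the old prototype untouched (stability), add
            # the sample as a new exemplar (plasticity), and up-weight the
            # dimensions that separate it from the wrong prototype.
            self.protos.append(x.copy())
            self.labels.append(y)
            self.rel += self.lr_rel * sq_diff
        # Keep relevances non-negative and normalized.
        self.rel = np.clip(self.rel, 0.0, None)
        total = self.rel.sum()
        self.rel = self.rel / total if total > 0 else np.full_like(self.rel, 1.0 / self.rel.size)
```

Leaving a wrongly winning prototype untouched and adding an exemplar instead is one simple way to protect acquired knowledge while still absorbing new samples; the relevance weights gradually emphasize the feature dimensions that separate the categories.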

Learning visual features

Systems acting in natural environments have to detect and distinguish a large number of visual categories. This is a difficult problem, as these categories show large appearance variations in terms of different brands, colors, lighting conditions, orientations and occlusions. Our analytic feature framework learns, for each category, a set of typical local features that are often found among the instances of that category and are usually missing in the other categories. The activation of these feature sets therefore allows a classifier to learn an easy separation of the categories and later to predict their presence.
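The selection criterion described above (features that are frequent within a category and rare outside it) can be sketched as a simple score: the detection rate inside a category minus the detection rate in all other categories. This sketch assumes that binary local-feature detections are already available; the input format, the scoring rule and the function names select_category_features and feature_activation are illustrative assumptions, not the framework's actual feature learning.

```python
import numpy as np


def select_category_features(present, labels, k):
    """Pick, for each category, the k features most specific to it.

    present: (n_samples, n_features) boolean matrix; True if a local feature
             was detected in an image (hypothetical input format).
    labels:  (n_samples,) category label of each image.
    Score = detection rate inside the category minus detection rate outside it.
    """
    selected = {}
    for c in np.unique(labels):
        inside = present[labels == c].mean(axis=0)
        outside = present[labels != c].mean(axis=0)
        score = inside - outside
        # Highest scores first; stable sort keeps feature order for ties.
        selected[c] = np.argsort(-score, kind="stable")[:k]
    return selected


def feature_activation(present_row, selected):
    """Concatenate the detections of all selected features into one
    activation vector that a downstream classifier can use."""
    return np.concatenate([present_row[idx] for idx in selected.values()]).astype(float)
```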


We successfully applied this framework to distinguish more than one hundred hand-held objects and to detect cars and pedestrians in traffic scenes. Currently, we are investigating hierarchical classifiers, which better reflect the natural structure of categories. By splitting a difficult decision into a set of easier ones, they scale better to problems with many categories.
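A two-level decision of this kind can be sketched with nearest-centroid classifiers: first choose a coarse group, then choose only among the few categories inside that group. The hand-made category-to-group mapping and the class name TwoLevelNearestCentroid are hypothetical; the sketch shows the splitting idea, not the hierarchical classifiers we are investigating.

```python
import numpy as np


class TwoLevelNearestCentroid:
    """Hypothetical two-level classifier: coarse group first, then category.

    groups maps each fine category to a coarse group (an assumed, hand-made
    hierarchy); each level is a nearest-centroid decision.
    """

    def __init__(self, groups):
        self.groups = groups

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        g = np.array([self.groups[c] for c in y])
        self.group_centroids = {G: X[g == G].mean(axis=0) for G in set(g)}
        self.class_centroids = {c: X[y == c].mean(axis=0) for c in set(y)}
        return self

    @staticmethod
    def _nearest(x, centroids):
        # Key of the centroid with the smallest squared Euclidean distance.
        return min(centroids, key=lambda k: np.sum((x - centroids[k]) ** 2))

    def predict(self, x):
        x = np.asarray(x, dtype=float)
        G = self._nearest(x, self.group_centroids)       # easy coarse decision
        candidates = {c: m for c, m in self.class_centroids.items()
                      if self.groups[c] == G}
        return self._nearest(x, candidates)               # finer decision among few classes
```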

Speech auditory features

Human speech is a very dynamic process. While we speak, the muscles controlling articulation move at a high pace. As a consequence, the resonance frequencies of the vocal tract change rapidly. These resonance frequencies are among the key features that allow us to understand what is said.
To capture these resonance frequencies, conventional approaches to automatic speech recognition sample the time-frequency plane at equidistant time intervals. In our approach, we model these dynamic processes more explicitly: we apply a hierarchy of two-dimensional filters that operate on the time-frequency plane and are sensitive to movements in this spectro-temporal domain.
These filters are learned from data in an offline step and can then be adapted to the current situation in an online process. Using this hierarchical spectro-temporal feature extraction, we obtain significant improvements over conventional approaches, in particular when background noise is present.
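To illustrate what a filter that is sensitive to movements in the spectro-temporal domain computes, the sketch below correlates a spectrogram with a single hand-crafted 2D Gabor patch whose orientation favors rising frequency sweeps. This is an assumption made for illustration: the filters described above are learned from data and arranged in a hierarchy, whereas here one fixed filter is shown, and gabor_2d and filter_response are hypothetical names.

```python
import numpy as np


def gabor_2d(size, omega_t, omega_f, sigma):
    """Real 2D Gabor patch over (frequency, time).

    The ratio of omega_f to omega_t sets the slope of the spectro-temporal
    movement the patch prefers (hand-crafted stand-in for a learned filter).
    """
    r = np.arange(size) - (size - 1) / 2
    F, T = np.meshgrid(r, r, indexing="ij")   # rows: frequency, columns: time
    envelope = np.exp(-(F**2 + T**2) / (2 * sigma**2))
    kernel = envelope * np.cos(omega_f * F + omega_t * T)
    return kernel - kernel.mean()             # zero mean: no response to flat energy


def filter_response(spec, kernel):
    """'Valid' 2D cross-correlation of a (frequency x time) spectrogram."""
    kf, kt = kernel.shape
    out = np.empty((spec.shape[0] - kf + 1, spec.shape[1] - kt + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(spec[i:i + kf, j:j + kt] * kernel)
    return out
```

With omega_f = 1 and omega_t = -1, the stripes of the kernel run along lines of rising frequency over time, so a rising ridge in the spectrogram produces a much stronger response than a falling one.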

Slow feature analysis

Teaching a system manually can be slow and cumbersome. But even without a teacher, learning systems can extract a wealth of information from sensory data. Such unsupervised learning can identify meaningful information from the statistics of the sensor signals. When the system interacts with the world, it can actively shape the perceived statistics and thus the learned representations.

Imagine a child turning a toy in its hands: the toy's visual appearance can change drastically. Even without a teacher, the child will learn that all these different views belong to the same object; its visual representation of the toy has become view-invariant. Slow Feature Analysis (SFA) is an unsupervised algorithm that learns such invariances from the temporal statistics of the data by extracting the features that vary most slowly over time. In this way, it is possible to learn invariant recognition of complex objects under full 3D rotation, even for objects that are hard for humans to discern. The same model can also be applied to visual self-localization in space.
With these approaches we are able to build better models of human brain function and at the same time provide new and better functionality for computer vision applications.
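The core of SFA fits in a few lines for the linear case: whiten the signal, then find the directions in which the temporal derivative of the whitened signal has the smallest variance. The object-recognition and self-localization models mentioned above are typically built from hierarchical, nonlinear SFA networks; this linear sketch only shows the principle, assumes a full-rank input covariance, and uses finite differences in place of the derivative. The function name linear_sfa is hypothetical.

```python
import numpy as np


def linear_sfa(x, n_features):
    """Linear Slow Feature Analysis.

    x: (T, D) time series. Returns a (D, n_features) matrix W such that
    (x - mean) @ W yields unit-variance, decorrelated outputs ordered from
    slowest to fastest varying.
    """
    xc = x - x.mean(axis=0)
    # 1) Whiten the input (assumes the covariance matrix is full rank).
    eigval, eigvec = np.linalg.eigh(np.cov(xc, rowvar=False))
    S = eigvec / np.sqrt(eigval)              # whitening matrix
    z = xc @ S
    # 2) Covariance of the temporal derivative (finite differences).
    dz = np.diff(z, axis=0)
    _, dvec = np.linalg.eigh(np.cov(dz, rowvar=False))
    # 3) eigh sorts eigenvalues ascending, so the first eigenvectors give the
    #    directions with the smallest derivative variance: the slowest features.
    return S @ dvec[:, :n_features]
```

Applied to two linear mixtures of a slow and a fast sine wave, the first output recovers the slow source up to sign and scale, without any labels.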

Silicon retina

Standard optical sensors still face tough challenges in real-world environments. Vision sensors, for instance, have to provide reliable data across a large range of lighting conditions. Control systems have to respond to fast visual stimuli with minimal latency and estimate object trajectories with fine time resolution. All this needs to be done with minimal energy expenditure.
Off-the-shelf vision sensors normally do not meet these criteria, which is why investigating novel sensor technology is important. This entails determining which novel approaches exist and how they can be utilized and integrated. Approaches that could meet the challenges posed by real-world environments include non-frame-based designs such as the Silicon Retina and biologically inspired high-dynamic-range camera designs.
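The principle of an asynchronous temporal-contrast pixel can be sketched as follows: each pixel remembers the log intensity at its last event and emits an ON or OFF event whenever the current log intensity has changed by more than a contrast threshold. The sketch is a simplified, frame-sampled simulation (a real sensor operates asynchronously in continuous time); the threshold value, the eps offset and the function name frames_to_events are assumptions.

```python
import numpy as np


def frames_to_events(frames, times, threshold=0.15, eps=1e-3):
    """Simplified temporal-contrast pixel model sampled at frame times.

    Each pixel stores the log intensity at which it last fired. When the
    current log intensity differs from this reference by at least
    `threshold`, the pixel emits ON (+1) or OFF (-1) events, one per
    threshold crossed, and moves its reference accordingly.
    Returns a list of (time, row, col, polarity) tuples.
    """
    frames = np.asarray(frames, dtype=float)
    ref = np.log(frames[0] + eps)             # eps avoids log(0) for dark pixels
    events = []
    for t, frame in zip(times[1:], frames[1:]):
        diff = np.log(frame + eps) - ref
        n_cross = np.floor(np.abs(diff) / threshold).astype(int)
        for r, c in zip(*np.nonzero(n_cross)):
            pol = 1 if diff[r, c] > 0 else -1
            k = int(n_cross[r, c])
            events.extend([(t, int(r), int(c), pol)] * k)
            ref[r, c] += pol * k * threshold
    return events
```

In this model, static pixels produce no output at all, and because the pixel works on log intensity it reacts to relative rather than absolute brightness changes, which helps across a large range of lighting conditions.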



Asynchronous temporal contrast silicon retina (courtesy T. Delbruck, Inst. of Neuroinformatics, UZH-ETH Zurich)