Plenary Talk

Detection and Localization of Sound Events

Prof. Tuomas Virtanen, Tampere University

Abstract: With the emergence of advanced machine learning techniques and large-scale datasets, holistic analysis of realistic soundscapes is becoming increasingly appealing. For everyday soundscapes, this can mean recognizing not only which sounds are present in an acoustic scene, but also where they are located and when they occur. This talk will discuss the joint task of detection and localization of sound events, which addresses the above problem. We will discuss the general problem setup, along with its challenges and limitations, and present the general signal processing and machine learning methods used to solve it. State-of-the-art methods typically apply deep neural networks based on convolutional and recurrent layers. We will present deep neural network topologies for joint detection and localization, as well as loss functions used to train these systems that take into account both the predicted sound event classes and their locations. We will also discuss the acoustic features used to represent the multichannel audio input to joint detection and localization systems. Since the performance of these methods depends heavily on the training data used, we will also discuss the datasets that can be used for method development, and the metrics for their evaluation. Finally, we will discuss the recent DCASE evaluation campaign tasks that addressed the problem of joint detection and localization of sound events, and present findings from them.
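As an illustration of the kind of joint loss function mentioned in the abstract, a common formulation (used, for example, in SELDnet-style systems) combines a binary cross-entropy term for per-frame event detection with a masked regression term for direction-of-arrival estimates, where only frames in which an event is active contribute to the localization error. The sketch below is illustrative only; the tensor shapes, the `weight` parameter, and the function name are assumptions, not taken from the talk.

```python
import numpy as np

def seld_loss(sed_pred, sed_true, doa_pred, doa_true, weight=0.5):
    """Illustrative joint detection-and-localization loss.

    sed_pred, sed_true: (frames, classes) event activity probabilities / labels.
    doa_pred, doa_true: (frames, classes, 3) Cartesian direction-of-arrival vectors.
    weight: assumed trade-off between detection and localization terms.
    """
    eps = 1e-7
    # Binary cross-entropy for sound event detection (per frame, per class).
    p = np.clip(sed_pred, eps, 1.0 - eps)
    bce = -np.mean(sed_true * np.log(p) + (1.0 - sed_true) * np.log(1.0 - p))
    # Masked mean squared error for localization: only frames where the
    # event is actually active contribute to the direction error.
    mask = sed_true[..., np.newaxis]  # broadcasts over the (x, y, z) axis
    mse = np.sum(mask * (doa_pred - doa_true) ** 2) / (
        np.sum(mask) * doa_true.shape[-1] + eps
    )
    return weight * bce + (1.0 - weight) * mse
```

With perfect predictions the loss is near zero, while mismatched event activities or directions increase both terms; the masking ensures the network is not penalized for direction estimates of inactive events.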
Biography: Tuomas Virtanen is a Professor at Tampere University, Finland, where he leads the Audio Research Group. He received the M.Sc. and Doctor of Science degrees in information technology from Tampere University of Technology in 2001 and 2006, respectively. He has also worked as a research associate at the Cambridge University Engineering Department, UK. He is known for his pioneering work on single-channel sound source separation using non-negative matrix factorization-based techniques and their application to noise-robust speech recognition and music content analysis. More recently, he has made significant contributions to sound event detection in everyday environments. In addition to the above topics, his research interests include content analysis of audio signals in general and machine learning. He has authored more than 200 scientific publications on these topics, which have been cited more than 10,000 times. He received the IEEE Signal Processing Society 2012 Best Paper Award for his article “Monaural Sound Source Separation by Nonnegative Matrix Factorization with Temporal Continuity and Sparseness Criteria”, as well as five other best paper awards. He is an IEEE Senior Member, a member of the Audio and Acoustic Signal Processing Technical Committee of the IEEE Signal Processing Society, and the recipient of the ERC 2014 Starting Grant “Computational Analysis of Everyday Soundscapes”.