Automatic speech recognition (ASR) systems are finding increasing use in everyday life. Many of the commonplace environments in which they are used are noisy, for example when users call a voice search system from a busy cafeteria or a street. The resulting degraded speech recordings adversely affect the performance of speech recognition systems. As the use of ASR systems grows, knowledge of the state-of-the-art techniques for dealing with such problems becomes critical for system and application engineers and for researchers who work with or on ASR technologies. This book presents a comprehensive survey of the state of the art in techniques used to improve the robustness of speech recognition systems to these degrading external influences.
Key features:
- Reviews all the main noise robust ASR approaches, including signal separation, voice activity detection, robust feature extraction, model compensation and adaptation, missing data techniques and recognition of reverberant speech.
- Acts as a timely exposition of the topic in light of the increasingly widespread use of ASR technology in challenging environments.
- Addresses robustness to noise and signal degradation, both key concerns for practitioners of ASR.
- Includes contributions from top ASR researchers from leading research units in the field.
Content:
Chapter 1 Introduction (pages 1–5): Tuomas Virtanen, Rita Singh and Bhiksha Raj
Chapter 2 The Basics of Automatic Speech Recognition (pages 7–30): Rita Singh, Bhiksha Raj and Tuomas Virtanen
Chapter 3 The Problem of Robustness in Automatic Speech Recognition (pages 31–50): Bhiksha Raj, Tuomas Virtanen and Rita Singh
Chapter 4 Voice Activity Detection, Noise Estimation, and Adaptive Filters for Acoustic Signal Enhancement (pages 51–85): Rainer Martin and Dorothea Kolossa
Chapter 5 Extraction of Speech from Mixture Signals (pages 87–108): Paris Smaragdis
Chapter 6 Microphone Arrays (pages 109–157): John McDonough and Kenichi Kumatani
Chapter 7 From Signals to Speech Features by Digital Signal Processing (pages 159–192): Matthias Wölfel
Chapter 8 Features Based on Auditory Physiology and Perception (pages 193–227): Richard M. Stern and Nelson Morgan
Chapter 9 Feature Compensation (pages 229–250): Jasha Droppo
Chapter 10 Reverberant Speech Recognition (pages 251–281): Reinhold Haeb-Umbach and Alexander Krueger
Chapter 11 Adaptation and Discriminative Training of Acoustic Models (pages 283–310): Yannick Estève and Paul Deléglise
Chapter 12 Factorial Models for Noise Robust Speech Recognition (pages 311–345): John R. Hershey, Steven J. Rennie and Jonathan Le Roux
Chapter 13 Acoustic Model Training for Robust Speech Recognition (pages 347–368): Michael L. Seltzer
Chapter 14 Missing-Data Techniques: Recognition with Incomplete Spectrograms (pages 369–398): Jon Barker
Chapter 15 Missing-Data Techniques: Feature Reconstruction (pages 399–432): Jort Florent Gemmeke and Ulpu Remes
Chapter 16 Computational Auditory Scene Analysis and Automatic Speech Recognition (pages 433–462): Arun Narayanan and Deliang Wang
Chapter 17 Uncertainty Decoding (pages 463–486): Hank Liao