Camastra F., Vinciarelli A. - Machine Learning for Audio, Image and Video Analysis: Theory and Applications (Advanced Information and Knowledge Processing)

This book is divided into three parts:

From Perception to Computation - Shows how the physical supports of our auditory and visual perceptions, namely acoustic waves and electromagnetic radiation, are converted into objects that can be manipulated by a computer.

Machine Learning - Provides a fairly deep survey of the main techniques used in machine learning. These chapters cover most of the algorithms applied in systems for audio, image, and video analysis. At this stage, the algorithms are presented as general pattern recognition techniques that could apply to any field.

Applications - Presents examples of applications using the techniques from part two. One chapter each is dedicated to speech and handwriting recognition, face recognition, and video segmentation and keyframe extraction. Each chapter shows an overall system in which analysis and machine learning components interact to accomplish a given task. Whenever possible, the chapters of this part present results obtained using publicly available data and software packages. This enables the reader to perform experiments similar to those presented in the book.

Each chapter opens with what the reader should understand before getting started, such as calculus and chapter four in the case of chapter eleven. That is followed by a list of what the reader should know after reading the chapter. I'd say parts one and two are quite good, but things break down a bit in part three. Granted, the subject of each of the three chapters in the final section is complex, but a few more figures and labeled algorithmic steps, and maybe a little less prose, might have made the specifics of each task clearer.