Smidl V., Quinn A.: The Variational Bayes Method in Signal Processing (Signals and Communication Technology) :: Electronic Library of the Board of Trustees of the Faculty of Mechanics and Mathematics, Moscow State University

Gaussian linear modelling cannot address current signal processing demands. In modern contexts, such as Independent Component Analysis (ICA), progress has been made specifically by imposing non-Gaussian and/or non-linear assumptions. Hence, standard Wiener and Kalman theories no longer enjoy their traditional hegemony in the field as the standard computational engines for these problems. In their place, diverse principles have been explored, leading to a consequent diversity in the implied computational algorithms. The traditional on-line and data-intensive preoccupations of signal processing continue to demand that these algorithms be tractable.
Increasingly, full probability modelling (the so-called Bayesian approach), or partial probability modelling using the likelihood function, is the pathway for design of these algorithms. However, the results are often intractable, and so the area of distributional approximation is of increasing relevance in signal processing. The Expectation-Maximization (EM) algorithm and Laplace approximation, for example, are standard approaches to handling difficult models, but these approximations (certainty equivalence, and Gaussian, respectively) are often too drastic to handle the high-dimensional, multi-modal and/or strongly correlated problems that are encountered. Since the 1990s, stochastic simulation methods have come to dominate Bayesian signal processing. Markov Chain Monte Carlo (MCMC) sampling, and related methods, are appreciated for their ability to simulate possibly high-dimensional distributions to arbitrary levels of accuracy. More recently, the particle filtering approach has addressed on-line stochastic simulation. Nevertheless, the wider acceptability of these methods (and, to some extent, of Bayesian signal processing itself) has been undermined by the large computational demands they typically make.
The Variational Bayes (VB) method of distributional approximation originates, as does the MCMC method, in statistical physics, in the area known as Mean Field Theory. Its method of approximation is easy to understand: conditional independence is enforced as a functional constraint in the approximating distribution, and the best such approximation is found by minimization of a Kullback-Leibler divergence (KLD). The exact (but intractable) multivariate distribution is therefore factorized into a product of tractable marginal distributions, the so-called VB-marginals. This straightforward proposal for approximating a distribution enjoys certain optimality properties. What is of more pragmatic concern to the signal processing community, however, is that the VB-approximation conveniently addresses the following key tasks:
(i) The inference is focused (or, more formally, marginalized) onto selected subsets of parameters of interest in the model: this one-shot (i.e. off-line) use of the VB method can replace numerically intensive marginalization strategies based, for example, on stochastic sampling.
(ii) Parameter inferences can be arranged to have an invariant functional form when updated in the light of incoming data: this leads to feasible on-line tracking algorithms involving the update of fixed- and finite-dimensional statistics. In the language of the Bayesian, conjugacy can be achieved under the VB-approximation. There is no reliance on propagating certainty equivalents, stochastically-generated particles, etc.
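The factorized approximation underlying these two tasks can be stated compactly. The following is only a sketch, in notation assumed here for illustration: $\theta = (\theta_1, \theta_2)$ denotes the parameters, $D$ the data, $f$ the exact posterior and $\tilde f$ its approximation. The direction of the divergence is taken to be the one usual in mean-field methods.

$$
\tilde f(\theta \mid D) = \tilde f(\theta_1 \mid D)\,\tilde f(\theta_2 \mid D),
\qquad
\tilde f = \arg\min_{\breve f(\theta_1 \mid D)\,\breve f(\theta_2 \mid D)}
\mathrm{KL}\!\left( \breve f(\theta \mid D) \,\middle\|\, f(\theta \mid D) \right)
$$

$$
\tilde f(\theta_i \mid D) \;\propto\;
\exp\!\left( \mathrm{E}_{\tilde f(\theta_j \mid D)}\!\left[ \ln f(\theta, D) \right] \right),
\qquad i, j \in \{1, 2\},\; j \neq i
$$

Each VB-marginal is defined through an expectation taken under the other, so the two marginals are coupled and are in general available only implicitly.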
Unusually for a modern Bayesian approach, then, no stochastic sampling is required for the VB method. In its place, the shaping parameters of the VB-marginals are found by iterating a set of implicit equations to convergence. This Iterative Variational Bayes (IVB) algorithm enjoys a decisive advantage over the EM algorithm whose computational flow is similar: by design, the VB method yields distributions in place of the point estimates emerging from the EM algorithm. Hence, in common with all Bayesian approaches, the VB method provides, for example, measures of uncertainty for any point estimates of interest, inferences of model order/rank, etc.
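To make the iteration concrete, here is a minimal sketch of such a scheme for a toy problem that is not taken from the text: scalar Gaussian data with unknown mean and precision, under a Normal-Gamma prior. The model, the function name `ivb_gaussian`, the prior hyperparameters and the stopping rule are all assumptions chosen for illustration. The update formulas are the standard mean-field ones for this model.

```python
import random


def ivb_gaussian(x, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3,
                 tol=1e-10, max_iter=200):
    """Mean-field VB for x_n ~ N(mu, 1/tau) with q(mu, tau) = q(mu) q(tau).

    Assumed prior (illustrative): mu | tau ~ N(mu0, 1/(lam0*tau)),
    tau ~ Gamma(a0, b0) with rate b0.
    VB-marginals: q(mu) = N(mu_n, 1/lam_n), q(tau) = Gamma(a_n, b_n).
    """
    n = len(x)
    xbar = sum(x) / n
    # These two shaping parameters do not depend on the other marginal.
    mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
    a_n = a0 + 0.5 * (n + 1)
    e_tau = 1.0  # initial guess for E[tau] under q(tau)
    for it in range(1, max_iter + 1):
        # q(mu) update: its precision needs E[tau] from q(tau).
        lam_n = (lam0 + n) * e_tau
        # q(tau) update: its rate needs first and second moments of mu.
        sq = sum((xi - mu_n) ** 2 + 1.0 / lam_n for xi in x)
        b_n = b0 + 0.5 * (sq + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        new_e_tau = a_n / b_n
        converged = abs(new_e_tau - e_tau) < tol * abs(e_tau)
        e_tau = new_e_tau
        if converged:
            break
    return {"mu_n": mu_n, "lam_n": (lam0 + n) * e_tau,
            "a_n": a_n, "b_n": b_n, "iterations": it}


# Usage on synthetic data (seeded for reproducibility).
rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(500)]
result = ivb_gaussian(data)
print("posterior mean of mu:", result["mu_n"])
print("posterior mean of tau:", result["a_n"] / result["b_n"])
print("iterations:", result["iterations"])
```

The result is a pair of distributions, not a pair of point estimates: the variance of the mean is available as `1 / lam_n`, and the Gamma parameters `a_n` and `b_n` quantify the uncertainty in the precision. No random numbers are drawn during inference. The generator is used only to create the test data.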
The machine learning community has led the way in exploiting the VB method in model-based inference, notably in inference for graphical models. It is timely, however, to examine the VB method in the context of signal processing where, to date, little work has been reported. In this book, at all times, we are concerned with the way in which the VB method can lead to the design of tractable computational schemes for tasks such as (i) dimensionality reduction, (ii) factor analysis for medical imagery, (iii) on-line filtering of outliers and other non-Gaussian noise processes, (iv) tracking of non-stationary processes, etc. Our aim in presenting these VB algorithms is not just to reveal new flows-of-control for these problems, but, perhaps more significantly, to understand the strengths and weaknesses of the VB-approximation in model-based signal processing. In this way, we hope to dismantle the current psychology of dependence in the Bayesian signal processing community on stochastic sampling methods.
Without doubt, the ability to model complex problems to arbitrary levels of accuracy will ensure that stochastic sampling methods, such as MCMC, will remain the gold standard for distributional approximation. Notwithstanding this, our purpose here is to show that the VB method of approximation can yield highly effective Bayesian inference algorithms at low computational cost. In showing this, we hope that Bayesian methods might become accessible to a much broader constituency than has been achieved to date.