Design of IIR filters with Bayesian model selection and parameter estimation


Jonathan Botts, José Escolano, and Ning Xiang

Abstract: Bayesian model selection and parameter estimation are used to address the problem of choosing the most concise filter order for a given application while simultaneously determining the associated filter coefficients. This approach is validated against simulated data and used to generate pole-zero representations of head-related transfer functions.

Index Terms: IIR filters, Bayesian methods, parameter estimation, model comparison, Monte Carlo methods, head-related transfer function.

EDICS Category: AUD-ROOM

Copyright © 2012 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org.

J. Botts and N. Xiang are with the Graduate Program in Architectural Acoustics, Rensselaer Polytechnic Institute, Troy, NY, USA (e-mail: bottsj@rpi.edu, xiangn@rpi.edu). J. Escolano is with the Multimedia and Multimodal Research Group, University of Jaén, Spain (e-mail: escolano@ujaen.es).

I. INTRODUCTION

The design of infinite impulse response (IIR) filters, or pole-zero systems, can be partially solved with optimization methods; however, the order of the system must be chosen a priori. A complete solution embodies two levels of inference: model selection to determine the filter order and parameter estimation to determine the filter coefficients. This letter uses the tools of Bayesian inference to select the most concise filter order and estimate the associated filter coefficients simultaneously. This approach is useful for applications where the filter represents a pole-zero system, like head-related transfer functions [1], [2], pole-zero models for speech analysis [3], or inverse filters for loudspeaker equalization [4].

By defining an error or cost function, optimization methods may be used to minimize the norm of the difference between a model and a desired transfer function with respect to the filter coefficients [1], [3], [5]. The error is nonlinear in the coefficients and may not be minimized directly with classical methods, but optimization techniques like Newton [1], [5] and quasi-Newton methods [3], [5] may be used to iteratively approach optimal coefficient values. Heuristic, evolutionary algorithms have also been used to directly minimize the frequency-domain error and solve for optimal coefficients [2], [4], [6].

An assumption of most formulations is that the filter order is known or imposed a priori. This is generally not the case, and often it is advantageous to use the shortest possible filter. A few methods to address this issue have been proposed, including a multi-objective optimization formulation [6], selection rules based on information criteria [7], and full Bayesian model selection [8]. As a multi-objective optimization problem, the goal is to minimize a specified cost function for transfer function error as well as filter order [6]. The result is a set of candidate solutions along a Pareto front from which the user must choose, defining the tradeoff between design error and filter order. Though in principle capable of finding the lowest filter order, these methods are not validated with known solutions and are only demonstrated using standard tolerance schemes [6].

Another way to choose the order of a pole-zero or autoregressive moving average (ARMA) system is to use selection rules derived from asymptotic approximations of the Kullback-Leibler information [7]. The Akaike information criterion and Bayesian information criterion penalize log-maximum-likelihood values with terms proportional to the number of parameters in the model [7]. These are simple to evaluate, but

as described below, a potentially more robust selection criterion is implicit when parameter values are estimated within the Bayesian framework.

Bayesian formulations for model selection have been used along with Markov chain Monte Carlo (MCMC) integration for time-domain ARMA modeling and prediction, e.g., [9]. For problems like those in [1]-[4], the desired transfer function and its constraints are expressed in the frequency domain. The authors of [8] advocate the interpretation of engineering design problems as inference, applying the tools of Bayesian inference to design signed power-of-two finite impulse response filters. The problems considered there focus on producing filters to meet design criteria rather than the approximation problems found in [1]-[4].

This work extends the Bayesian methods for time-domain ARMA modeling to frequency-domain transfer functions. When constructed in the frequency domain, the desired transfer function may easily be expressed with any combination of magnitude response, phase response, and frequency weighting. In contrast with the optimization methods, the Bayesian framework solves the problems of coefficient estimation and filter order selection simultaneously within a unified framework, without resorting to approximate or ad hoc selection rules. Furthermore, the nested sampling algorithm [10] is shown to be effective for exploring the high-dimensional, multimodal likelihood surface that is often problematic in pole-zero modeling. This Bayesian approach is first validated on a problem with a known solution and then used to generate pole-zero representations of more complicated systems based on measured data.

Section II describes the Bayesian inference framework; Section III describes the details of numerical implementation, including the choice of parametric model and the integration method; Section IV-A presents a numerical example with simulated data to demonstrate model selection and parameter estimation in the context of filter design; and Section IV-B presents examples of filters designed to represent measured head-related transfer function data using the Bayesian framework.

II. BAYESIAN INFERENCE

Infinite impulse response filters, having transfer functions with both poles and zeros, are formulated as a ratio of polynomials in the complex frequency variable z = e^{jω}, where ω is angular frequency. This form may be interpreted as a parametric model, with the polynomial coefficients as parameters. The model and parameters are selected to approximate a desired transfer function, or data. In analogy to model-based inference, the goal of filter design is to determine filter coefficients such that the model provides the best approximation to the data. Bayesian inference results in a probability distribution for these parameters, as opposed to the single values or sets of acceptable values obtained with

optimization or heuristic methods. If done properly, estimation of this distribution directly estimates the relative probability of competing models, which may be used to compare them. In this case, the competing models are filters of different order.

A. Parameter Estimation

Parameter estimation in the context of filter design is the problem of estimating filter coefficients, given a filter order, to achieve a desired response. Bayes' theorem, the point of departure for Bayesian inference, follows directly from the product rule of probability theory. Applied to model-based inference, the posterior distribution of model parameters θ, given the data D and a model M, is

p(\theta|M,D,I) = \frac{p(D|\theta,M,I)\, p(\theta|M,I)}{p(D|M,I)}.   (1)

Each probability is conditioned on I, the relevant background information of the problem. The first term in the numerator is the likelihood function, which indicates how well the model M with parameters θ represents the data. This distribution must be assigned based on the available information I, which is taken to be that the proposed models are capable of describing the target transfer functions well enough that the associated errors are bounded. No other information, like error statistics, is available at the stage of this assignment; the only information available is equivalent to a finite but unknown variance of the errors. Therefore, appealing to the principle of maximum entropy and marginalizing over the unknown variance [11], we may assign a Student's t-distribution,

p(D|\theta,M,I) \equiv L(\theta) = (2\pi)^{-K/2}\, \Gamma(K/2)\, \left(\frac{E}{2}\right)^{-K/2},   (2)

where

E = \sum_{k=1}^{K} \left[\, |D_k| - |H(\omega_k)| \,\right]^2 + \left[\arg D_k - \arg H(\omega_k)\right]^2,   (3)

Γ(·) is the gamma function, and D_k is the data point corresponding to ω_k. Here H represents a candidate transfer function. Equation (3) represents the squared errors in both magnitude and phase between the two transfer functions, evaluated at the K frequency points of the vector ω. In [8], a Gaussian likelihood with a heuristically determined variance was used instead of the Student's t. This can yield more aggressive model comparison results, provided the noise variance can be suitably estimated.
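For concreteness, the logarithm of the likelihood in Eqs. (2) and (3) can be evaluated as in the following Python sketch (our illustration, not code from the letter; D and H are the target and candidate transfer functions sampled at the same K frequencies):

```python
import numpy as np
from scipy.special import gammaln

def log_likelihood(D, H):
    """Log of the Student's t likelihood, Eq. (2)."""
    K = len(D)
    # Squared magnitude and phase errors, Eq. (3); np.angle returns
    # the principal value of the argument
    E = np.sum((np.abs(D) - np.abs(H)) ** 2
               + (np.angle(D) - np.angle(H)) ** 2)
    # log L = -(K/2) log(2 pi) + log Gamma(K/2) - (K/2) log(E/2)
    return (-0.5 * K * np.log(2.0 * np.pi)
            + gammaln(0.5 * K)
            - 0.5 * K * np.log(0.5 * E))
```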

The second term in (1), p(θ|M,I), is the prior distribution of the parameters. In practice, this is assigned to be uniform over a reasonable finite range in order to avoid preference and to keep the problem tractable. Section III-B will construct the model such that the permissible parameter values dictate a prior with compact support.

The term in the denominator is named the marginal likelihood, the Bayesian evidence, or just the evidence. The evidence may be thought of simply as a normalizing constant, and it is often neglected for applications strictly involving parameter estimation. In order for the posterior distribution to be normalized, the evidence Z must be

p(D|M,I) \equiv Z = \int_{\theta} p(D|\theta,M,I)\, p(\theta|M,I)\, d\theta.   (4)

Only the relative value of the posterior is needed to estimate parameter values. In principle, the evidence term may be neglected, leaving the posterior unnormalized, and the equation may be rewritten as a proportionality; for model selection, however, the evidence term is critical.

B. Model Comparison

The posterior probability, according to Bayes' theorem, for a model M_i, given data D and any relevant background information I, is

p(M_i|D,I) = \frac{p(D|M_i,I)\, p(M_i|I)}{p(D|I)}.   (5)

If there are multiple competing models, the ratio of their posterior probabilities may be used to rank them or to indicate which model the data prefer. Given two models M_i and M_j, the posterior ratio is

\frac{p(M_i|D,I)}{p(M_j|D,I)} = \frac{p(D|M_i,I)\, p(M_i|I)}{p(D|M_j,I)\, p(M_j|I)} = \frac{p(D|M_i,I)}{p(D|M_j,I)}.   (6)

Here, there is no reason to favor one model over the other based on background information, so the equal priors cancel, leaving the ratio of likelihoods. These likelihoods are identical to the evidence term in (1), so the models may be ranked by their evidence values, given equal prior probability of each model.

The mechanism that favors simpler models appears when the product of the likelihood function and the prior in (4) is integrated, whether seen as marginalization of the parameters or normalization of the posterior. Conceptually, Bayesian model selection is a comparison of average likelihood values instead of maximum likelihood values [11]. For a more complex model (often one with more parameters) to be favored, the gain in maximum likelihood must outweigh the effect of averaging over a larger parameter space [11]. Higher-order models will always fit as well or better, giving higher maximum likelihoods, but their larger parameter spaces may result in lower average likelihood. Embodying Ockham's razor, Bayesian model comparison prefers the simplest model capable of representing the data.
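As a toy illustration of Eq. (6) with equal model priors (the log-evidence numbers below are invented for illustration only, not results from the letter), candidate orders can be ranked directly by log Z, working in logs to avoid underflow:

```python
import numpy as np

# Hypothetical log-evidence values, one per candidate filter order
log_Z = {4: -95.2, 5: -60.4, 6: -21.7, 7: -23.1, 8: -24.9}

best = max(log_Z, key=log_Z.get)
for N, lz in sorted(log_Z.items()):
    log_odds = lz - log_Z[best]  # Eq. (6) with equal model priors
    print(f"N={N}: log Z = {lz:6.1f}, odds vs. N={best}: {np.exp(log_odds):.2e}")
```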

The procedure for designing a filter within this framework is first to select a set of candidate models, i.e., filters of different order. Then, the posterior distribution in (1) is estimated for each model numerically with an algorithm known as nested sampling. In doing so, the evidence values are calculated and used to rank the models. Estimating the evidence also yields estimates of the posterior distribution, which can be used to compute expected values for the filter coefficients. Thus, the most concise filter order is determined along with the coefficient estimates.

III. PRACTICAL IMPLEMENTATION

The calculations necessary to estimate the distributions in (1) and the integral in (4) are generally not possible analytically. A common approach is to use a Markov chain Monte Carlo method, which generates samples approximating the posterior distribution [9], [11], [12]. Nested sampling is centered on computing the evidence, which is the quantity that enforces parsimonious model selection. In approximating the integral of (4), posterior samples are drawn, providing estimates of parameter values. For efficiency and usability with this type of method, the standard IIR transfer function must be modified slightly. Practical details for applying the Bayesian inference framework to filter design are described below.

A. Nested Sampling

The strategy behind nested sampling is to integrate (4) numerically such that the evidence and the posterior may both be reliably estimated. Details beyond the summary below may be found in [10]. In practice, direct numerical integration over the parameter space quickly becomes intractable as its dimension increases beyond two or three. Within the context of filter design, the parameter space occupies 2N + 1 dimensions, where N is the filter order, so an exhaustive search in this case is completely infeasible.

The integral can, however, be rewritten and rendered one-dimensional. Consider the constrained prior mass ξ(λ) ∈ (0, 1), which represents the amount of the prior in the region where the likelihood is greater than λ. Then, the integral for the evidence in (4) may be written [10]

Z = \int_0^1 L(\xi)\, d\xi.   (7)

This integral is now one-dimensional and no longer evaluated explicitly over the parameter space, so the computational effort does not scale exponentially with N. Samples are drawn from the prior and sorted from the lowest likelihood, at prior mass one, to the maximum likelihood, which occupies zero prior mass, as it is a single point.

The integral is approximated by the sum

Z \approx \sum_{n=1}^{\hat{n}} L_n\, (\xi_{n-1} - \xi_n),   (8)

where n is the iteration index and n̂ is the index at which the sum is terminated. The likelihood at a given point in parameter space is easily evaluated using (2), but the prior mass enclosed by that equal-likelihood contour cannot in general be known. With an ensemble of M samples drawn uniformly from the prior, the prior mass enclosed by the least likely sample of the population can be approximated by the mean of its distribution, e^{-1/M} [10].

With a probabilistic estimate of the prior mass, the evidence accumulation proceeds as follows. A relatively small number of samples is drawn from the prior, and their likelihoods are evaluated. The sample with the lowest likelihood, corresponding to the largest prior mass, is removed from the ensemble and saved. To replace the discarded sample, a new sample is generated under the constraint that its likelihood is higher than that of the discarded sample. At each iteration one sample is discarded and a new one is generated, so the ensemble continually approaches regions of smaller prior mass and higher likelihood [10].

The method used to generate a new sample is not specified by nested sampling; below, a fixed number of Metropolis-Hastings random-walk steps are taken from a sample already satisfying the likelihood constraint [10]. Samples drawn from outside the constrained prior are accepted with probability 0. The step size is adjusted as in [10], decreasing as samples are more frequently rejected. When the bulk of the likelihood has been discovered, the shrinkage will overpower successive increases in likelihood, and the evidence accumulation will level off. The iterations are terminated when log Z_n − log Z_{n−1} < δ, i.e., when d(log Z)/dn ≈ 0. A discussion of errors and uncertainties associated with estimation of the log-evidence can be found in [10].
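The accumulation loop described above might be sketched as follows (a minimal Python sketch after [10], not the letter's implementation; the helpers log_L, sample_prior, and mh_step are assumed supplied by the user, and refinements such as the final contribution of the remaining live points are omitted):

```python
import numpy as np

def nested_sampling(log_L, sample_prior, mh_step, M=100, n_max=5000, delta=1e-7):
    """Skeleton of the nested-sampling loop.

    log_L(theta)            -> log-likelihood, Eq. (2)
    sample_prior()          -> one draw from the prior
    mh_step(theta, logLmin) -> constrained Metropolis-Hastings walk
                               returning a draw with log_L > logLmin
    """
    live = [sample_prior() for _ in range(M)]
    logL = np.array([log_L(t) for t in live])
    log_Z = -np.inf
    log_xi_prev = 0.0                      # log prior mass, xi_0 = 1
    for n in range(1, n_max + 1):
        i = int(np.argmin(logL))           # worst live point
        log_xi = -n / M                    # E[log xi_n] = -n/M
        # log of the strip width xi_{n-1} - xi_n
        log_w = log_xi_prev + np.log1p(-np.exp(log_xi - log_xi_prev))
        log_Z_new = np.logaddexp(log_Z, logL[i] + log_w)   # Eq. (8) in logs
        # replace the worst point, walking from a randomly chosen live point
        start = live[int(np.random.randint(M))]
        live[i] = mh_step(start, logL[i])
        logL[i] = log_L(live[i])
        if np.isfinite(log_Z) and log_Z_new - log_Z < delta:
            return log_Z_new               # d(log Z)/dn has leveled off
        log_Z, log_xi_prev = log_Z_new, log_xi
    return log_Z
```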

B. Choosing a Practical Model

A ratio of polynomials in z is not ideal for use with the sampling method or for imposing constraints. The constraints for filters are imposed not on the polynomial coefficients but on the poles and zeros: a stable filter has all poles inside the unit circle, and a minimum-phase filter also has all zeros inside the unit circle. The polynomials are therefore factored so that the poles p and zeros q are explicit. The model H, in cascade form and factored into first- and second-order sections, is given by

H(z) = G \prod_{l=1}^{L} \frac{z - q_l}{z - p_l} \prod_{m=1}^{M} \frac{(z - q_m)(z - q_m^*)}{(z - p_m)(z - p_m^*)},   (9)

where p_l, q_l ∈ ℝ, p_m, q_m ∈ ℂ, G is a real-valued gain factor, and N = L + 2M is the order of the filter. Second-order sections allow complex poles and zeros, but in order to maintain real polynomial coefficients, they must come in conjugate pairs.

In order to enforce the constraints, it is useful to express the model, and the poles and zeros, in polar form. Each complex value has a radius r and phase angle φ. Multiplied out, the model becomes

H(z) = G \prod_{l=1}^{L} \frac{z - q_l}{z - p_l} \prod_{m=1}^{M} \frac{z^2 - 2 r_{q,m} \cos\varphi_{q,m}\, z + r_{q,m}^2}{z^2 - 2 r_{p,m} \cos\varphi_{p,m}\, z + r_{p,m}^2}.   (10)

Now the model is expressed in terms of 2N + 1 real-valued parameters, and the stability constraint is simply r_{p,m} < 1 for all m and |p_l| < 1. The parameters cos φ are bounded between −1 and 1, so the only parameter without a well-defined range of possible values is the gain factor. For numerical implementation, the prior distribution of the gain parameter must be chosen on a finite interval, which can be made large enough that no reasonable values are excluded. After these parameters are estimated, the expression in (10) may be multiplied out to recover the polynomial coefficients used in the corresponding time-domain difference equation.

In many approaches to filter design, the gain parameter is given special treatment because, when the poles and zeros are fixed, it is a linear parameter with a least-squares solution. The simplest and most general way to handle the gain is to sample it directly with the other parameters; for most problems, this results in negligible extra computation. Another way to handle the gain is to marginalize it, i.e., integrate it out of the likelihood and thus the posterior [11]. The likelihood then differs from (2) and does not depend on the gain parameter; after the poles and zeros are estimated, the least-squares gain or some other gain can be applied. The resulting posterior is more spread than the fully parameterized posterior [11], but it does not require sampling the gain. Within the Bayesian framework, it is not appropriate to simply ignore the gain and use its least-squares solution at each step of the sampling procedure. It may also be observed that many previous approaches to IIR filter design rely similarly on estimating a portion of the parameters and then using least-squares solutions for the remaining parameters; for instance, an estimate of the denominator polynomial induces a unique least-squares solution for the numerator coefficients. Nevertheless, in this work, the fully parameterized posterior is estimated instead of the marginalized version, for simplicity and for minimal uncertainty in the final parameter estimates.
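A sketch of the polar parameterization of Eq. (10) and of the recovery of difference-equation coefficients follows (illustrative Python only; the parameter packing and function names are our own choices, and theta is assumed to be a NumPy array):

```python
import numpy as np

def unpack(theta, L, M):
    """Split a parameter vector into the quantities of Eq. (10).
    Packing (our convention): [G, p_1..p_L, q_1..q_L,
    r_p,1..r_p,M, cos_phi_p,1..cos_phi_p,M, r_q,1.., cos_phi_q,1..],
    i.e. 1 + 2L + 4M = 2N + 1 values with N = L + 2M."""
    G = theta[0]
    p = theta[1:1 + L]
    q = theta[1 + L:1 + 2 * L]
    k = 1 + 2 * L
    rp, cp = theta[k:k + M], theta[k + M:k + 2 * M]
    rq, cq = theta[k + 2 * M:k + 3 * M], theta[k + 3 * M:k + 4 * M]
    return G, p, q, rp, cp, rq, cq

def model_response(theta, L, M, omega):
    """Evaluate Eq. (10) at angular frequencies omega."""
    G, p, q, rp, cp, rq, cq = unpack(theta, L, M)
    z = np.exp(1j * omega)
    H = G * np.ones_like(z)
    for pl, ql in zip(p, q):                  # first-order sections
        H *= (z - ql) / (z - pl)
    for m in range(M):                        # second-order sections
        H *= ((z**2 - 2 * rq[m] * cq[m] * z + rq[m]**2)
              / (z**2 - 2 * rp[m] * cp[m] * z + rp[m]**2))
    return H

def to_difference_equation(theta, L, M):
    """Multiply Eq. (10) out into numerator/denominator coefficients."""
    G, p, q, rp, cp, rq, cq = unpack(theta, L, M)
    phi_p, phi_q = np.arccos(cp), np.arccos(cq)
    poles = np.concatenate([p, rp * np.exp(1j * phi_p), rp * np.exp(-1j * phi_p)])
    zeros = np.concatenate([q, rq * np.exp(1j * phi_q), rq * np.exp(-1j * phi_q)])
    b = G * np.poly(zeros).real   # feedforward coefficients
    a = np.poly(poles).real       # feedback coefficients
    return b, a
```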

IV. EXAMPLES

First, an example using simulated data is used to demonstrate and validate model selection and parameter estimation with a known solution. The simulated data are generated from a known filter with arbitrarily selected poles and zeros, and the selection criterion is expected to indicate the true order of the filter used to generate the data. Then, the Bayesian filter design framework is adapted to represent measured head-related transfer function (HRTF) data [1], [2]. Since the HRTF data do not correspond to a filter of known order, they cannot be used for validation.

A. Simulated Data

The model used to generate the simulated data is a sixth-order IIR filter with three complex conjugate pairs of poles and zeros. The poles, zeros, and gain used to generate the data are listed in Table I. The model is evaluated at 512 linearly spaced frequency points on (0, 0.5). So that the data cannot be identical to one of the candidate models, noise uniformly distributed on (−0.1, 0.1) is added to the transfer function. Within the nested sampling algorithm, 24 samples make up the population, and the iterations are terminated when the log-evidence changes by less than 10^{−7} from one iteration to the next, on the assumption that the evidence has by then accumulated sufficiently that subsequent iterations contribute negligibly to its final value. The size of the ensemble is not critically important, but if it is much smaller, the space may not be sufficiently sampled, and either errors accumulate or the sampling is more likely to get stuck in a local maximum; on the other hand, a smaller population leads to faster calculation [10].
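The simulated target described above can be reproduced roughly as follows (a sketch using the Table I values; the letter does not specify how the uniform noise enters the complex transfer function, so perturbing the real and imaginary parts is our assumption):

```python
import numpy as np

rng = np.random.default_rng(1)

# Table I poles/zeros (upper half plane; conjugates complete the pairs)
poles = np.array([0.7938 + 0.4187j, 0.4563 + 0.6437j, 0.7562 + 0.4187j])
zeros = np.array([0.9437 + 0.2812j, 0.2812 + 0.9312j, 0.9312 + 0.2812j])
G = 1.0

b = G * np.poly(np.concatenate([zeros, zeros.conj()])).real
a = np.poly(np.concatenate([poles, poles.conj()])).real

f = np.linspace(0.0, 0.5, 512, endpoint=False)   # normalized frequency
z = np.exp(2j * np.pi * f)
H = np.polyval(b, z) / np.polyval(a, z)

# additive uniform noise on (-0.1, 0.1), applied here to the real and
# imaginary parts (an assumption -- the letter does not specify)
D = H + rng.uniform(-0.1, 0.1, f.size) + 1j * rng.uniform(-0.1, 0.1, f.size)
```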

Five candidate models of orders N = 4 to N = 8 are used for comparison. For this letter, the maximum number of complex poles and zeros is used, to increase the variation in achievable transfer functions. Nested sampling is used to estimate the coefficients and accumulate the evidence values for model comparison. The parameters and the natural logarithm of the evidence are estimated 20 times for each model. Three representative transfer functions generated by estimated parameters are shown along with the data and the corresponding estimated log-evidence in Figure 1. Figure 1(b) also shows maximum log-likelihood values to indicate goodness of fit. The standard deviations of the maximum log-likelihood over all runs are 0.248, 0.1695, 0.16, 0.256, and 0.1511 for the five orders, respectively, too small to usefully represent with box-and-whisker plots. This is the expected behavior and indicates that the sampling consistently discovers a maximum likelihood value. The log-likelihoods for orders six through eight are approximately equal and are limited by the added noise; any further increase in log-likelihood would correspond to fitting the noise.

The log-evidence values for the fourth- and fifth-order filters are substantially below those of the three higher orders because these filters cannot entirely represent the data. This is also reflected in their lower log-likelihoods. It may also be observed from the box-and-whisker plots that the calculated evidence values are penalized slightly for orders 7 and 8, even though their likelihoods are nearly equal to that of order 6.

Table I lists the poles and zeros used to generate the simulated data along with the estimated poles and zeros corresponding to the selected model, N = 6. The estimated values deviate slightly from the true values, but their precision is limited by the addition of artificial noise. Figure 2 illustrates the posterior distribution of the estimated poles and zeros, suggesting the parameter uncertainties and the multimodal nature of the distribution. Figure 2(a) represents the very early stages of the sampling process, where the unit circle is sampled fairly uniformly. As the sampling progresses toward its final stages, Figure 2(b), it converges toward a single set of poles and zeros. The spread of the distribution at higher frequencies reflects the relative prominence of the noise in Figure 1(a) and the uncertainty of the associated parameter estimates. The clumping in Figure 2(b) near these poles and zeros suggests a nearby local maximum. This may serve as further motivation for using methods that can escape local maxima, like nested sampling.

TABLE I
TRUE AND ESTIMATED PARAMETERS, FROM THE POSTERIOR DISTRIBUTION OF THE N = 6 IIR FILTER MODEL.

Parameter   Actual Value       Estimated Value
p_1         0.7938 ± j0.4187   0.794  ± j0.4185
p_2         0.4563 ± j0.6437   0.4565 ± j0.6434
p_3         0.7562 ± j0.4187   0.7542 ± j0.421
q_1         0.9437 ± j0.2812   0.9439 ± j0.282
q_2         0.2812 ± j0.9312   0.2768 ± j0.9171
q_3         0.9312 ± j0.2812   0.919  ± j0.2782
G           1                  1.0351

As an indication of the computational effort required, the number of nested sampling iterations needed to reach the stopping criterion ranged from 468, the minimum for the fourth-order filter, to 2553, the maximum for the eighth-order filter. Within each iteration, 25 Metropolis-Hastings steps were used to generate a new point within the likelihood constraint. To further improve computational efficiency, the new sample could be generated by means other than random-walk Metropolis-Hastings.

B. Measured Data

As a demonstration on a real application, IIR filters are designed to represent measured head-related transfer functions (HRTFs). Head-related transfer functions represent the directionally dependent response between an external source and the eardrum. As they are influenced by reflections and resonances in the ear and around the head, it is reasonable to represent them by a pole-zero model, in other words, an IIR filter [1]. To present spatial auditory images over headphones, HRTF filters must be convolved with each source and image source. Measurements are often represented on several hundred frequency points, so approximation by a recursive filter can result in substantially less computation [1], [13], [14].

Several authors have addressed the problem of low-order IIR approximations of HRTFs, most using optimization methods [1], [2], [14], [15]. Some use a weighted least-squares approach [14], some use a gradient-based optimization method [1], and some use a combination of both [15]. HRTF approximations have also been approached with evolutionary algorithms, which are capable of global optimization [2]. In [13], balanced model truncation is used to convert FIR HRTF filters to low-order IIR filters. The contribution of the Bayesian approach is the ability to probabilistically estimate the requisite filter order while also producing globally optimal parameter estimates.

As proposed by several authors, deviation of the logarithm of the magnitude response is a more relevant error criterion, given the logarithmic nature of the auditory system [1]-[3], [14]. For this application, minimum-phase filters are designed to represent the logarithmic magnitude response. Thus, (3) becomes

E = \sum_{k=1}^{K} \left[\, \log|D_k| - \log|H(\omega_k)| \,\right]^2.   (11)

The error may also be weighted to account for more or less important regions of the response; however, since the focus of this work is filter design and not psychoacoustics, the frequencies are weighted uniformly. Aside from redefining the error, the coefficients and filter order are determined as above.
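In code, only the error function changes relative to the simulated-data example (a sketch of Eq. (11); the eps guard against log of zero is our addition, not part of the letter):

```python
import numpy as np

def log_magnitude_error(D, H, eps=1e-12):
    """Squared log-magnitude error of Eq. (11), uniform weighting."""
    return np.sum((np.log(np.abs(D) + eps) - np.log(np.abs(H) + eps)) ** 2)
```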

One important drawback of the optimization methods that have been used to generate pole-zero representations of HRTFs is that the order of the system was selected a priori [1], [2], [14]. The Bayesian approach is functionally akin to a heuristic global optimization method [2], [6], but it has the additional model comparison mechanism.

The HRTF data come from measurements of a KEMAR dummy head in an anechoic chamber [16]. The data for the first example were measured with the source at 45° in the horizontal plane, at the microphone in the left ear. Similar to the case with simulated data, six candidate models, N = 5, 6, ..., 10, are compared. The distribution of log-evidence values for 15 runs is shown in Figure 3(b). This evidence trend exhibits two plateaus, one between N = 6 and N = 7 and a smaller one between N = 8 and N = 9. In this case, the evidence for N = 9 is distinctly higher than that of the other models, so it is the most concise model that can represent this particular transfer function.

Many previous methods report average error or root-mean-square error (RMSE), so Table II reports the mean and standard deviation of the RMSE along with the median log-likelihood over all of the runs. This error is comparable to that reported in [3], [13], [15], for example. It should be noted that this error depends on the quality of the measurement and the amount of noise, so a direct comparison across the literature is difficult. The important distinction is that the Bayesian approach yields comparable parameter estimates but also an estimate of the filter order. The results for model comparison are summarized in Figure 3.

TABLE II
MEDIAN LOG-LIKELIHOOD VALUES FOR THE HRTF EXAMPLE, WITH MEAN AND STANDARD DEVIATION OF THE RMSE FOR COMPARISON WITH PREVIOUSLY PUBLISHED RESULTS.

Filter Order   Log-Likelihood   RMSE
5               24.686          0.2166 ± 0.0071
6               62.143          0.1862 ± 0.0070
7              103.368          0.1583 ± 0.0070
8              125.912          0.1464 ± 0.0080
9              152.172          0.1321 ± 0.0022
10             151.237          0.1321 ± 0.0017

The filter orders previously reported in the literature vary widely. The weighted least-squares methods often result in overly complex models of order 30 to 40 [1], [14]. Methods capable of global optimization, or that use gradient-based methods in addition to or instead of least-squares solutions, tend to find good fits with filters in the neighborhood of tenth order [2], [13], [15]. One point that can be seen from the above data is that, if each HRTF is to be represented most concisely, the same order should not be applied across the database.

The model selection procedure is carried out as described above for all available directions, and Figure 4 illustrates the results for six representative directions. The final estimated magnitude spectra are shown with the measured HRTF data used to produce them, but only the models with the greatest evidence, i.e., the selected filter orders, are shown. The directions shown lie in the horizontal plane, with azimuthal angles as indicated in the figure. These examples demonstrate the method's ability to generate filters representing a range of transfer functions not specified by tolerance schemes, while also choosing the best filter order.

V. CONCLUSION

Most methods used for digital filter design assume that the order of the filter has already been chosen. This work demonstrates a Bayesian model-based approach that provides simultaneous solutions to filter order selection and coefficient estimation. The Bayesian evidence selection criterion implicitly favors simpler models over more complex models of similar explanatory power, which leads to the most concise filter order capable of representing the data. Nested sampling is used to perform the numerical calculations and yields estimates of the filter coefficients and the best filter order to represent the desired transfer function. A numerical example with simulated, known data is used to demonstrate both aspects of inference-based filter design, in particular that the model selection criterion does in fact choose the most concise filter order. Finally, the method is demonstrated on measured head-related transfer function data to show its range of applicability.

Future work could address some of the psychoacoustic implications of Bayesian filter order selection with regard to HRTF filters; this should lead to proper weighting and error definitions to elicit the most appropriate HRTF representation. Bayesian filter design could also be extended to other applications like loudspeaker equalization [4] or speech modeling [3].

REFERENCES

[1] M. Blommer and G. Wakefield, "Pole-zero approximations for head-related transfer functions using a logarithmic error criterion," IEEE Trans. Speech Audio Process., vol. 5, no. 3, pp. 278-287, May 1997.
[2] E. Durant and G. Wakefield, "Efficient model fitting using a genetic algorithm: pole-zero approximations of HRTFs," IEEE Trans. Speech Audio Process., vol. 10, no. 1, pp. 18-27, Jan. 2002.
[3] D. Marelli and P. Balazs, "On pole-zero model estimation methods minimizing a logarithmic criterion for speech analysis," IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 2, pp. 237-248, Feb. 2010.
[4] G. Ramos and J. J. López, "Filter design method for loudspeaker equalization based on IIR parametric filters," J. Audio Eng. Soc., vol. 54, no. 12, pp. 1162-1178, 2006.
[5] A. Antoniou, Digital Signal Processing: Signals, Systems, and Filters. New York: McGraw-Hill, 2006.
[6] K. Tang, K. Man, S. Kwong, and Z. Liu, "Design and optimization of IIR filter structure using hierarchical genetic algorithms," IEEE Trans. Ind. Electron., vol. 45, no. 3, pp. 481-487, 1998.
[7] P. Stoica and Y. Selen, "Model-order selection: A review of information criterion rules," IEEE Signal Process. Mag., vol. 21, no. 4, pp. 36-47, 2004.
[8] C.-Y. Chan and P. M. Goggans, "Using Bayesian inference for the design of FIR filters with signed power-of-two coefficients," Signal Process., 2012.
[9] E. Punskaya, C. Andrieu, A. Doucet, and W. Fitzgerald, "Bayesian curve fitting using MCMC with applications to signal segmentation," IEEE Trans. Signal Process., vol. 50, no. 3, pp. 747-758, Mar. 2002.

[10] J. Skilling, "Nested sampling for general Bayesian computation," Bayesian Analysis, vol. 1, no. 4, pp. 833-860, 2006.
[11] P. Gregory, Bayesian Logical Data Analysis for the Physical Sciences: A Comparative Approach with Mathematica Support. Cambridge University Press, 2005.
[12] C.-Y. Chan and P. M. Goggans, "Using Bayesian inference for linear phase log FIR filter design," Bayesian Inference and Maximum Entropy Methods in Sci. and Eng., vol. 1193, no. 1, pp. 329-335, 2009.
[13] J. Mackenzie, J. Huopaniemi, V. Välimäki, and I. Kale, "Low-order modeling of head-related transfer functions using balanced model truncation," IEEE Signal Process. Lett., vol. 4, no. 2, pp. 39-41, Feb. 1997.
[14] A. Kulkarni and H. S. Colburn, "Infinite-impulse-response models of the head-related transfer function," J. Acoust. Soc. Am., vol. 115, no. 4, pp. 1714-1728, 2004.
[15] M. Queiroz and G. de Sousa, "Efficient binaural rendering of moving sound sources using HRTF interpolation," J. New Music Research, vol. 40, no. 3, pp. 239-252, 2011.
[16] W. Gardner and K. Martin, "HRTF measurements of a KEMAR," J. Acoust. Soc. Am., vol. 97, pp. 3907-3908, 1995.

[Figure 1: panel (a) plots magnitude and phase against normalized frequency; panel (b) plots log-evidence against filter order.]

Fig. 1. Simulated data example. (a) Log-magnitude and phase responses for filters of order 4, 6, and 8, using a posteriori parameter estimates, shown with the data. (b) Estimated log-evidence for 20 runs. The center line indicates the median, the edges of the box indicate quartile values, and the whiskers are maxima and minima. Dashed lines indicate median maximum log-likelihoods.

[Figure 2: panels (a) and (b) plot posterior samples in the complex plane, within the unit circle.]

Fig. 2. Posterior samples from the simulated data example. (a) First 500 samples for the N = 6 model and simulated data. Each sample consists of six complex conjugate poles and zeros. The poles (x) and zeros (o) used to generate the data are superimposed on the initial samples. Grayscale indicates relative log-posterior value, black being the maximum and white the minimum. (b) Representative final posterior samples, with the first 800 samples omitted for visual clarity.

[Figure 3: panel (a) plots magnitude against normalized frequency for the data and models N = 6, 7, and 9; panel (b) plots log-evidence against filter order, N = 5 to 10.]

Fig. 3. Head-related transfer function example. (a) Measured data with selected estimated transfer functions. (b) Estimated log-evidence for each candidate model order. The box-and-whisker plot indicates the distribution of estimated values, with the calculation repeated fifteen times.

[Figure 4: six panels of HRTF magnitude (dB) against normalized frequency, at azimuths 0°, 60°, 120°, 180°, 240°, and 300°, with selected orders N = 11, 11, 10, 9, 13, and 13, respectively.]

Fig. 4. Head-related transfer function filters, along with the data used to produce them, for six equally spaced directions. All measured data are taken in the horizontal plane in the left ear. Only models corresponding to the selected model order are shown for visual clarity. Vertical increments correspond to 15 dB.