Boosting Variational Inference


Fangjian Guo, MIT
Xiangyu Wang, Duke University
Kai Fan, Duke University
Tamara Broderick, MIT
David Dunson, Duke University

Abstract

Modern Bayesian inference typically requires some form of posterior approximation, and mean-field variational inference (MFVI) is an increasingly popular choice due to its speed. But MFVI can be inaccurate in various aspects, including an inability to capture multimodality in the posterior and underestimation of the posterior covariance. These issues arise since MFVI considers approximations to the posterior only in a family of factorized distributions. We instead consider a much more flexible approximating family consisting of all possible finite mixtures of a parametric base distribution (e.g., Gaussian). In order to efficiently find a high-quality posterior approximation within this family, we borrow ideas from gradient boosting and propose boosting variational inference (BVI). BVI iteratively improves the current approximation by mixing it with a new component from the base distribution family. We develop practical algorithms for BVI and demonstrate their performance on both real and simulated data.

1 Introduction

Bayesian inference offers a flexible framework for learning with rich, hierarchical models of data and for coherently quantifying uncertainty in unknown parameters through the posterior distribution. However, for any moderately complex model, the posterior is intractable to calculate exactly and must be approximated. Mean-field variational inference (MFVI) has grown in popularity as a method for approximating the posterior since it is often fast even for large data sets. MFVI is fast in part because it formulates posterior approximation as an optimization problem, and it leads to an efficient coordinate-ascent algorithm when there is certain structure within the model, known as conditional conjugacy [3]. Such conjugacy properties typically hold only for factorized approximations, which effectively cannot capture multimodality and which underestimate the posterior covariance, sometimes drastically [1, 28, 27, 25, 17]. The linear response technique [13] and the full-rank approach within [15] provide a correction to the covariance underestimation of MFVI in the unimodal case but do not address the multimodality issue. Black-box inference, as in [23] and the mean-field approach within [15], focuses on making the MFVI optimization problem easier for practitioners by avoiding tedious calculations, but it does not change the optimization objective of MFVI and therefore still faces the problems outlined here.

An alternative and more flexible class of approximating distributions for variational inference (VI) is the family of mixture models. Indeed, even if we consider only Gaussian base distributions, one can find a mixture of Gaussians that is arbitrarily close to any continuous probability density [7, 20]. [2, 14, 12] have previously considered using approximating families with a fixed number of mixture components; these authors also employ a further approximation to the VI optimization objective. The resulting optimization algorithms have clear practical limitations, which limit the ability to find a good approximation in the mixture model family. In particular, there is large sensitivity to initial values; for good performance, algorithms may need to be rerun for many different initializations and component numbers; and the approximation to the VI objective can limit flexibility.

29th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain.

As a much more effective algorithm, which automatically adapts the number of mixture components to the complexity of the posterior and increases approximation accuracy, we propose a novel approach inspired by boosting. We call our method boosting variational inference (BVI). BVI starts with a single-component approximation and proceeds to add a new mixture component at each step. Independently of our work, we note that this idea is also considered in a concurrent paper [18].

2 Variational inference and Gaussian mixtures

Suppose we observe $N$ data points, collected in the matrix $X$ with $N$ rows. A Bayesian model is specified through a prior $\pi(\theta)$ and likelihood $p(X \mid \theta)$, yielding the posterior $p(\theta \mid X) \propto \pi(\theta) p(X \mid \theta) =: f(\theta)$ by Bayes' theorem, with $\theta \in \mathbb{R}^D$. While $f$ is easy to evaluate, the normalizing constant $p(X)$ involves an intractable integral, so an approximation is needed. The posterior $p(\theta \mid X)$ is rarely used directly; rather, we would often like to report a mean, covariance, or other posterior functional $\mathbb{E}\, g(\theta) = \int p(\theta \mid X)\, g(\theta)\, d\theta$ for some function $g$. E.g., $g(\theta) = \theta$ yields the posterior mean.

One approach to approximating the posterior is as follows. Choose a discrepancy $D$ between a distribution $q(\theta)$ and the exact posterior $p_X(\theta) := p(\theta \mid X)$. We assume $D$ is non-negative and zero only when $q = p_X$. In general, though, the optimum $D^* := \inf_{q \in \mathcal{H}} D(q, p_X)$ over some constrained family of distributions $\mathcal{H}$ may be strictly greater than zero. Roughly, we expect a larger $\mathcal{H}$ to yield a lower $D^*$ but at a higher computational cost. When $D(q, p_X) = D_{KL}(q \,\|\, p_X)$, the Kullback-Leibler (KL) divergence between $q$ and $p_X$, this optimization problem is called variational inference.

A particularly flexible choice for $\mathcal{H}$ is the family of all finite mixtures. More precisely, let $h_\phi(\theta)$ be some parametric distribution over $\theta$ with parameter $\phi \in \Phi$. E.g., for Gaussian mixtures, $h_\phi(\theta) = N_{\mu,\Sigma}(\theta)$ with mean vector $\mu$ and positive semidefinite covariance $\Sigma$. Let $\Delta_k$ denote the $(k-1)$-dimensional simplex: $\Delta_k = \{w \in \mathbb{R}^k : \sum_{j=1}^k w_j = 1 \text{ and } \forall j,\ w_j \ge 0\}$. Then $\mathcal{H}_k$ is the set of all $k$-component mixtures over these base distributions, and $\mathcal{H}$ is the set of all finite mixtures, namely

$\mathcal{H}_k = \big\{ h : h(\theta) = \sum_{j=1}^{k} w_j h_{\phi_j}(\theta),\ w \in \Delta_k,\ \phi \in \Phi^k \big\}, \qquad \mathcal{H} = \bigcup_{k=1}^{\infty} \mathcal{H}_k. \quad (1)$

Our main contribution is to propose a novel algorithm for approximately solving this discrepancy minimization problem.

3 Boosting and Gradient Boosting

In general, reaching $D^*$ may require an infinite mixture. We consider a greedy, incremental procedure, as in [29], to approach $D^*$ with a sequence of finite mixtures $q_1, q_2, \dots$, with each $q_t \in \mathcal{H}_t$. The quality of the approximation can be measured with the excess discrepancy $\Delta D(q_t) := D(q_t, p_X) - D^* \ge 0$, and we would like $\Delta D(q_t) \to 0$. Hence, given any $\epsilon > 0$, we can find a large enough $t$ such that $\Delta D(q_t) \le \epsilon$.

In particular, we start with a single base distribution $q_1 = h_{\phi_1}$ for some $\phi_1$. Iteratively, at each step $t = 2, 3, \dots$, let $q_{t-1}$ be the approximation from the previous step. Form $q_t$ by mixing a new base distribution $h_t$ with weight $\alpha_t \in [0, 1]$ together with $q_{t-1}$ with weight $(1 - \alpha_t)$. This approach is called greedy [22, Ch. 4] if we choose an (approximately) optimal base distribution $h_t$ and weight $\alpha_t$ at each step $t$:

$q_t = (1 - \alpha_t) q_{t-1} + \alpha_t h_t, \qquad D(q_t, p_X) \le \inf_{\phi \in \Phi,\ 0 \le \alpha \le 1} D\big((1-\alpha) q_{t-1} + \alpha h_\phi,\ p_X\big) + \epsilon_t, \quad (2)$

where we relax optimality to within some non-negative sequence $\epsilon_t \to 0$. At each step, $q_t$ remains normalized by construction and takes the form of a mixture of base distributions. The iterative updates are in the style of boosting or greedy error minimization [9, 8, 10, 16] (see the sketch below).
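To make the greedy construction in Eq. (2) concrete, the following is a minimal Python sketch (our own illustration; the class and method names are hypothetical and not from the paper) of storing a Gaussian mixture approximation and folding in one new component per step:

```python
import numpy as np
from scipy.stats import multivariate_normal

class MixtureApprox:
    """Finite mixture q_t(theta) = sum_j w_j N(theta | mu_j, Sigma_j),
    grown greedily via q_t = (1 - alpha_t) q_{t-1} + alpha_t h_t."""

    def __init__(self, mu, Sigma):
        # Start from a single base component q_1 = N(mu_1, Sigma_1).
        self.components = [multivariate_normal(mu, Sigma)]
        self.weights = np.array([1.0])

    def pdf(self, theta):
        # Evaluate the mixture density at theta.
        return sum(w * c.pdf(theta)
                   for w, c in zip(self.weights, self.components))

    def add_component(self, mu, Sigma, alpha):
        # Mix in the new base distribution with weight alpha; scaling the
        # existing weights by (1 - alpha) keeps the mixture normalized
        # by construction, as in Eq. (2).
        self.weights = np.append((1 - alpha) * self.weights, alpha)
        self.components.append(multivariate_normal(mu, Sigma))
```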
Under convexity and strong smoothness conditions on $D(\cdot, p_X)$, Theorem II.1 of [29] guarantees that $\Delta D(q_t)$ converges to zero at rate $O(1/t)$. We will verify that the KL divergence satisfies these conditions in Theorem 1.

Gradient Boosting Let $D(q) := D(q, p_X)$ as a shorthand notation. Rather than jointly optimizing $D((1-\alpha_t) q_{t-1} + \alpha_t h_t)$ over $(\alpha_t, h_t)$, which may be non-convex and difficult in general, we consider nearly optimal choices (cf. Eq. (2)). We choose $h_t$ first, in a gradient descent style, and then optimize the corresponding weight $\alpha_t$. For $h_t$, we follow gradient boosting [11] and consider the functional gradient $\nabla D(q)$ at the current solution $q = q_{t-1}$. In what follows, we adopt the notation $\langle g, h \rangle = \int g(\theta) h(\theta)\, d\theta$ and $\|h\|_2^2 = \int h(\theta)^2\, d\theta$. When $\|h\|_2 \to 0$, a Taylor expansion yields

$D(q + h) = D(q) + \langle g, h \rangle + o(\|h\|_2^2), \quad (3)$

where $g$ is the functional gradient $\nabla D(q) =: g(\theta)$. We choose $h$ to minimize the inner product in Eq. (3); i.e., as in gradient descent, we choose $h$ to match the direction of $-\nabla D(q)$.

4 Boosting Variational Inference

Boosting variational inference (BVI) applies the framework of the previous section with the Kullback-Leibler (KL) divergence as the discrepancy measure. We first justify the choice of KL, and then present a two-stage multivariate Gaussian mixture boosting algorithm (Algorithm 1) to stochastically decrease the excess discrepancy. In each iteration, it first identifies $h_t$ with gradient boosting, and then solves for $\alpha_t$. The KL discrepancy measure is defined as

$D(q, p_X) = D_{KL}(q \,\|\, p_X) = \int q(\theta) \log \frac{q(\theta)}{p_X(\theta)}\, d\theta = \log p(X) + \int q(\theta) \log \frac{q(\theta)}{f(\theta)}\, d\theta. \quad (4)$

By dropping the constant $\log p(X)$, an effective discrepancy (the negative of the ELBO [3]) can be defined as $\tilde{D}_{KL}(q) = \int q(\theta) \log \frac{q(\theta)}{f(\theta)}\, d\theta$. The following theorem shows that KL satisfies the greedy boosting conditions [29, 22] under some additional assumptions (that may, e.g., hold on a bounded set). See Appendix A for the proof and discussion. This trivially implies that the conditions also hold for $\tilde{D}_{KL}(q)$. See Appendix C for a similar analysis of the KL divergence in the other direction, $D_{KL}(p_X \| q)$.

Theorem 1. Given densities $q_1, q_2$ and true density $p$, the KL divergence is a convex functional, i.e., for any $\alpha \in [0, 1]$,

$D_{KL}((1-\alpha) q_1 + \alpha q_2 \,\|\, p) \le (1-\alpha) D_{KL}(q_1 \,\|\, p) + \alpha D_{KL}(q_2 \,\|\, p). \quad (5)$

If we further assume that the densities are bounded below, $q_1(\theta), q_2(\theta) \ge a > 0$, and denote the functional gradient of KL at density $q$ as $\nabla D_{KL}(q) = \log q(\theta) - \log p(\theta)$, then the KL divergence is also strongly smooth, i.e.,

$D_{KL}(q_2) - D_{KL}(q_1) \le \langle \nabla D_{KL}(q_1),\ q_2 - q_1 \rangle + \frac{1}{a} \|q_2 - q_1\|_2^2. \quad (6)$

Setting $\alpha_t$ with Stochastic Newton For fixed $h_t$, $\tilde{D}_{KL}$ is a convex function of $\alpha_t$ (see Eq. (11) in Appendix B). We can estimate both its first and second derivatives with Monte Carlo, by drawing samples from $h_t$ and $q_{t-1}$. Then we can use a stochastic second-order method, e.g., Newton's method [5], to solve for $\alpha_t$. See Appendix B for details.

Setting $h_t$ with Laplacian Gradient Boosting Following the idea of Eq. (3) for gradient boosting, we take the Taylor expansion of $\tilde{D}_{KL}(q_t)$ around $\alpha_t \to 0$:

$\tilde{D}_{KL}(q_t) = \tilde{D}_{KL}(q_{t-1}) + \alpha_t \big\langle h_t,\ \log \tfrac{q_{t-1}}{f} \big\rangle - \alpha_t \big\langle q_{t-1},\ \log \tfrac{q_{t-1}}{f} \big\rangle + o(\alpha_t^2), \quad (7)$

which suggests minimizing $\langle h_t, \log \frac{q_{t-1}}{f} \rangle$, where $\log \frac{q_{t-1}}{f}$ is the functional gradient $\nabla \tilde{D}_{KL}(q_{t-1})$. However, direct minimization of the inner product is ill-posed, since $h_t$ would degenerate to a point mass at the minimum of the functional gradient. Instead, following the least-squares procedure of [11], we match the direction in terms of the $\ell_2$ norm: $\hat{h}_t = \arg\min_{h = h_\phi,\, \lambda > 0} \|\lambda h - \log(f/q_{t-1})\|_2^2$. Plugging in the optimal value of $\lambda$, after some algebra the objective is equivalent to

$\hat{h}_t = \arg\max_{h = h_\phi}\ \mathbb{E}_{\theta \sim h} \log\big(f(\theta)/q_{t-1}(\theta)\big) - \tfrac{1}{2} \log \|h\|_2^2, \quad (8)$

where $\log(f/q_{t-1})$ is effectively the residual log-likelihood. Ideally, when $q_{t-1} \propto f$, the residual should be flat; otherwise, it has peaks where the posterior density is underestimated and basins where it is overestimated. For the Gaussian family $h_\phi(\theta) = N_{\mu,\Sigma}(\theta)$, Eq. (8) becomes

$\hat{h}_{\mu_t, \Sigma_t} = \arg\max_{\mu, \Sigma}\ \mathbb{E}_{\theta \sim N_{\mu,\Sigma}} \log\big(f(\theta)/q_{t-1}(\theta)\big) + \tfrac{1}{4} \log |\Sigma|. \quad (9)$

We note that the log-determinant term prevents degeneration. We therefore propose the following heuristic (Algorithm 1) to efficiently optimize Eq. (9): we approximately decompose the residual $\log(f/q_{t-1})$ into a constant plus a quadratic peak $-\frac{1}{2}(\theta - \eta)^\top S^{-1} (\theta - \eta)$ (i.e.,
a Laplacian approximation to $f/q_{t-1}$), and then Eq. (9) becomes convex with closed-form solution $\mu = \eta$, $\Sigma = S/2$, where $\eta$ and $S^{-1}$ can be obtained numerically with any suitable optimization routine.
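As an illustration of this component-fitting step, here is a minimal Python sketch (our own, not the authors' code) that finds the peak of the residual with scipy.optimize.minimize and forms the closed-form component $\mu = \eta$, $\Sigma = S/2$; the callables log_f and log_q_prev and the starting point theta0 are assumed given:

```python
import numpy as np
from scipy.optimize import minimize

def fit_new_component(log_f, log_q_prev, theta0):
    """Fit the new Gaussian component via a Laplacian approximation of the
    residual log(f/q_{t-1}): locate its peak eta and curvature S^{-1}, then
    set mu = eta and Sigma = S/2 (the closed-form maximizer of Eq. (9))."""
    # The peak of the residual is the minimizer of log(q_{t-1}/f).
    neg_residual = lambda th: log_q_prev(th) - log_f(th)
    eta = minimize(neg_residual, theta0).x  # any suitable routine
    # S^{-1} is the Hessian of log(q_{t-1}/f) at the peak (numerical approx.).
    S_inv = numerical_hessian(neg_residual, eta)
    Sigma = np.linalg.inv(S_inv) / 2.0
    return eta, Sigma

def numerical_hessian(fn, x, eps=1e-4):
    """Central-difference Hessian of a scalar function fn at point x."""
    d = len(x)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            e_i, e_j = np.eye(d)[i] * eps, np.eye(d)[j] * eps
            H[i, j] = (fn(x + e_i + e_j) - fn(x + e_i - e_j)
                       - fn(x - e_i + e_j) + fn(x - e_i - e_j)) / (4 * eps**2)
    return H
```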

Algorithm 1 Laplacian Gradient Boosting for Gaussian Base Family $N_{\mu,\Sigma}$
Require: evaluable product density of prior and likelihood $f(\theta)$ for $\theta \in \mathbb{R}^D$
  Start with some initial approximation $q_1 = N_{\mu_1, \Sigma_1}$, e.g., $q_1 = N_{0, cI}$
  for $t = 2$ to $T$ do
    $\hat{\mu}_t \leftarrow \arg\min_\theta \log(q_{t-1}(\theta)/f(\theta))$ with an optimization routine initialized at $\theta_0 \sim q_{t-1}$
    $H_t \leftarrow$ Hessian of $\log(q_{t-1}(\theta)/f(\theta))$ at $\theta = \hat{\mu}_t$, with numerical approximation
    $\hat{\Sigma}_t \leftarrow H_t^{-1}/2$, and let $\hat{h}_t(\theta) = N(\theta \mid \hat{\mu}_t, \hat{\Sigma}_t)$ be the new component
    $\hat{\alpha}_t \leftarrow \arg\min_{\alpha_t} \tilde{D}_{KL}((1-\alpha_t) q_{t-1} + \alpha_t \hat{h}_t)$ with Algorithm 2 (see Appendix B)
    $q_t \leftarrow (1-\hat{\alpha}_t) q_{t-1} + \hat{\alpha}_t \hat{h}_t$  (boosting step)
  end for
  return $q_t$

5 Experiments

In this section, we compare the performance of BVI to MFVI on both toy and real-world data.

Toy Examples Figure 1 highlights the ability of BVI (unlike MFVI) to capture (a) heavy tails, (b) multimodality, and (c) multivariate distributions. Figure 1(a) is a Cauchy density $p_X(\theta) \propto \frac{1}{1 + (\theta/2)^2}$; (b) is a mixture of univariate Gaussians with different locations and scales; (c) is a mixture of five 2D Gaussians with random locations and covariances. We initialize with a Gaussian with very large (co)variance, and then run BVI for 50 iterations. The sequences $(\alpha_t)_t$ and $(\hat{D}_{KL}(q_t))_t$ are shown in subplots. Since these distributions are non-conjugate, we run automatic differentiation variational inference (ADVI) in Stan [15, 6] to obtain results for MFVI (orange).

Figure 1: Toy examples: (a) (heavy-tailed) Cauchy distribution, (b) a mixture of four univariate Gaussians, and (c) a mixture of five bivariate Gaussians with random means and covariances. Curves/contours are colored blue (true), red (BVI), green (initial $q_1$ in BVI), and orange (ADVI). The sequence $(\alpha_t)_t$ and Monte Carlo estimates of $(\tilde{D}_{KL}(q_t))_t$ are plotted against iteration in subplots.

Figure 2: Estimated posterior mean and covariance for logistic regression (dashed: estimate = true). [Panels: mean, variance, and covariance estimates; ADVI (mean-field) and BVI estimates plotted against the true (MCMC) values.]

Bayesian Logistic Regression We apply our algorithm to Bayesian logistic regression for the Nodal dataset [4], consisting of $N = 53$ observations of six predictors $x_i$ (intercept included) and a binary response $y_i \in \{-1, +1\}$. The likelihood is $\prod_{i=1}^{N} g(y_i x_i^\top \beta)$, where $g(x) = (1 + e^{-x})^{-1}$, and we use the prior $\beta \sim N(0, I)$. For reference, we show results from the Pólya-Gamma sampler (an MCMC algorithm for logistic regression) using the R package BayesLogit [21] as the ground truth. We also compare the performance of BVI to MFVI (ADVI with Stan). As shown in Figure 2, while both methods capture the correct mean, BVI provides better estimates of the variance and, unlike MFVI, does not set the covariances to zero. We expect more dramatic differences in cases where MFVI yields biased estimates of the posterior means [26] and cases where the posterior is multimodal. We plan to investigate these cases in future work.
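For concreteness, here is a minimal sketch of the unnormalized log target $\log f(\beta)$ for this experiment, i.e., log prior plus log likelihood for Bayesian logistic regression (our own illustration, not the authors' code; the design matrix X and responses y in $\{-1, +1\}$ are assumed given):

```python
import numpy as np

def make_log_f(X, y):
    """Unnormalized log posterior log f(beta) = log prior + log likelihood
    for Bayesian logistic regression with prior beta ~ N(0, I) and
    likelihood prod_i g(y_i x_i^T beta), where g(x) = 1/(1 + exp(-x))."""
    def log_f(beta):
        log_prior = -0.5 * beta @ beta      # N(0, I) prior, up to a constant
        margins = y * (X @ beta)            # y_i in {-1, +1}
        # log g(m) = -log(1 + exp(-m)), computed stably with logaddexp
        log_lik = -np.logaddexp(0.0, -margins).sum()
        return log_prior + log_lik
    return log_f
```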

References

[1] C. M. Bishop. Pattern Recognition and Machine Learning, chapter 10. Springer, New York, 2006.
[2] C. M. Bishop, N. Lawrence, T. Jaakkola, and M. I. Jordan. Approximating posterior distributions in belief networks using mixtures. In Advances in Neural Information Processing Systems 10: Proceedings of the 1997 Conference, volume 10, page 416, 1998.
[3] D. M. Blei, A. Kucukelbir, and J. D. McAuliffe. Variational inference: A review for statisticians. arXiv preprint arXiv:1601.00670, 2016.
[4] B. Brown. Prediction analyses for binary data. Biostatistics Casebook, pages 3-18, 1980.
[5] R. H. Byrd, S. L. Hansen, J. Nocedal, and Y. Singer. A stochastic quasi-Newton method for large-scale optimization. SIAM Journal on Optimization, 26(2):1008-1031, 2016.
[6] B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. A. Brubaker, J. Guo, P. Li, and A. Riddell. Stan: A probabilistic programming language. Journal of Statistical Software.
[7] V. A. Epanechnikov. Non-parametric estimation of a multivariate probability density. Theory of Probability & Its Applications, 14(1):153-158, 1969.
[8] Y. Freund, R. Schapire, and N. Abe. A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5):771-780, 1999.
[9] Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. In European Conference on Computational Learning Theory, pages 23-37. Springer, 1995.
[10] J. Friedman, T. Hastie, R. Tibshirani, et al. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics, 28(2):337-407, 2000.
[11] J. H. Friedman. Greedy function approximation: a gradient boosting machine. The Annals of Statistics, 29(5):1189-1232, 2001.
[12] S. Gershman, M. Hoffman, and D. Blei. Nonparametric variational inference. In Proceedings of the 29th International Conference on Machine Learning, New York, NY, USA, 2012.
[13] R. J. Giordano, T. Broderick, and M. I. Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. In Advances in Neural Information Processing Systems, 2015.
[14] T. Jaakkola and M. I. Jordan. Improving the mean field approximation via the use of mixture distributions. In Learning in Graphical Models. Springer, 1998.
[15] A. Kucukelbir, R. Ranganath, A. Gelman, and D. Blei. Automatic variational inference in Stan. In Advances in Neural Information Processing Systems, 2015.
[16] J. Q. Li and A. R. Barron. Mixture density estimation. In Advances in Neural Information Processing Systems, 2000.
[17] D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms, chapter 33. Cambridge University Press, 2003.
[18] A. C. Miller, N. Foti, and R. P. Adams. Variational boosting: Iteratively refining posterior approximations. arXiv preprint arXiv:1611.06585, 2016.
[19] T. Minka et al. Divergence measures and message passing. Technical report, Microsoft Research, 2005.
[20] E. Parzen. On estimation of a probability density function and mode. The Annals of Mathematical Statistics, 33(3):1065-1076, 1962.
[21] N. G. Polson, J. G. Scott, and J. Windle. Bayesian inference for logistic models using Pólya-Gamma latent variables. Journal of the American Statistical Association, 108(504):1339-1349, 2013.
[22] A. Rakhlin. Applications of empirical processes in learning theory: algorithmic stability and generalization bounds. PhD thesis, Massachusetts Institute of Technology, 2006.
[23] R. Ranganath, S. Gerrish, and D. M. Blei. Black box variational inference. In AISTATS, 2014.

[24] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, 22(3):400-407, 1951.
[25] H. Rue, S. Martino, and N. Chopin. Approximate Bayesian inference for latent Gaussian models by using integrated nested Laplace approximations. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(2):319-392, 2009.
[26] R. E. Turner and M. Sahani. Two problems with variational expectation maximisation for time-series models. In Workshop on Inference and Estimation in Probabilistic Time-Series Models, volume 2, 2008.
[27] R. E. Turner and M. Sahani. Two problems with variational expectation maximisation for time-series models. In D. Barber, A. T. Cemgil, and S. Chiappa, editors, Bayesian Time Series Models. Cambridge University Press, 2011.
[28] B. Wang and M. Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In Workshop on Artificial Intelligence and Statistics, 2005.
[29] T. Zhang. Sequential greedy approximation for certain convex optimization problems. IEEE Transactions on Information Theory, 49(3):682-691, 2003.

Appendices

A Proof of Theorem 1

Proof. For any densities $q_1, q_2$ and $\alpha \in [0, 1]$, convexity follows from

$D_{KL}((1-\alpha) q_1 + \alpha q_2 \,\|\, p)$
$= (1-\alpha) \int q_1 \log \frac{(1-\alpha) q_1 + \alpha q_2}{p}\, d\theta + \alpha \int q_2 \log \frac{(1-\alpha) q_1 + \alpha q_2}{p}\, d\theta$
$= (1-\alpha) \int q_1 \log \frac{q_1 [1 + \alpha(q_2/q_1 - 1)]}{p}\, d\theta + \alpha \int q_2 \log \frac{q_2 [1 + (1-\alpha)(q_1/q_2 - 1)]}{p}\, d\theta$
$= (1-\alpha) D_{KL}(q_1 \,\|\, p) + \alpha D_{KL}(q_2 \,\|\, p) + (1-\alpha) \int q_1 \log[1 + \alpha(q_2/q_1 - 1)]\, d\theta + \alpha \int q_2 \log[1 + (1-\alpha)(q_1/q_2 - 1)]\, d\theta$
$\le (1-\alpha) D_{KL}(q_1 \,\|\, p) + \alpha D_{KL}(q_2 \,\|\, p) + (1-\alpha) \int \alpha (q_2 - q_1)\, d\theta + \alpha \int (1-\alpha)(q_1 - q_2)\, d\theta$
$= (1-\alpha) D_{KL}(q_1 \,\|\, p) + \alpha D_{KL}(q_2 \,\|\, p),$

where we have used $\log(1 + x) \le x$ and $\int (q_2 - q_1)\, d\theta = 0$.

Let $h = q_2 - q_1$; then $\int h(\theta)\, d\theta = \int (q_2 - q_1)\, d\theta = 0$. Again using the inequality $\log(1 + x) \le x$, strong smoothness follows from

$D_{KL}(q_1 + h \,\|\, p) - D_{KL}(q_1 \,\|\, p) = \langle q_1 + h,\ \log(q_1 + h) \rangle - \langle q_1,\ \log q_1 \rangle - \langle h,\ \log p \rangle$
$\le \langle q_1 + h,\ \log q_1 + h/q_1 \rangle - \langle q_1,\ \log q_1 \rangle - \langle h,\ \log p \rangle$
$= \langle h,\ \log q_1 \rangle + \langle h,\ h/q_1 \rangle - \langle h,\ \log p \rangle$
$= \langle \log q_1 - \log p,\ h \rangle + \langle h,\ h/q_1 \rangle$
$\le \langle \log q_1 - \log p,\ h \rangle + \frac{1}{a} \|h\|_2^2,$

where we used $\langle q_1, h/q_1 \rangle = \int h\, d\theta = 0$ in the third line, and in the last step the assumption that $q_1, q_2 \ge a > 0$.

Note that here we assume that the densities are lower-bounded by $a > 0$. While this may seem unrealistic for some densities, it is a technical requirement to ensure that the discrepancy is smooth enough for greedy boosting to apply. For densities on $\mathbb{R}^D$ that are not bounded away from zero, we suggest considering approximation within $(1-\epsilon)$ of the probability mass. In particular, consider a closed set $\Theta \subset \mathbb{R}^D$ such that $\int_\Theta p_X(\theta)\, d\theta = 1 - \epsilon$ for a small $\epsilon > 0$. Then we can use the smallest density value on $\Theta$ as a lower bound.

B Stochastic Newton's Algorithm for Solving Mixing Weights

Fixing $h_t$ and taking derivatives of the effective discrepancy with respect to $\alpha_t$, we have

$\frac{\partial \tilde{D}_{KL}(q_t)}{\partial \alpha_t} = \int (h_t - q_{t-1}) \log \frac{(1-\alpha_t) q_{t-1} + \alpha_t h_t}{f}\, d\theta = \mathbb{E}_{\theta \sim h_t} \gamma_{\alpha_t}(\theta) - \mathbb{E}_{\theta \sim q_{t-1}} \gamma_{\alpha_t}(\theta), \quad (10)$

$\frac{\partial^2 \tilde{D}_{KL}(q_t)}{\partial \alpha_t^2} = \int \frac{(h_t - q_{t-1})^2}{(1-\alpha_t) q_{t-1} + \alpha_t h_t}\, d\theta = \mathbb{E}_{\theta \sim h_t} \eta_{\alpha_t}(\theta) - \mathbb{E}_{\theta \sim q_{t-1}} \eta_{\alpha_t}(\theta) \ge 0, \quad (11)$

where $\gamma_{\alpha_t}$ and $\eta_{\alpha_t}$ are evaluable functions of $\theta$ and $\alpha_t$ given below in Algorithm 2. Because the first and second derivatives are estimated stochastically rather than exactly, to ensure convergence we use a decaying sequence of step sizes $b/k$ in Algorithm 2 satisfying the Robbins-Monro conditions [5, 24].
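A minimal Python sketch of this stochastic Newton update follows (the pseudocode appears as Algorithm 2 below). The densities q_prev, h_new, f and the samplers are assumed given as vectorized callables, and we additionally clip $\alpha_t$ to $[0, 1]$, which the pseudocode leaves implicit:

```python
import numpy as np

def solve_alpha(q_prev, h_new, f, sample_q, sample_h, n=1000, b=1.0,
                tol=1e-6, max_iter=200):
    """Stochastic Newton's method for the mixing weight alpha_t under the
    exclusive KL, using the Monte Carlo estimators of Eqs. (10)-(11) and
    Robbins-Monro step sizes b/k."""
    th_h = sample_h(n)   # draws from the new component h_t
    th_q = sample_q(n)   # draws from the current approximation q_{t-1}
    alpha, k = 0.0, 0
    for _ in range(max_iter):
        mix_h = (1 - alpha) * q_prev(th_h) + alpha * h_new(th_h)
        mix_q = (1 - alpha) * q_prev(th_q) + alpha * h_new(th_q)
        # gamma_alpha(theta) = log(((1-a) q_{t-1} + a h_t) / f); Eq. (10)
        d1 = np.mean(np.log(mix_h / f(th_h))) - np.mean(np.log(mix_q / f(th_q)))
        # eta_alpha(theta) = (h_t - q_{t-1}) / ((1-a) q_{t-1} + a h_t); Eq. (11)
        d2 = (np.mean((h_new(th_h) - q_prev(th_h)) / mix_h)
              - np.mean((h_new(th_q) - q_prev(th_q)) / mix_q))
        k += 1
        step = (b / k) * d1 / d2
        alpha = np.clip(alpha - step, 0.0, 1.0)  # keep alpha in [0, 1]
        if abs(step) < tol:
            break
    return alpha
```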

Algorithm 2 Stochastic Newton's
Require: current approximation $q_{t-1}(\theta)$, new component $h_t(\theta)$, product density of prior and likelihood $f(\theta)$, Monte Carlo sample size $n$, initial step size $b > 0$
  Initialize $k \leftarrow 0$, $\alpha_t \leftarrow 0$
  Independently draw $\{\theta_i^{(h)}\} \sim h_t$ and $\{\theta_i^{(q)}\} \sim q_{t-1}$ for $i = 1, \dots, n$
  while $\alpha_t$ not convergent do
    Compute $\hat{D}'_{KL} = \frac{1}{n} \sum_i \big(\gamma_{\alpha_t}(\theta_i^{(h)}) - \gamma_{\alpha_t}(\theta_i^{(q)})\big)$ with $\gamma_{\alpha_t}(\theta) = \log \frac{(1-\alpha_t) q_{t-1}(\theta) + \alpha_t h_t(\theta)}{f(\theta)}$
    Compute $\hat{D}''_{KL} = \frac{1}{n} \sum_i \big(\eta_{\alpha_t}(\theta_i^{(h)}) - \eta_{\alpha_t}(\theta_i^{(q)})\big)$ with $\eta_{\alpha_t}(\theta) = \frac{h_t(\theta) - q_{t-1}(\theta)}{(1-\alpha_t) q_{t-1}(\theta) + \alpha_t h_t(\theta)}$
    $k \leftarrow k + 1$, $\alpha_t \leftarrow \alpha_t - (b/k)\, \hat{D}'_{KL} / \hat{D}''_{KL}$
  end while

C BVI with KL Divergence in the Alternative Direction

The KL divergence is asymmetric and can be written in two directions [19]: $D_{KL}(q_t \,\|\, p_X)$ and $D_{KL}(p_X \,\|\, q_t)$. We have discussed the BVI algorithm in the main text based on the exclusive direction $D_{KL}(q_t \,\|\, p_X)$; in this appendix, we show that a similar algorithm can be derived for the alternative, inclusive direction

$D_{KL}(p_X \,\|\, q_t) = \int p_X(\theta) \log \frac{p_X(\theta)}{q_t(\theta)}\, d\theta = \int p_X(\theta) \log p_X(\theta)\, d\theta - c \int f(\theta) \log q_t(\theta)\, d\theta. \quad (12)$

Here $c > 0$ is the normalizing constant. Hence, by dropping constants that do not involve $q_t$, we can similarly define an effective discrepancy

$\tilde{D}_{kl}(q) = -\int f(\theta) \log q(\theta)\, d\theta, \quad (13)$

where we use the lower-case italic subscript $kl$ to distinguish it from the previous KL divergence. Fixing $p_X$, we will use the notation $D_{kl}(q) := D_{KL}(p_X \,\|\, q)$.

Before proceeding to the derivation of the algorithm, we first show that $D_{KL}(p_X \,\|\, q_t)$ also satisfies the regularity conditions required by greedy boosting.

Lemma 1. $D_{KL}(p \,\|\, q)$ is a convex functional of $q$, i.e., given densities $p, q_1, q_2$, it holds for all $\alpha \in [0, 1]$ that

$D_{KL}(p \,\|\, (1-\alpha) q_1 + \alpha q_2) \le (1-\alpha) D_{KL}(p \,\|\, q_1) + \alpha D_{KL}(p \,\|\, q_2). \quad (14)$

Proof. For any $\alpha \in [0, 1]$, by Jensen's inequality (concavity of the logarithm), $\log[(1-\alpha) q_1 + \alpha q_2] \ge (1-\alpha) \log q_1 + \alpha \log q_2$. It then directly follows that $D_{KL}(p \,\|\, (1-\alpha) q_1 + \alpha q_2) \le (1-\alpha) D_{KL}(p \,\|\, q_1) + \alpha D_{KL}(p \,\|\, q_2)$.

Another condition required by the boosting framework of [29] is strong smoothness. It has been shown that for $D_{KL}(p \,\|\, q)$ to satisfy strong smoothness, it suffices to require an upper bound on the log ratio of base-family densities [16] [22, Chapter 4]:

$a = \sup_{q_1, q_2 \in \{h_\phi : \phi \in \Phi\},\ \theta \in \mathbb{R}^D} \log \frac{q_1(\theta)}{q_2(\theta)} < +\infty. \quad (15)$

This condition might be translated into constraints on the parameter space $\Phi$.

Solving $\alpha_t$ Again, we first show that with $h_t$ fixed, optimizing $\alpha_t$ is a convex problem, which can be solved with a stochastic Newton's method. Since the integrating measure in $\tilde{D}_{kl}$ does not involve $\alpha_t$, we have

$\frac{\partial \tilde{D}_{kl}}{\partial \alpha} = -\int f(\theta) \frac{\partial \log q_t(\theta)}{\partial \alpha}\, d\theta = -\int f(\theta) \frac{h_t - q_{t-1}}{(1-\alpha) q_{t-1} + \alpha h_t}\, d\theta, \quad (16)$

$\frac{\partial^2 \tilde{D}_{kl}}{\partial \alpha^2} = -\int f(\theta) \frac{\partial^2 \log q_t(\theta)}{\partial \alpha^2}\, d\theta = \int f(\theta) \frac{(h_t - q_{t-1})^2}{[(1-\alpha) q_{t-1} + \alpha h_t]^2}\, d\theta \ge 0. \quad (17)$

Notice that since we cannot sample from the true posterior $p_X(\theta) \propto f(\theta)$, we cannot directly estimate these derivatives with Monte Carlo. However, importance sampling can be applied here, by rewriting the derivatives as

$\frac{\partial \tilde{D}_{kl}}{\partial \alpha} = -\mathbb{E}_{\theta \sim q_{t-1}} \left[ \frac{f(\theta)}{q_{t-1}(\theta)} \cdot \frac{h_t(\theta) - q_{t-1}(\theta)}{(1-\alpha) q_{t-1}(\theta) + \alpha h_t(\theta)} \right], \quad (18)$

$\frac{\partial^2 \tilde{D}_{kl}}{\partial \alpha^2} = \mathbb{E}_{\theta \sim q_{t-1}} \left[ \frac{f(\theta)}{q_{t-1}(\theta)} \cdot \frac{(h_t(\theta) - q_{t-1}(\theta))^2}{[(1-\alpha) q_{t-1}(\theta) + \alpha h_t(\theta)]^2} \right]. \quad (19)$

Hence we can draw particles $\theta_i \sim q_{t-1}(\theta)$ and reweight them with $w_i = f(\theta_i)/q_{t-1}(\theta_i)$, which yields the following algorithm.

Algorithm 3 Stochastic Newton's with $\tilde{D}_{kl}(q_t) = D_{KL}(p_X \,\|\, q_t)$
Require: current approximation $q_{t-1}(\theta)$, new component $h_t(\theta)$, product density of prior and likelihood $f(\theta)$, importance sample size $n$, initial step size $b > 0$
  Initialize $k \leftarrow 0$, $\alpha_t \leftarrow 0$
  Independently draw $\{\theta_i\} \sim q_{t-1}$ for $i = 1, \dots, n$
  $w_i \leftarrow f(\theta_i)/q_{t-1}(\theta_i)$ for $i = 1, \dots, n$
  while $\alpha_t$ not convergent do
    $s_i \leftarrow \frac{h_t(\theta_i) - q_{t-1}(\theta_i)}{(1-\alpha_t) q_{t-1}(\theta_i) + \alpha_t h_t(\theta_i)}$ for $i = 1, \dots, n$
    Compute $\hat{D}'_{kl} = -\frac{1}{n} \sum_i w_i s_i$
    Compute $\hat{D}''_{kl} = \frac{1}{n} \sum_i w_i s_i^2$
    $k \leftarrow k + 1$, $\alpha_t \leftarrow \alpha_t - (b/k)\, \hat{D}'_{kl} / \hat{D}''_{kl}$
  end while

Setting $h_t$ It turns out that with $\tilde{D}_{kl}$ we can derive the same gradient boosting procedure as in Algorithm 1. Again, taking the Taylor expansion of the effective discrepancy at $\alpha_t \to 0$, we have

$\tilde{D}_{kl}(q_t) = \tilde{D}_{kl}(q_{t-1}) - \alpha_t \langle f/q_{t-1},\ h_t \rangle + \alpha_t \int f(\theta)\, d\theta + o(\alpha_t^2). \quad (20)$

Similarly, to avoid degeneracy, we consider

$\min_{h = h_\phi,\ \lambda > 0} \|\lambda h - f/q_{t-1}\|_2^2, \quad (21)$

which is equivalent to

$\max_{h = h_\phi}\ \log \mathbb{E}_{\theta \sim h} \big(f(\theta)/q_{t-1}(\theta)\big) - \tfrac{1}{2} \log \|h\|_2^2. \quad (22)$

By Jensen's inequality, we observe that the previous objective in Eq. (8) is in fact a lower bound of this objective, namely

$\log \mathbb{E}_{\theta \sim h} \big(f(\theta)/q_{t-1}(\theta)\big) - \tfrac{1}{2} \log \|h\|_2^2 \ge \mathbb{E}_{\theta \sim h} \log\big(f(\theta)/q_{t-1}(\theta)\big) - \tfrac{1}{2} \log \|h\|_2^2. \quad (23)$

We further note that this bound is iteratively tightened: as $q_{t-1}(\theta)$ better approximates $p_X(\theta)$, the ratio $f(\theta)/q_{t-1}(\theta) \propto p_X(\theta)/q_{t-1}(\theta)$ converges to a constant. By maximizing the lower bound, we arrive at the same gradient boosting objective, and hence the same Laplacian gradient boosting subroutine for setting the new component $h_t$.

To summarize, with the KL divergence in the alternative direction $D_{KL}(p_X \,\|\, q_t)$ as the discrepancy measure, we derive an algorithm almost identical to Algorithm 1, except that the step for solving $\hat{\alpha}_t$ is performed with Algorithm 3.
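Analogously to the sketch for Algorithm 2, here is a minimal Python sketch of Algorithm 3 under the same assumptions (vectorized callable densities and sampler; $\alpha_t$ clipped to $[0, 1]$, which the pseudocode leaves implicit):

```python
import numpy as np

def solve_alpha_inclusive(q_prev, h_new, f, sample_q, n=1000, b=1.0,
                          tol=1e-6, max_iter=200):
    """Stochastic Newton's method for alpha_t under the inclusive KL
    (Algorithm 3): the derivatives in Eqs. (18)-(19) are estimated by
    importance sampling with particles from q_{t-1} and weights
    w_i = f(theta_i) / q_{t-1}(theta_i)."""
    theta = sample_q(n)               # particles from q_{t-1}
    w = f(theta) / q_prev(theta)      # importance weights
    alpha, k = 0.0, 0
    for _ in range(max_iter):
        mix = (1 - alpha) * q_prev(theta) + alpha * h_new(theta)
        s = (h_new(theta) - q_prev(theta)) / mix
        d1 = -np.mean(w * s)          # Eq. (18)
        d2 = np.mean(w * s**2)        # Eq. (19), nonnegative
        k += 1
        step = (b / k) * d1 / d2
        alpha = np.clip(alpha - step, 0.0, 1.0)  # keep alpha in [0, 1]
        if abs(step) < tol:
            break
    return alpha
```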
