Conjugate Predictive Distributions and Generalized Entropies


1 Conjugate Predictive Distributions and Generalized Entropies Eduardo Gutiérrez-Peña Department of Probability and Statistics IIMAS-UNAM, Mexico Padova, Italy March, 2013

2 Menu 1 Antipasto/Appetizer Posterior unbiasedness 2 Piatto Principale/Main Course Conjugate predictive distributions as generalized exponential families 3 Dolce/Dessert Concluding remarks

3 Posterior unbiasedness Unbiasedness Why the interest in unbiasedness?

4 Posterior unbiasedness Unbiasedness Why the interest in unbiasedness? There is an interesting duality between the classical notion of unbiasedness and minimization of posterior expected loss [Noorbaloochi & Meeden, 1983]

5 Posterior unbiasedness Unbiasedness Why the interest in unbiasedness? There is an interesting duality between the classical notion of unbiasedness and minimization of posterior expected loss [Noorbaloochi & Meeden, 1983] Some Bayes estimators ˆλ = ˆλ(x^(n)) turn out to be unbiased for certain transformations λ = λ(θ), in the sense that E(ˆλ | λ) = λ This definition of unbiasedness is related to the squared error loss L²_λ(ˆλ, λ) = (ˆλ − λ)², as is the Bayes estimator ˆλ = E(λ | x^(n)); its dual, posterior counterpart is the condition E(λ | ˆλ) = ˆλ

6 Posterior unbiasedness L-unbiasedness For a general loss function L(ˆλ, λ), we shall say that the estimator ˆλ(x^(n)) is L-unbiased for the parameter λ if, for all λ, λ′ ∈ Λ [Lehmann, 1951], E(L(ˆλ(X^(n)), λ) | λ) ≤ E(L(ˆλ(X^(n)), λ′) | λ) Also, for a loss function L and a prior p, we shall say that the estimator ˆλ is (L, p)-Bayes if it minimizes the corresponding posterior expected loss ∫ L(ˆλ, λ) p(λ | x^(n)) dλ

7 Posterior unbiasedness L-unbiasedness Using this general definition of unbiasedness, Hartigan (1965) showed that the prior which minimizes the asymptotic bias of the Bayes estimator relative to the loss function L(ˆλ, λ) is given by π(λ) ∝ i_λ(λ) [ ∂²L(λ, ω)/∂ω² |_{ω=λ} ]^{−1/2}, where i_λ(λ) denotes the Fisher information For the squared error loss, L²_λ(ˆλ, λ), this becomes π(λ) ∝ i_λ(λ)
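
As a quick sanity check on the formula just quoted (a minimal symbolic sketch added here, not part of the talk, and assuming the reconstruction of Hartigan's expression given above), one can verify with sympy that under squared error loss the bracketed second derivative is a constant, so the prior reduces to the Fisher information:

    import sympy as sp

    lam, omega = sp.symbols('lambda omega', real=True)
    i = sp.Function('i')                             # Fisher information i_lambda(.)

    L_sq = (omega - lam)**2                          # squared error loss L(omega, lambda)
    d2 = sp.diff(L_sq, omega, 2).subs(omega, lam)    # second derivative at omega = lambda
    prior = i(lam) * d2**sp.Rational(-1, 2)          # Hartigan's asymptotically unbiased prior

    print(d2)        # 2 (a constant)
    print(prior)     # sqrt(2)*i(lambda)/2, i.e. proportional to i(lambda)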

8 Posterior unbiasedness L-unbiasedness For exponential family models and squared error loss with respect to the mean parameter [i.e. L²_µ(ˆµ, µ) = (ˆµ − µ)²], the unbiasedness associated with the prior π_µ(µ) ∝ i_µ(µ) is exact [Hartigan, 1965] This prior can be written as π_µ(µ) ∝ V(µ)^{−1}, while in terms of the canonical parameter it becomes π_θ(θ) ∝ 1 Hence, the (L²_µ, π_µ)-Bayes estimator ˆµ = E(µ | x^(n)) is L²_µ-unbiased and the corresponding unbiased prior is π_µ Under certain conditions, this also holds (in the usual L²_λ-sense) for other parametrizations λ = λ(θ) [GP & Mendoza, 1999]
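
The exactness can be illustrated numerically. The sketch below is an added illustration (not from the slides) using the Poisson model, for which V(µ) = µ, so the unbiased prior π_µ(µ) ∝ 1/µ is the limit of a Gamma(a, b) prior as a, b → 0; the posterior mean then reduces to the sample mean, and a Monte Carlo run shows it is unbiased (the hyperparameter values are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    mu_true, n = 3.7, 10
    a, b = 1e-8, 1e-8                           # Gamma(a, b) prior with a, b -> 0, i.e. pi(mu) ~ 1/mu

    x = rng.poisson(mu_true, size=(200_000, n))
    post_mean = (a + x.sum(axis=1)) / (b + n)   # posterior mean E(mu | x) under the conjugate prior

    print(post_mean.mean())                     # ~ 3.7: the Bayes estimator is (essentially exactly) unbiased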

9 Posterior unbiasedness Relative entropy loss For an arbitrary parametrization λ, let L_K(ˆλ, λ) = D_KL(p(x | λ) ‖ p(x | ˆλ)) denote the relative entropy (Kullback-Leibler) loss, and let L*_K(ˆλ, λ) = D_KL(p(x | ˆλ) ‖ p(x | λ)) denote the corresponding dual loss Unlike the squared error loss L²_λ, both of these loss functions are invariant under reparametrizations of the model and so we may drop the subscript λ from the notation With this notation, ˆµ = E(µ | x^(n)) is not only (L²_µ, π_µ)-Bayes but also (L_K, π_µ)-Bayes

10 Posterior unbiasedness Relative entropy loss Therefore, for the mean parameter of an exponential family, the (L_K, π_µ)-Bayes estimator is L²_µ-unbiased A partial dual result holds for the canonical parameter θ In this case, the (L*_K, π_θ)-Bayes estimator is ˆθ = E(θ | x^(n)), which is also the (L²_θ, π_θ)-Bayes estimator of θ Suppose ˆθ is unbiased in the usual sense. Then, for the canonical parameter of an exponential family, the (L*_K, π_θ)-Bayes estimator is L²_θ-unbiased The (L_K, π_θ)-Bayes estimator of θ is also its posterior mode which, since π_θ(θ) ∝ 1, coincides with the maximum likelihood estimator. In other words, π_θ(θ) ∝ 1 is a maximum likelihood prior as defined by Hartigan (1998)

11 Posterior unbiasedness Relative entropy loss Unlike (L², p)-Bayes estimators, both (L_K, p)- and (L*_K, p)-Bayes estimators are invariant under reparametrizations of the model Wu & Vos (2012) have shown that, for exponential families, the maximum likelihood estimator is L_K-unbiased (regardless of the parametrization) We can rephrase this result to say that the (L_K, π)-Bayes estimator is L_K-unbiased, so it is unbiased in a dual sense. Here, π is the prior induced by the uniform prior on the canonical parameter θ

12 Posterior unbiasedness Relative entropy loss It follows that the L_K-unbiased estimator for an arbitrary transformation λ = λ(µ) of the mean parameter always exists and can be obtained by correspondingly transforming the usual L²_µ-unbiased estimator ˆµ = E(µ | x^(n)), i.e. ˆλ = λ(ˆµ), which in this case coincides with the maximum likelihood estimator [GP & Mendoza, 2013] Such a ˆλ is not necessarily unbiased in the usual L²_λ-sense

13 Posterior unbiasedness Relative entropy loss Again, a partial dual result can be obtained for the canonical parameter Using results from Wu & Vos (2012), it can be shown that an estimator is L*_K-unbiased for θ if it is L²_θ-unbiased. Recall that the (L*_K, π)-Bayes estimator is ˆθ = E(θ | x^(n)). If ˆθ happens to be unbiased in the usual sense, then the (L*_K, π)-Bayes estimator is L*_K-unbiased In this case, the L*_K-unbiased estimator for an arbitrary transformation λ = λ(θ) of the canonical parameter can be obtained by correspondingly transforming the usual L²_θ-unbiased estimator ˆθ, i.e. ˆλ = λ(ˆθ) As before, such a ˆλ is not necessarily unbiased in the usual L²_λ-sense

14 Conjugate predictive distributions and generalized entropies Conjugate predictive distributions Natural exponential family p_θ(x | θ) = b(x) exp{θx − M(θ)} (θ ∈ Ξ), with M(θ) = log ∫ b(x) exp{θx} η(dx) Canonical parameter space Ξ = {θ ∈ IR : M(θ) < ∞} Mean-value parameter µ = µ(θ) = E[X | θ] = dM(θ)/dθ Mean-value parameter space: Ω = µ(Ξ) Variance function V(µ) = Var[X | θ(µ)] = d²M(θ)/dθ² evaluated at θ = θ(µ)
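
For concreteness, the following short sympy sketch (an added illustration; the Poisson case is only an example, not one singled out in the talk) recovers µ(θ) and V(µ) from the cumulant transform M(θ) = e^θ of the Poisson natural exponential family:

    import sympy as sp

    theta, mu = sp.symbols('theta mu', positive=True)
    M = sp.exp(theta)                                        # cumulant transform of the Poisson NEF

    mu_of_theta = sp.diff(M, theta)                          # mean-value parameter mu(theta) = dM/dtheta
    V_of_mu = sp.diff(M, theta, 2).subs(theta, sp.log(mu))   # variance function V(mu)

    print(mu_of_theta, V_of_mu)                              # exp(theta)  mu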

15 Conjugate predictive distributions and generalized entropies Conjugate predictive distributions Let X_1, ..., X_n ∼ p_θ(x | θ). Then p_θ(x̄ | θ, n) = b(x̄, n) exp{n[θx̄ − M(θ)]} Likelihood L_θ(θ | x̄, n) ∝ exp{n[θx̄ − M(θ)]} Fisher information i_θ(θ) = d²M(θ)/dθ² In terms of the mean-value parameter, i_µ(µ) = V(µ)^{−1}

16 Conjugate predictive distributions and generalized entropies Conjugate predictive distributions Natural conjugate family π_θ(θ | x_0, n_0) ∝ L_θ(θ | x_0, n_0) (x_0 ∈ IR, n_0 > 0) Normalized form π_θ(θ | x_0, n_0) = h(x_0, n_0) exp{n_0[x_0 θ − M(θ)]}, with h(x_0, n_0)^{−1} = ∫ exp{n_0[x_0 θ − M(θ)]} dθ (DY-conjugate prior)

17 Conjugate predictive distributions and generalized entropies Conjugate predictive distributions Prior predictive p(x | x_0, n_0) = b(x) h(x_0, n_0) / h((n_0 x_0 + x)/(n_0 + 1), n_0 + 1) Posterior predictive p(x | x_1, n_1) = b(x) h(x_1, n_1) / h((n_1 x_1 + x)/(n_1 + 1), n_1 + 1), where x_1 = (n_0 x_0 + n x̄)/(n_0 + n) and n_1 = n_0 + n In general, this is not an exponential family

18 Conjugate predictive distributions and generalized entropies Conjugate predictive distributions Prior predictive p(x | x_0, n_0) = b(x) h(x_0, n_0) / h((n_0 x_0 + x)/(n_0 + 1), n_0 + 1) Posterior predictive p(x | x_1, n_1) = b(x) h(x_1, n_1) / h((n_1 x_1 + x)/(n_1 + 1), n_1 + 1), where x_1 = (n_0 x_0 + n x̄)/(n_0 + n) and n_1 = n_0 + n In general, this is not an exponential family As n → ∞, x_1 → µ and p(x | x_1, n_1) → p_µ(x | µ), where p_µ(x | µ) = b(x) exp{θ(µ)x − M(θ(µ))}
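
The following sketch (an added illustration under the mean-value parametrization of h used above, with ad hoc helper names and arbitrary hyperparameter values) evaluates this predictive formula for the Poisson-Gamma pair, where h(x_0, n_0) = n_0^{n_0 x_0}/Γ(n_0 x_0), and checks that it reproduces the Negative Binomial probabilities:

    import numpy as np
    from scipy.special import gammaln
    from scipy.stats import nbinom

    def log_h(x0, n0):
        # Poisson case: h(x0, n0) = n0**(n0*x0) / Gamma(n0*x0)
        return n0 * x0 * np.log(n0) - gammaln(n0 * x0)

    def predictive(x, x0, n0):
        # p(x | x0, n0) = b(x) h(x0, n0) / h((n0*x0 + x)/(n0 + 1), n0 + 1), with b(x) = 1/x!
        x1 = (n0 * x0 + x) / (n0 + 1.0)
        return np.exp(-gammaln(x + 1.0) + log_h(x0, n0) - log_h(x1, n0 + 1.0))

    x = np.arange(8)
    x0, n0 = 2.5, 4.0                                  # arbitrary prior hyperparameters
    print(predictive(x, x0, n0))
    print(nbinom.pmf(x, n0 * x0, n0 / (n0 + 1.0)))     # same probabilities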

19 Conjugate predictive distributions and generalized entropies Generalized entropies The maximum entropy principle (MEP) is invoked when one wishes to select a single distribution P (for a random variable X) as a representative of a class Γ of probability distributions Typically Γ = Γ_d ≡ {P : E_P(S) = ς}, where ς ∈ IR^d and S = s(X) is a statistic taking values in IR^d The Shannon entropy of the distribution P is defined by H(P) = −∫ p(x) log{p(x)/p_0(x)} η(dx) = −D_KL(p ‖ p_0) and describes the uncertainty inherent in p(·) relative to p_0(·)

20 Conjugate predictive distributions and generalized entropies Generalized entropies If there exists a distribution P* maximizing H(P) over the class Γ_d, then its density p* satisfies p*(x) = p_0(x) exp{λ^T s(x)} / ∫ p_0(x) exp{λ^T s(x)} η(dx), where λ = (λ_1, ..., λ_d) is a vector of Lagrange multipliers and is determined by the moment constraints E_P*(S) = ς which define the class Γ_d Despite its intuitive appeal, the MEP is controversial. However, Topsøe (1979) provides an interesting decision-theoretic justification for the MEP
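
A minimal numerical sketch of this construction (added here for illustration; the base distribution, support and moment target are arbitrary choices) solves for the Lagrange multiplier in the one-constraint case by exponential tilting of a uniform base distribution:

    import numpy as np
    from scipy.optimize import brentq

    xs = np.arange(10)                      # support of the base distribution
    p0 = np.full(10, 0.1)                   # uniform base density p0
    target = 6.0                            # moment constraint E(X) = 6

    def tilted(lam):
        # exponentially tilted density p*(x) = p0(x) exp(lam*x) / normalizer
        w = p0 * np.exp(lam * xs)
        return w / w.sum()

    lam_star = brentq(lambda lam: tilted(lam) @ xs - target, -10.0, 10.0)
    print(lam_star, tilted(lam_star) @ xs)  # Lagrange multiplier and the attained mean (= 6.0)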

21 Conjugate predictive distributions and generalized entropies Generalized entropies Basically, there exists a duality between maximization of entropy and minimization of worst-case expected loss in a suitably defined statistical decision problem Grünwald and Dawid (2004) extend the work of Topsøe by considering a generalized concept of entropy related to the choice of the loss function They also discuss generalized divergences and the corresponding generalized exponential families of distributions

22 Conjugate predictive distributions and generalized entropies Generalized entropies Consider the following parametrized family of divergences, for α > 1: D_α(p ‖ p_0) = [α²/(α − 1)] { 1 − ∫ [p_0(x)/p(x)]^{1/α} p(x) η(dx) }, which includes the Kullback-Leibler divergence as the limiting case α → ∞ This is a particular case of the general class of f-divergences defined by D^(f)(p ‖ p_0) = ∫ f(p(x)/p_0(x)) p_0(x) η(dx), where f : [0, ∞) → (−∞, ∞] is a convex function, continuous at 0 and such that f(1) = 0
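
The limiting behaviour can be checked numerically on a finite support (an added illustration only; the two probability vectors are arbitrary):

    import numpy as np

    p = np.array([0.5, 0.3, 0.2])            # arbitrary distributions on three points
    p0 = np.array([0.2, 0.3, 0.5])

    def d_alpha(p, p0, alpha):
        # D_alpha(p || p0) = alpha^2/(alpha - 1) * (1 - sum((p0/p)^(1/alpha) * p))
        return alpha**2 / (alpha - 1.0) * (1.0 - np.sum((p0 / p)**(1.0 / alpha) * p))

    kl = np.sum(p * np.log(p / p0))
    for alpha in (2.0, 10.0, 100.0, 10_000.0):
        print(alpha, d_alpha(p, p0, alpha))   # approaches the KL value below as alpha grows
    print(kl)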

23 Conjugate predictive distributions and generalized entropies Generalized entropies Define the corresponding f-entropy by H^(f)(P) = −D^(f)(p ‖ p_0) Then the generalized maximum entropy distribution P* over the class Γ, if it exists, has a density of the form p*(x) = p_0(x) g(λ_0 + λ^T s(x)), where g(·) = ḟ^{−1}(·) and ḟ denotes the first derivative of f

24 Conjugate predictive distributions and generalized entropies Generalized entropies The α-divergence D_α(p ‖ p_0) is obtained when f(x) = f_α(x) ≡ [α/(α − 1)] { (1 − x) − αx(x^{−1/α} − 1) } This is derived from the well-known limit lim_{α→∞} α(x^{1/α} − 1) = log(x) Here we shall focus on the corresponding generalized entropy H_α(P) = −D_α(p ‖ p_0), for which g(y) = 1/(1 − y/α)^α
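
The relation between f_α and the link g can be spot-checked symbolically (a small added sketch assuming the expressions displayed above; the particular values of α and y used in the check are arbitrary):

    import sympy as sp

    x, y, alpha = sp.symbols('x y alpha', positive=True)
    f_alpha = alpha / (alpha - 1) * ((1 - x) - alpha * x * (x**(-1 / alpha) - 1))

    fdot = sp.diff(f_alpha, x)               # first derivative of f_alpha
    g = (1 - y / alpha)**(-alpha)            # claimed inverse link g(y)

    # f-dot evaluated at g(y) should give back y; spot-check at alpha = 5, y = 1/3
    val = fdot.subs(x, g).subs({alpha: 5, y: sp.Rational(1, 3)})
    print(sp.simplify(val))                  # 1/3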

25 Conjugate predictive distributions and generalized entropies Generalized exponential families The Student's t distribution, with density function St(x | µ, σ², γ) = [Γ((γ + 1)/2) / (σ √(γπ) Γ(γ/2))] [1 + (x − µ)²/(γσ²)]^{−(γ+1)/2} (x ∈ IR), maximizes H_α(P) over the class Γ_2 provided that α > 3/2 [Here γ = 2α − 1] On the other hand, the Student's t distribution can be expressed as St(x | µ, σ², γ) = ∫_0^∞ N(x | µ, σ²/y) Ga(y | γ/2, γ/2) dy As γ → ∞, H_α(P) → H(P) and St(x | µ, σ², γ) → N(x | µ, σ²) The Student's t distribution can be regarded as a generalized exponential family in this sense
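
The mixture representation is easy to verify numerically (an added sketch; the values of µ, σ and γ are arbitrary, and Ga(·, ·) is parametrized by shape and rate as above):

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    mu, sigma, gamma = 1.0, 2.0, 5.0         # arbitrary location, scale and degrees of freedom

    def mixture_pdf(x):
        # integrate N(x | mu, sigma^2/y) against the Ga(gamma/2, gamma/2) mixing density
        integrand = lambda y: (stats.norm.pdf(x, loc=mu, scale=sigma / np.sqrt(y))
                               * stats.gamma.pdf(y, a=gamma / 2.0, scale=2.0 / gamma))
        return quad(integrand, 0.0, np.inf)[0]

    for x in (-1.0, 0.0, 2.5):
        print(mixture_pdf(x), stats.t.pdf(x, df=gamma, loc=mu, scale=sigma))   # the two agree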

26 Conjugate predictive distributions and generalized entropies Generalized exponential families The Exponential-Gamma distribution, with density function EG(x | γ, γµ) = (1/µ) [1 + x/(γµ)]^{−(γ+1)} (x ∈ IR_+), maximizes H_α(P) over the class Γ_1 provided that α > 2 [Here γ = α − 1] On the other hand, the Exponential-Gamma distribution can be expressed as EG(x | γ, γµ) = ∫_0^∞ NE(x | µ/y) Ga(y | γ, γ) dy Here NE(x | µ) denotes the density of a Negative Exponential distribution with mean µ As γ → ∞, H_α(P) → H(P) and EG(x | γ, γµ) → NE(x | µ) The Exponential-Gamma distribution can also be regarded as a generalized exponential family in this sense
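
An analogous numerical check for the Exponential-Gamma (Lomax-type) density written above (again an added illustration with arbitrary parameter values):

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    mu, gamma = 2.0, 3.0                     # arbitrary mean and shape parameters

    def eg_pdf(x):
        # EG(x | gamma, gamma*mu) = (1/mu) * (1 + x/(gamma*mu))**(-(gamma + 1))
        return (1.0 / mu) * (1.0 + x / (gamma * mu))**(-(gamma + 1.0))

    def mixture_pdf(x):
        # integrate NE(x | mu/y) against the Ga(gamma, gamma) mixing density
        integrand = lambda y: (stats.expon.pdf(x, scale=mu / y)
                               * stats.gamma.pdf(y, a=gamma, scale=1.0 / gamma))
        return quad(integrand, 0.0, np.inf)[0]

    for x in (0.5, 2.0, 10.0):
        print(eg_pdf(x), mixture_pdf(x))     # the two columns agree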

27 Conjugate predictive distributions and generalized entropies Generalized exponential families Results analogous to those discussed above are likely to hold for predictive distributions derived from other exponential-conjugate pairs, such as the Poisson-Gamma and Binomial-Beta distributions However, the analysis in these discrete cases is not as simple, since one has to deal with (power) series instead of integrals On the other hand, it is well known that the Poisson distribution is not the maximum entropy distribution over the class of distributions on the non-negative integers with a given mean (the geometric distribution claiming this property)

28 Conjugate predictive distributions and generalized entropies Generalized exponential families Nonetheless, the Poisson distribution is the maximum entropy distribution over the class of ultra log-concave distributions on the non-negative integers [Johnson, 2007] So, we can expect the Poisson-Gamma distribution to maximize the generalized entropy H_α(P) over the same class A similar result could well hold in the Binomial case [Harremoës, 2001]

29 Conjugate predictive distributions and generalized entropies Generalized exponential families The predictive distributions discussed here are related to the q-exponential families introduced by Naudts (2009, 2011) These families are based on the so-called deformed exponential function and its inverse, the deformed logarithmic function, which can be used in the definition of the generalized entropy H_α(P) Such generalized exponential families share some of the nice properties of standard exponential families, but are not as tractable These deformed logarithmic and exponential functions can be further generalized, and one may be able to define the generalized entropy H^(f)(P) in terms of them [Naudts, 2004]

30 Discussion Concluding Remarks 1 There exists an interesting duality between the classical notion of unbiasedness and minimization of posterior expected quadratic loss A similar result holds for the relative entropy loss if one uses a more general definition of unbiasedness

31 Discussion Concluding Remarks 1 There exists an interesting duality between the classical notion of unbiasedness and minimization of posterior expected quadratic loss A similar result holds for the relative entropy loss if one uses a more general definition of unbiasedness Using the relative entropy as loss function one can achieve invariant L-unbiased Bayes estimators In exponential family settings, such estimators can easily be computed for any (informative) conjugate prior, not only for unbiased priors

32 Discussion Concluding Remarks 2 There is another interesting duality between maximization of entropy and minimization of worst-case expected loss Conjugate predictive distributions derived from exponential family sampling models can be regarded as generalized exponential families in the sense that they maximize a generalized entropy

33 Discussion Concluding Remarks 2 There is another interesting duality between maximization of entropy and minimization of worst-case expected loss Conjugate predictive distributions derived from exponential family sampling models can be regarded as generalized exponential families in the sense that they maximize a generalized entropy They can be seen as flexible/robust versions of the corresponding sampling model and used for data analysis in their own right They inherit some of the tractability of the original sampling model

34 Discussion Concluding Remarks 2 There is another interesting duality between maximization of entropy and minimization of worst-case expected loss Conjugate predictive distributions derived from exponential family sampling models can be regarded as generalized exponential families in the sense that they maximize a generalized entropy They can be seen as flexible/robust versions of the corresponding sampling model and used for data analysis in their own right They inherit some of the tractability of the original sampling model When used as sampling models for data analysis, their mixture representation leads to a simple analysis via Gibbs sampling
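
As a concrete sketch of the last point (an illustration added here under simplifying assumptions: Student-t sampling model with known degrees of freedom γ and scale σ, flat prior on the location µ), the normal/gamma mixture representation yields the following two-block Gibbs sampler; the data are synthetic and the parameter values are arbitrary:

    import numpy as np

    rng = np.random.default_rng(1)
    gamma, sigma, mu_true, n = 5.0, 1.0, 2.0, 200
    x = mu_true + sigma * rng.standard_t(df=gamma, size=n)    # synthetic Student-t data

    n_iter, burn = 5000, 1000
    mu = x.mean()                                             # initial value
    draws = np.empty(n_iter)
    for it in range(n_iter):
        # latent scales: y_i | x_i, mu ~ Ga((gamma + 1)/2, rate = (gamma + ((x_i - mu)/sigma)**2)/2)
        rate = (gamma + ((x - mu) / sigma)**2) / 2.0
        y = rng.gamma(shape=(gamma + 1.0) / 2.0, scale=1.0 / rate)
        # location: mu | y, x ~ N(sum(y*x)/sum(y), sigma**2/sum(y))
        mu = rng.normal((y * x).sum() / y.sum(), sigma / np.sqrt(y.sum()))
        draws[it] = mu

    print(draws[burn:].mean(), draws[burn:].std())            # posterior summary for mu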

35 References
Grünwald & Dawid (2004). Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Annals of Statistics 32.
GP & Mendoza (1999). A note on Bayes estimates for exponential families. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales (España) 93.
GP & Mendoza (2013). Proper and non-informative conjugate priors for exponential family models. In Bayesian Theory and Applications (P. Damien, P. Dellaportas, N.G. Polson & D.A. Stephens, eds.). Oxford University Press, chap. 19.
Harremoës, P. (2001). Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Information Theory 47.
Hartigan (1965). The asymptotically unbiased prior distribution. Annals of Mathematical Statistics 36.
Hartigan (1998). The maximum likelihood prior. Annals of Statistics 26.
Johnson, O. (2007). Log-concavity and the maximum entropy property of the Poisson distribution. Stochastic Processes and their Applications 117.
Lehmann (1951). A general concept of unbiasedness. Annals of Mathematical Statistics 22.
Naudts, J. (2004). Estimators, escort probabilities, and φ-exponential families in statistical physics. Journal of Inequalities in Pure and Applied Mathematics 5, 102.
Naudts, J. (2009). The q-exponential family in statistical physics. Central European Journal of Physics 7.
Naudts, J. (2011). Generalised Thermostatistics. Berlin: Springer-Verlag.
Noorbaloochi & Meeden (1983). Unbiasedness as the dual of being Bayes. Journal of the American Statistical Association 78.
Topsøe, F. (1979). Information theoretical optimization techniques. Kybernetika 15.
Wu & Vos (2012). Decomposition of Kullback-Leibler risk and unbiasedness for parameter free estimators. Journal of Statistical Planning and Inference 142.
