Conjugate Predictive Distributions and Generalized Entropies
Slide 1: Conjugate Predictive Distributions and Generalized Entropies
Eduardo Gutiérrez-Peña, Department of Probability and Statistics, IIMAS-UNAM, Mexico. Padova, Italy, March 2013.
Slide 2: Menu
1. Antipasto/Appetizer: posterior unbiasedness
2. Piatto Principale/Main Course: conjugate predictive distributions as generalized exponential families
3. Dolce/Dessert: concluding remarks
Slide 5: Posterior unbiasedness — Unbiasedness
Why the interest in unbiasedness? There is an interesting duality between the classical notion of unbiasedness and minimization of posterior expected loss [Noorbaloochi & Meeden, 1983]. Some Bayes estimators $\hat\lambda = \hat\lambda(x^{(n)})$ turn out to be unbiased for certain transformations $\lambda = \lambda(\theta)$, in the sense that $E(\hat\lambda \mid \lambda) = \lambda$. This definition of unbiasedness is related to the squared error loss $L^2_\lambda(\hat\lambda, \lambda) = (\hat\lambda - \lambda)^2$, as is the Bayes estimator $\hat\lambda = E(\lambda \mid x^{(n)})$, whose dual (posterior) unbiasedness condition is $E(\lambda \mid \hat\lambda) = \hat\lambda$.
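The claim that the posterior mean is the Bayes estimator under squared error loss can be checked numerically. The sketch below is my own illustration, not from the talk: it uses a hypothetical discrete posterior on a grid and confirms that the action minimizing posterior expected squared error is the posterior mean.

```python
# Hypothetical discrete posterior for a parameter lambda on a grid
# (an illustration of mine, not from the talk): Beta(3, 5)-shaped weights.
grid = [i / 200 for i in range(1, 200)]
w = [lam**2 * (1 - lam)**4 for lam in grid]
Z = sum(w)
post = [wi / Z for wi in w]

post_mean = sum(l * p for l, p in zip(grid, post))

def expected_sq_loss(est):
    # posterior expected squared error loss of the action `est`
    return sum(p * (est - l)**2 for l, p in zip(grid, post))

# The action minimizing posterior expected L2 loss is the posterior mean
best = min(grid, key=expected_sq_loss)
assert abs(best - post_mean) < 1 / 200  # agreement up to grid resolution
```

The minimizer can only be located to within the grid spacing, hence the tolerance.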
Slide 6: Posterior unbiasedness — L-unbiasedness
For a general loss function $L(\hat\lambda, \lambda)$, we shall say that the estimator $\hat\lambda(x^{(n)})$ is L-unbiased for the parameter $\lambda$ if, for all $\lambda, \lambda' \in \Lambda$ [Lehmann, 1951],
$$E(L(\hat\lambda(X^{(n)}), \lambda) \mid \lambda) \le E(L(\hat\lambda(X^{(n)}), \lambda') \mid \lambda).$$
Also, for a loss function $L$ and a prior $p$, we shall say that the estimator $\hat\lambda$ is $(L, p)$-Bayes if it minimizes the corresponding posterior expected loss $\int L(\hat\lambda, \lambda)\, p(\lambda \mid x^{(n)})\, d\lambda$.
Slide 7: Posterior unbiasedness — L-unbiasedness
Using this general definition of unbiasedness, Hartigan (1965) showed that the prior which minimizes the asymptotic bias of the Bayes estimator relative to the loss function $L(\hat\lambda, \lambda)$ is given by
$$\pi(\lambda) \propto i_\lambda(\lambda) \left[ \left. \frac{\partial^2 L(\lambda, \omega)}{\partial \omega^2} \right|_{\omega = \lambda} \right]^{-1/2},$$
where $i_\lambda(\lambda)$ denotes the Fisher information. For the squared error loss $L^2_\lambda(\hat\lambda, \lambda)$, this becomes $\pi(\lambda) \propto i_\lambda(\lambda)$.
Slide 8: Posterior unbiasedness — L-unbiasedness
For exponential family models and squared error loss with respect to the mean parameter (i.e. $L^2_\mu(\hat\mu, \mu) = (\hat\mu - \mu)^2$), the unbiasedness of $\pi_\mu(\mu) \propto i_\mu(\mu)$ is exact [Hartigan, 1965]. This prior can be written as $\pi_\mu(\mu) \propto V(\mu)^{-1}$, while in terms of the canonical parameter it becomes $\pi_\theta(\theta) \propto 1$. Hence the $(L^2_\mu, \pi_\mu)$-Bayes estimator $\hat\mu = E(\mu \mid x^{(n)})$ is $L^2_\mu$-unbiased, and the corresponding unbiased prior is $\pi_\mu$. Under certain conditions, this also holds (in the usual $L^2_\lambda$ sense) for other parametrizations $\lambda = \lambda(\theta)$ [GP & Mendoza, 1999].
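As a concrete instance of the exact result above (a sketch of mine, not from the talk): for Poisson sampling, $V(\mu) = \mu$, so the unbiased prior is $\pi_\mu(\mu) \propto 1/\mu$ (flat on $\theta = \log\mu$), the posterior is $\mathrm{Ga}(\sum x_i, n)$, and the Bayes estimator $E(\mu \mid x^{(n)}) = \bar{x}$ is exactly unbiased. A seeded Monte Carlo check:

```python
import math
import random

random.seed(1)

# Poisson NEF with the "unbiased prior" pi_mu ∝ 1/mu (flat on theta).
# The posterior is Gamma(sum x_i, n), whose mean is the Bayes estimator:
def bayes_estimate(xs):
    return sum(xs) / len(xs)   # E(mu | x) = x-bar exactly

def rpois(mu):
    # Knuth-style inversion sampler for a Poisson draw (stdlib only)
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

mu_true, n, reps = 3.0, 10, 2000
avg = sum(bayes_estimate([rpois(mu_true) for _ in range(n)])
          for _ in range(reps)) / reps
assert abs(avg - mu_true) < 0.1   # E(mu-hat) = mu: exact L2-unbiasedness
```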
Slide 9: Posterior unbiasedness — Relative entropy loss
For an arbitrary parametrization $\lambda$, let $L_K(\hat\lambda, \lambda) = D_{KL}(p(x \mid \lambda) \,\|\, p(x \mid \hat\lambda))$ denote the relative entropy (Kullback-Leibler) loss, and let $L_K^*(\hat\lambda, \lambda) = D_{KL}(p(x \mid \hat\lambda) \,\|\, p(x \mid \lambda))$ denote the corresponding dual loss. Unlike the squared error loss $L^2_\lambda$, both of these loss functions are invariant under reparametrizations of the model, and so we may drop the subscript $\lambda$ from the notation. With this notation, $\hat\mu = E(\mu \mid x^{(n)})$ is not only $(L^2_\mu, \pi_\mu)$-Bayes but also $(L_K, \pi_\mu)$-Bayes.
Slide 10: Posterior unbiasedness — Relative entropy loss
Therefore, for the mean parameter of an exponential family, the $(L_K, \pi_\mu)$-Bayes estimator is $L^2_\mu$-unbiased. A partial dual result holds for the canonical parameter $\theta$. In this case, the $(L_K^*, \pi_\theta)$-Bayes estimator is $\hat\theta = E(\theta \mid x^{(n)})$, which is also the $(L^2_\theta, \pi_\theta)$-Bayes estimator of $\theta$. Suppose $\hat\theta$ is unbiased in the usual sense. Then, for the canonical parameter of an exponential family, the $(L_K^*, \pi_\theta)$-Bayes estimator is $L^2_\theta$-unbiased. The $(L_K, \pi_\theta)$-Bayes estimator of $\theta$ is also its posterior mode which, since $\pi_\theta(\theta) \propto 1$, coincides with the maximum likelihood estimator. In other words, $\pi_\theta(\theta) \propto 1$ is a maximum likelihood prior as defined by Hartigan (1998).
Slide 11: Posterior unbiasedness — Relative entropy loss
Unlike $(L^2, p)$-Bayes estimators, both $(L_K, p)$- and $(L_K^*, p)$-Bayes estimators are invariant under reparametrizations of the model. Wu & Vos (2012) have shown that, for exponential families, the maximum likelihood estimator is $L_K^*$-unbiased (regardless of the parametrization). We can rephrase this result to say that the $(L_K, \pi)$-Bayes estimator is $L_K^*$-unbiased, so it is unbiased in a dual sense. Here $\pi$ is the prior induced by the uniform prior on the canonical parameter $\theta$.
Slide 12: Posterior unbiasedness — Relative entropy loss
It follows that the $L_K^*$-unbiased estimator of an arbitrary transformation $\lambda = \lambda(\mu)$ of the mean parameter always exists, and can be obtained by correspondingly transforming the usual $L^2_\mu$-unbiased estimator $\hat\mu = E(\mu \mid x^{(n)})$, i.e. $\hat\lambda = \lambda(\hat\mu)$, which in this case coincides with the maximum likelihood estimator [GP & Mendoza, 2013]. Such a $\hat\lambda$ is not necessarily unbiased in the usual $L^2_\lambda$ sense.
Slide 13: Posterior unbiasedness — Relative entropy loss
Again, a partial dual result can be obtained for the canonical parameter. Using results from Wu & Vos (2012), it can be shown that an estimator is $L_K$-unbiased for $\theta$ if it is $L^2_\theta$-unbiased. Recall that the $(L_K^*, \pi)$-Bayes estimator is $\hat\theta = E(\theta \mid x^{(n)})$. If $\hat\theta$ happens to be unbiased in the usual sense, then the $(L_K^*, \pi)$-Bayes estimator is $L_K$-unbiased. In this case, the $L_K$-unbiased estimator of an arbitrary transformation $\lambda = \lambda(\theta)$ of the canonical parameter can be obtained by correspondingly transforming the usual $L^2_\theta$-unbiased estimator $\hat\theta$, i.e. $\hat\lambda = \lambda(\hat\theta)$. As before, such a $\hat\lambda$ is not necessarily unbiased in the usual $L^2_\lambda$ sense.
Slide 14: Conjugate predictive distributions and generalized entropies — Conjugate predictive distributions
Natural exponential family: $p_\theta(x \mid \theta) = b(x) \exp\{\theta x - M(\theta)\}$ $(\theta \in \Xi)$, with $M(\theta) = \log \int b(x) \exp\{\theta x\}\, \eta(dx)$.
Canonical parameter space: $\Xi = \{\theta \in \mathbb{R} : M(\theta) < \infty\}$.
Mean-value parameter: $\mu = \mu(\theta) = E[X \mid \theta] = dM(\theta)/d\theta$.
Mean-value parameter space: $\Omega = \mu(\Xi)$.
Variance function: $V(\mu) = \mathrm{Var}[X \mid \theta(\mu)]$, i.e. $d^2M(\theta)/d\theta^2$ evaluated at $\theta = \theta(\mu)$.
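The definitions above can be sanity-checked for a concrete family. A small sketch of mine for the Poisson case, where $b(x) = 1/x!$ and $M(\theta) = e^\theta$, so $\mu = M'(\theta) = e^\theta$ and $V(\mu) = M''(\theta) = \mu$; the derivative identities are verified by central finite differences:

```python
import math

# Poisson NEF: M(theta) = exp(theta), so mu(theta) = M'(theta) = exp(theta)
# and V(mu) = M''(theta) = mu. Check the identities by finite differences.
M = math.exp          # cumulant function of the Poisson family
theta, h = 0.7, 1e-4

mu = (M(theta + h) - M(theta - h)) / (2 * h)               # ~ M'(theta)
var = (M(theta + h) - 2 * M(theta) + M(theta - h)) / h**2  # ~ M''(theta)

assert abs(mu - math.exp(theta)) < 1e-6    # mean parameter mu = exp(theta)
assert abs(var - mu) < 1e-4                # variance function V(mu) = mu
```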
Slide 15: Conjugate predictive distributions and generalized entropies — Conjugate predictive distributions
Let $X_1, \ldots, X_n \sim p_\theta(x \mid \theta)$. Then the sufficient statistic $\bar{x}$ has density $p_\theta(\bar{x} \mid \theta, n) = \bar{b}(\bar{x}, n) \exp\{n[\theta \bar{x} - M(\theta)]\}$.
Likelihood: $L_\theta(\theta \mid \bar{x}, n) \propto \exp\{n[\theta \bar{x} - M(\theta)]\}$.
Fisher information: $i_\theta(\theta) = d^2M(\theta)/d\theta^2$; in terms of the mean-value parameter, $i_\mu(\mu) = V(\mu)^{-1}$.
Slide 16: Conjugate predictive distributions and generalized entropies — Conjugate predictive distributions
Natural conjugate family: $\pi_\theta(\theta \mid x_0, n_0) \propto L_\theta(\theta \mid x_0, n_0)$ $(x_0 \in \mathbb{R},\ n_0 \in \mathbb{R})$.
Normalized form: $\pi_\theta(\theta \mid x_0, n_0) = h(x_0, n_0) \exp\{n_0[x_0\theta - M(\theta)]\}$, with $h(x_0, n_0)^{-1} = \int \exp\{n_0[x_0\theta - M(\theta)]\}\, d\theta$ (DY-conjugate prior).
Slide 17: Conjugate predictive distributions and generalized entropies — Conjugate predictive distributions
Prior predictive: $p(x \mid x_0, n_0) = b(x)\, h(x_0, n_0) / h(\tilde{x}_0, n_0 + 1)$, with $\tilde{x}_0 = (n_0 x_0 + x)/(n_0 + 1)$.
Posterior predictive: $p(x \mid x_1, n_1) = b(x)\, h(x_1, n_1) / h(\tilde{x}_1, n_1 + 1)$, with $\tilde{x}_1 = (n_1 x_1 + x)/(n_1 + 1)$, where $x_1 = (n_0 x_0 + n\bar{x})/(n_0 + n)$ and $n_1 = n_0 + n$.
In general, this is not an exponential family.
Slide 18: Conjugate predictive distributions and generalized entropies — Conjugate predictive distributions
As $n \to \infty$, $x_1 \to \mu$ and $p(x \mid x_1, n_1) \to p_\mu(x \mid \mu)$, where $p_\mu(x \mid \mu) = b(x) \exp\{\theta(\mu)x - M(\theta(\mu))\}$.
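For a concrete conjugate pair (a sketch of mine, assuming Poisson sampling, so that the DY-conjugate prior on $\mu$ is $\mathrm{Ga}(n_0 x_0, n_0)$), the predictive works out to a negative binomial distribution; the code checks that it is a proper distribution with mean $x_0$:

```python
import math

# Prior predictive for Poisson sampling with its conjugate Gamma prior
# (my own sketch in the slides' notation: a = n0*x0, b = n0, so that
# mu ~ Gamma(a, b) and the predictive is negative binomial).
def log_predictive(x, x0, n0):
    a, b = n0 * x0, n0
    return (math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(b / (b + 1)) - x * math.log(b + 1))

x0, n0 = 2.5, 4.0
pmf = [math.exp(log_predictive(x, x0, n0)) for x in range(200)]

assert abs(sum(pmf) - 1.0) < 1e-10            # a proper distribution
mean = sum(x * p for x, p in enumerate(pmf))
assert abs(mean - x0) < 1e-8                  # predictive mean = x0
```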
Slide 19: Conjugate predictive distributions and generalized entropies — Generalized entropies
The maximum entropy principle (MEP) is called upon when one wishes to select a single distribution $P$ (for a random variable $X$) as a representative of a class $\Gamma$ of probability distributions. Typically $\Gamma = \Gamma_d \equiv \{P : E_P(S) = \varsigma\}$, where $\varsigma \in \mathbb{R}^d$ and $S = s(X)$ is a statistic taking values in $\mathbb{R}^d$. The Shannon entropy of the distribution $P$ is defined by
$$H(P) = -\int p(x) \log\left\{\frac{p(x)}{p_0(x)}\right\} \eta(dx) = -D_{KL}(p \,\|\, p_0)$$
and describes the uncertainty inherent in $p(\cdot)$ relative to $p_0(\cdot)$.
Slide 20: Conjugate predictive distributions and generalized entropies — Generalized entropies
If there exists a distribution $P^*$ maximizing $H(P)$ over the class $\Gamma_d$, then its density $p^*$ satisfies
$$p^*(x) = \frac{p_0(x) \exp\{\lambda^T s(x)\}}{\int p_0(x) \exp\{\lambda^T s(x)\}\, \eta(dx)},$$
where $\lambda = (\lambda_1, \ldots, \lambda_d)$ is a vector of Lagrange multipliers determined by the moment constraints $E_{P^*}(S) = \varsigma$ which define the class $\Gamma_d$. Despite its intuitive appeal, the MEP is controversial. However, Topsøe (1979) provides an interesting decision-theoretic justification for the MEP.
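The constrained maximization above can be illustrated with a small solver. This is my own example (not from the talk): base measure uniform on $\{0,\dots,20\}$ and a single moment constraint $E(X) = 6$; the maximizer is an exponential tilt $p^*(x) \propto e^{\lambda x}$, and the Lagrange multiplier is found by bisection on the tilted mean:

```python
import math

# Maximum-entropy solution under one moment constraint E(X) = 6,
# base measure p0 uniform on {0,...,20} (a made-up discrete example).
xs = list(range(21))

def tilted_mean(lam):
    # mean of the exponentially tilted distribution p*(x) ∝ exp(lam*x)
    w = [math.exp(lam * x) for x in xs]
    return sum(x * wi for x, wi in zip(xs, w)) / sum(w)

lo, hi = -5.0, 5.0        # bisection: tilted_mean is increasing in lam
for _ in range(100):
    mid = (lo + hi) / 2
    if tilted_mean(mid) < 6.0:
        lo = mid
    else:
        hi = mid
lam = (lo + hi) / 2

assert abs(tilted_mean(lam) - 6.0) < 1e-9   # constraint is satisfied
```

The tilted mean is strictly increasing in $\lambda$ (its derivative is the tilted variance), which is what makes bisection valid here.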
Slide 21: Conjugate predictive distributions and generalized entropies — Generalized entropies
Basically, there exists a duality between maximization of entropy and minimization of worst-case expected loss in a suitably defined statistical decision problem. Grünwald and Dawid (2004) extend the work of Topsøe by considering a generalized concept of entropy related to the choice of the loss function. They also discuss generalized divergences and the corresponding generalized exponential families of distributions.
Slide 22: Conjugate predictive distributions and generalized entropies — Generalized entropies
Consider the following parametrized family of divergences, for $\alpha > 1$:
$$D_\alpha(p \,\|\, p_0) = \frac{\alpha^2}{\alpha - 1} \int \left\{ 1 - \left[\frac{p_0(x)}{p(x)}\right]^{1/\alpha} \right\} p(x)\, \eta(dx),$$
which includes the Kullback-Leibler divergence as the limiting case $\alpha \to \infty$. This is a particular case of the general class of $f$-divergences, defined by
$$D^{(f)}(p \,\|\, p_0) = \int f\!\left(\frac{p(x)}{p_0(x)}\right) p_0(x)\, \eta(dx),$$
where $f : [0, \infty) \to (-\infty, \infty]$ is a convex function, continuous at $0$ and such that $f(1) = 0$.
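The limiting case $\alpha \to \infty$ can be verified numerically. A sketch of mine with two made-up distributions on a four-point space:

```python
import math

# Check numerically that the alpha-divergence approaches Kullback-Leibler
# as alpha -> infinity, for two made-up distributions on {0,1,2,3}.
p  = [0.1, 0.2, 0.3, 0.4]
p0 = [0.25, 0.25, 0.25, 0.25]

def D_alpha(alpha):
    return (alpha**2 / (alpha - 1)) * sum(
        pi * (1.0 - (qi / pi) ** (1.0 / alpha)) for pi, qi in zip(p, p0))

kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, p0))

err_100, err_10000 = abs(D_alpha(100) - kl), abs(D_alpha(10000) - kl)
assert err_10000 < err_100      # the gap shrinks as alpha grows
assert err_10000 < 1e-3         # and D_alpha ~ KL for large alpha
```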
Slide 23: Conjugate predictive distributions and generalized entropies — Generalized entropies
Define the corresponding $f$-entropy by $H^{(f)}(P) = -D^{(f)}(p \,\|\, p_0)$. Then the generalized maximum entropy distribution $P^*$ over the class $\Gamma$, if it exists, has a density of the form
$$p^*(x) = p_0(x)\, g(\lambda_0 + \lambda^T s(x)),$$
where $g(\cdot) = \dot{f}^{-1}(\cdot)$ and $\dot{f}$ denotes the first derivative of $f$.
Slide 24: Conjugate predictive distributions and generalized entropies — Generalized entropies
The $\alpha$-divergence $D_\alpha(p \,\|\, p_0)$ is obtained when
$$f(x) = f_\alpha(x) \equiv \frac{\alpha}{\alpha - 1}\left\{(1 - x) - \alpha x\,(x^{-1/\alpha} - 1)\right\}.$$
The Kullback-Leibler limiting case follows from the well-known identity $\lim_{\alpha \to \infty} \alpha(x^{1/\alpha} - 1) = \log(x)$. Here we shall focus on the corresponding generalized entropy $H_\alpha(P) = -D_\alpha(p \,\|\, p_0)$, for which $g(y) = (1 - y/\alpha)^{-\alpha}$.
Slide 25: Conjugate predictive distributions and generalized entropies — Generalized exponential families
The Student-$t$ distribution, with density function
$$\mathrm{St}(x \mid \mu, \sigma^2, \gamma) = \frac{\Gamma((\gamma+1)/2)}{\sqrt{\gamma\pi}\;\Gamma(\gamma/2)\,\sigma} \left[1 + \frac{(x - \mu)^2}{\gamma\sigma^2}\right]^{-(\gamma+1)/2} \quad (x \in \mathbb{R}),$$
maximizes $H_\alpha(P)$ over the class $\Gamma_2$ provided that $\alpha > 3/2$ [here $\gamma = 2\alpha - 1$]. On the other hand, the Student-$t$ distribution can be expressed as
$$\mathrm{St}(x \mid \mu, \sigma^2, \gamma) = \int_0^\infty \mathrm{N}(x \mid \mu, \sigma^2/y)\, \mathrm{Ga}(y \mid \gamma/2, \gamma/2)\, dy.$$
As $\gamma \to \infty$, $H_\alpha(P) \to H(P)$ and $\mathrm{St}(x \mid \mu, \sigma^2, \gamma) \to \mathrm{N}(x \mid \mu, \sigma^2)$. The Student-$t$ distribution can be regarded as a generalized exponential family in this sense.
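The scale-mixture representation can be checked by direct quadrature. This is my own sketch (simple midpoint rule over the mixing variable, not from the talk):

```python
import math

# Numerical check of the scale-mixture representation:
# St(x | mu, s2, g) = integral of N(x | mu, s2/y) Ga(y | g/2, g/2) dy.
def t_pdf(x, mu, s2, g):
    c = (math.lgamma((g + 1) / 2) - math.lgamma(g / 2)
         - 0.5 * math.log(g * math.pi * s2))
    return math.exp(c - 0.5 * (g + 1) * math.log1p((x - mu)**2 / (g * s2)))

def gamma_pdf(y, a, b):   # shape a, rate b
    return math.exp(a * math.log(b) - math.lgamma(a)
                    + (a - 1) * math.log(y) - b * y)

def norm_pdf(x, mu, v):
    return math.exp(-0.5 * (x - mu)**2 / v) / math.sqrt(2 * math.pi * v)

def mixture_pdf(x, mu, s2, g, n=50000, ymax=40.0):
    # midpoint quadrature over the mixing variable y on (0, ymax]
    h = ymax / n
    return sum(norm_pdf(x, mu, s2 / y) * gamma_pdf(y, g / 2, g / 2) * h
               for y in (h * (i + 0.5) for i in range(n)))

mu, s2, g = 0.0, 1.0, 5.0
for x in (-2.0, 0.0, 1.5):
    assert abs(t_pdf(x, mu, s2, g) - mixture_pdf(x, mu, s2, g)) < 1e-5
```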
Slide 26: Conjugate predictive distributions and generalized entropies — Generalized exponential families
The Exponential-Gamma distribution, with density function
$$\mathrm{EG}(x \mid \gamma, \gamma\mu) = \frac{1}{\mu}\left[1 + \frac{x}{\gamma\mu}\right]^{-(\gamma+1)} \quad (x \in \mathbb{R}_+),$$
maximizes $H_\alpha(P)$ over the class $\Gamma_1$ provided that $\alpha > 2$ [here $\gamma = \alpha - 1$]. On the other hand, the Exponential-Gamma distribution can be expressed as
$$\mathrm{EG}(x \mid \gamma, \gamma\mu) = \int_0^\infty \mathrm{NE}(x \mid \mu/y)\, \mathrm{Ga}(y \mid \gamma, \gamma)\, dy,$$
where $\mathrm{NE}(x \mid \mu)$ denotes the density of a Negative Exponential distribution with mean $\mu$. As $\gamma \to \infty$, $H_\alpha(P) \to H(P)$ and $\mathrm{EG}(x \mid \gamma, \gamma\mu) \to \mathrm{NE}(x \mid \mu)$. The Exponential-Gamma distribution can also be regarded as a generalized exponential family in this sense.
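A quick numerical check of the stated limit. The Lomax-type closed form below is my own evaluation of the mixture integral, consistent with the $\mathrm{EG}(x \mid \gamma, \gamma\mu)$ notation:

```python
import math

def eg_pdf(x, g, mu):
    # Closed form obtained by integrating the NE/Gamma mixture
    # (my own evaluation): (1/mu) * (1 + x/(g*mu))**(-(g+1))
    return (1.0 / mu) * (1.0 + x / (g * mu)) ** (-(g + 1))

def ne_pdf(x, mu):
    # Negative Exponential density with mean mu
    return math.exp(-x / mu) / mu

mu = 2.0
gaps = [max(abs(eg_pdf(x, g, mu) - ne_pdf(x, mu)) for x in (0.5, 1.3, 4.0))
        for g in (10.0, 100.0, 1000.0)]
assert gaps[0] > gaps[1] > gaps[2]   # EG approaches NE as gamma grows
assert gaps[2] < 1e-3
```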
Slide 27: Conjugate predictive distributions and generalized entropies — Generalized exponential families
Results analogous to those discussed above are likely to hold for predictive distributions derived from other exponential-conjugate pairs, such as the Poisson-Gamma and Binomial-Beta distributions. However, the analysis in these discrete cases is not as simple, since one has to deal with (power) series instead of integrals. On the other hand, it is well known that the Poisson distribution is not the maximum entropy distribution over the class of distributions on the non-negative integers with a given mean (the geometric distribution claims this property).
Slide 28: Conjugate predictive distributions and generalized entropies — Generalized exponential families
Nonetheless, the Poisson distribution is the maximum entropy distribution over the class of ultra log-concave distributions on the non-negative integers [Johnson, 2007]. So we can expect the Poisson-Gamma distribution to maximize the generalized entropy $H_\alpha(P)$ over the same class. A similar result could well hold in the Binomial case [Harremoës, 2001].
Slide 29: Conjugate predictive distributions and generalized entropies — Generalized exponential families
The predictive distributions discussed here are related to the $q$-exponential families introduced by Naudts (2009, 2011). These families are based on the so-called deformed exponential function and its inverse, the deformed logarithmic function, which can be used in the definition of the generalized entropy $H_\alpha(P)$. Such generalized exponential families share some of the nice properties of standard exponential families, but are not as tractable. These deformed logarithmic and exponential functions can be further generalized, and one may be able to define the $f$-entropy $H^{(f)}(P)$ in terms of them [Naudts, 2004].
Slide 31: Discussion — Concluding Remarks 1
There exists an interesting duality between the classical notion of unbiasedness and minimization of posterior expected quadratic loss. A similar result holds for the relative entropy loss if one uses a more general definition of unbiasedness. Using the relative entropy as loss function, one can achieve invariant L-unbiased Bayes estimators. In exponential family settings, such estimators can easily be computed for any (informative) conjugate prior, not only for unbiased priors.
Slide 34: Discussion — Concluding Remarks 2
There is another interesting duality between maximization of entropy and minimization of worst-case expected loss. Conjugate predictive distributions derived from exponential family sampling models can be regarded as generalized exponential families, in the sense that they maximize a generalized entropy. They can be seen as flexible/robust versions of the corresponding sampling model and used for data analysis in their own right, and they inherit some of the tractability of the original sampling model. When used as sampling models for data analysis, their mixture representation leads to a simple analysis via Gibbs sampling.
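A minimal sketch of such a Gibbs analysis for the Student-$t$ sampling model via its normal/Gamma mixture representation, under assumptions of mine (known $\sigma^2$ and $\gamma$, flat prior on $\mu$); the full conditionals follow directly from the mixture, $x_i \mid y_i \sim \mathrm{N}(\mu, \sigma^2/y_i)$ with $y_i \sim \mathrm{Ga}(\gamma/2, \gamma/2)$:

```python
import math
import random

# Gibbs sampler sketch for Student-t data (my own illustration; scale s2
# and degrees of freedom g known, flat prior on mu). Full conditionals:
#   y_i | mu ~ Ga((g+1)/2, rate = (g + (x_i - mu)^2 / s2) / 2)
#   mu  | y  ~ N( sum(y_i x_i)/sum(y_i), s2/sum(y_i) )
random.seed(2)
s2, g, mu_true = 1.0, 5.0, 4.0

# synthetic t-distributed data generated via the same mixture
data = []
for _ in range(200):
    y = random.gammavariate(g / 2, 2 / g)     # shape, scale = 1/rate
    data.append(random.gauss(mu_true, math.sqrt(s2 / y)))

mu, draws = 0.0, []
for it in range(2000):
    ys = [random.gammavariate((g + 1) / 2, 2 / (g + (x - mu)**2 / s2))
          for x in data]
    sy = sum(ys)
    mu = random.gauss(sum(y * x for y, x in zip(ys, data)) / sy,
                      math.sqrt(s2 / sy))
    if it >= 500:                             # discard burn-in
        draws.append(mu)

post_mean = sum(draws) / len(draws)
assert abs(post_mean - mu_true) < 0.3   # posterior concentrates near mu_true
```

Note that `random.gammavariate` takes a shape and a scale, so rates are passed as reciprocals.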
Slide 35: References
- Grünwald, P. & Dawid, A.P. (2004). Game theory, maximum entropy, minimum discrepancy and robust Bayesian decision theory. Annals of Statistics 32.
- GP & Mendoza (1999). A note on Bayes estimates for exponential families. Revista de la Real Academia de Ciencias Exactas, Físicas y Naturales (España) 93.
- GP & Mendoza, M. (2013). Proper and non-informative conjugate priors for exponential family models. In Bayesian Theory and Applications (P. Damien, P. Dellaportas, N.G. Polson & D.A. Stephens, eds.), Oxford University Press, chap. 19.
- Harremoës, P. (2001). Binomial and Poisson distributions as maximum entropy distributions. IEEE Trans. Information Theory 47.
- Hartigan (1965). The asymptotically unbiased prior distribution. Annals of Mathematical Statistics 36.
- Hartigan (1998). The maximum likelihood prior. Annals of Statistics 26.
- Johnson, O. (2007). Log-concavity and the maximum entropy property of the Poisson distribution. Stochastic Process. Appl. 117.
- Lehmann (1951). A general concept of unbiasedness. Annals of Mathematical Statistics 22.
- Naudts, J. (2004). Estimators, escort probabilities, and φ-exponential families in statistical physics. Journal of Inequalities in Pure and Applied Mathematics 5, 102.
- Naudts, J. (2009). The q-exponential family in statistical physics. Central European Journal of Physics 7.
- Naudts, J. (2011). Generalised Thermostatistics. Berlin: Springer-Verlag.
- Noorbaloochi & Meeden (1983). Unbiasedness as the dual of being Bayes. Journal of the American Statistical Association 78.
- Topsøe, F. (1979). Information theoretical optimization techniques. Kybernetika 15.
- Wu & Vos (2012). Decomposition of Kullback-Leibler risk and unbiasedness for parameter-free estimators. Journal of Statistical Planning and Inference 142.
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationEntropy measures of physics via complexity
Entropy measures of physics via complexity Giorgio Kaniadakis and Flemming Topsøe Politecnico of Torino, Department of Physics and University of Copenhagen, Department of Mathematics 1 Introduction, Background
More informationArtificial Intelligence
Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:
More information1. Fisher Information
1. Fisher Information Let f(x θ) be a density function with the property that log f(x θ) is differentiable in θ throughout the open p-dimensional parameter set Θ R p ; then the score statistic (or score
More informationBayes spaces: use of improper priors and distances between densities
Bayes spaces: use of improper priors and distances between densities J. J. Egozcue 1, V. Pawlowsky-Glahn 2, R. Tolosana-Delgado 1, M. I. Ortego 1 and G. van den Boogaart 3 1 Universidad Politécnica de
More informationMathematical statistics
October 1 st, 2018 Lecture 11: Sufficient statistic Where are we? Week 1 Week 2 Week 4 Week 7 Week 10 Week 14 Probability reviews Chapter 6: Statistics and Sampling Distributions Chapter 7: Point Estimation
More informationProbabilistic Graphical Models. Theory of Variational Inference: Inner and Outer Approximation. Lecture 15, March 4, 2013
School of Computer Science Probabilistic Graphical Models Theory of Variational Inference: Inner and Outer Approximation Junming Yin Lecture 15, March 4, 2013 Reading: W & J Book Chapters 1 Roadmap Two
More informationStatistical Theory MT 2007 Problems 4: Solution sketches
Statistical Theory MT 007 Problems 4: Solution sketches 1. Consider a 1-parameter exponential family model with density f(x θ) = f(x)g(θ)exp{cφ(θ)h(x)}, x X. Suppose that the prior distribution has the
More informationA BAYESIAN MATHEMATICAL STATISTICS PRIMER. José M. Bernardo Universitat de València, Spain
A BAYESIAN MATHEMATICAL STATISTICS PRIMER José M. Bernardo Universitat de València, Spain jose.m.bernardo@uv.es Bayesian Statistics is typically taught, if at all, after a prior exposure to frequentist
More informationStat 260/CS Learning in Sequential Decision Problems.
Stat 260/CS 294-102. Learning in Sequential Decision Problems. Peter Bartlett 1. Multi-armed bandit algorithms. Exponential families. Cumulant generating function. KL-divergence. KL-UCB for an exponential
More informationCS 361: Probability & Statistics
March 14, 2018 CS 361: Probability & Statistics Inference The prior From Bayes rule, we know that we can express our function of interest as Likelihood Prior Posterior The right hand side contains the
More informationLet us first identify some classes of hypotheses. simple versus simple. H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided
Let us first identify some classes of hypotheses. simple versus simple H 0 : θ = θ 0 versus H 1 : θ = θ 1. (1) one-sided H 0 : θ θ 0 versus H 1 : θ > θ 0. (2) two-sided; null on extremes H 0 : θ θ 1 or
More informationCS 591, Lecture 2 Data Analytics: Theory and Applications Boston University
CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.
More informationThe Expectation-Maximization Algorithm
1/29 EM & Latent Variable Models Gaussian Mixture Models EM Theory The Expectation-Maximization Algorithm Mihaela van der Schaar Department of Engineering Science University of Oxford MLE for Latent Variable
More informationBrief Review on Estimation Theory
Brief Review on Estimation Theory K. Abed-Meraim ENST PARIS, Signal and Image Processing Dept. abed@tsi.enst.fr This presentation is essentially based on the course BASTA by E. Moulines Brief review on
More informationEstimation and Maintenance of Measurement Rates for Multiple Extended Target Tracking
FUSION 2012, Singapore 118) Estimation and Maintenance of Measurement Rates for Multiple Extended Target Tracking Karl Granström*, Umut Orguner** *Division of Automatic Control Department of Electrical
More informationBeta statistics. Keywords. Bayes theorem. Bayes rule
Keywords Beta statistics Tommy Norberg tommy@chalmers.se Mathematical Sciences Chalmers University of Technology Gothenburg, SWEDEN Bayes s formula Prior density Likelihood Posterior density Conjugate
More informationIntroduction to Bayesian Statistics
Bayesian Parameter Estimation Introduction to Bayesian Statistics Harvey Thornburg Center for Computer Research in Music and Acoustics (CCRMA) Department of Music, Stanford University Stanford, California
More informationMidterm Examination. STA 215: Statistical Inference. Due Wednesday, 2006 Mar 8, 1:15 pm
Midterm Examination STA 215: Statistical Inference Due Wednesday, 2006 Mar 8, 1:15 pm This is an open-book take-home examination. You may work on it during any consecutive 24-hour period you like; please
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More information14 : Theory of Variational Inference: Inner and Outer Approximation
10-708: Probabilistic Graphical Models 10-708, Spring 2017 14 : Theory of Variational Inference: Inner and Outer Approximation Lecturer: Eric P. Xing Scribes: Maria Ryskina, Yen-Chia Hsu 1 Introduction
More informationIterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk
Iterative Markov Chain Monte Carlo Computation of Reference Priors and Minimax Risk John Lafferty School of Computer Science Carnegie Mellon University Pittsburgh, PA 15213 lafferty@cs.cmu.edu Abstract
More informationBayesian Inference. Chapter 4: Regression and Hierarchical Models
Bayesian Inference Chapter 4: Regression and Hierarchical Models Conchi Ausín and Mike Wiper Department of Statistics Universidad Carlos III de Madrid Master in Business Administration and Quantitative
More informationExperimental Design to Maximize Information
Experimental Design to Maximize Information P. Sebastiani and H.P. Wynn Department of Mathematics and Statistics University of Massachusetts at Amherst, 01003 MA Department of Statistics, University of
More informationarxiv: v3 [stat.me] 11 Feb 2018
arxiv:1708.02742v3 [stat.me] 11 Feb 2018 Minimum message length inference of the Poisson and geometric models using heavy-tailed prior distributions Chi Kuen Wong, Enes Makalic, Daniel F. Schmidt February
More informationarxiv: v1 [cs.lg] 1 May 2010
A Geometric View of Conjugate Priors Arvind Agarwal and Hal Daumé III School of Computing, University of Utah, Salt Lake City, Utah, 84112 USA {arvind,hal}@cs.utah.edu arxiv:1005.0047v1 [cs.lg] 1 May 2010
More informationTesting Statistical Hypotheses
E.L. Lehmann Joseph P. Romano, 02LEu1 ttd ~Lt~S Testing Statistical Hypotheses Third Edition With 6 Illustrations ~Springer 2 The Probability Background 28 2.1 Probability and Measure 28 2.2 Integration.........
More informationGraduate Econometrics I: Maximum Likelihood I
Graduate Econometrics I: Maximum Likelihood I Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Maximum Likelihood
More informationReview and continuation from last week Properties of MLEs
Review and continuation from last week Properties of MLEs As we have mentioned, MLEs have a nice intuitive property, and as we have seen, they have a certain equivariance property. We will see later that
More informationECE531 Lecture 10b: Maximum Likelihood Estimation
ECE531 Lecture 10b: Maximum Likelihood Estimation D. Richard Brown III Worcester Polytechnic Institute 05-Apr-2011 Worcester Polytechnic Institute D. Richard Brown III 05-Apr-2011 1 / 23 Introduction So
More informationLecture 23 Maximum Likelihood Estimation and Bayesian Inference
Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Thais Paiva STA 111 - Summer 2013 Term II August 7, 2013 1 / 31 Thais Paiva STA 111 - Summer 2013 Term II Lecture 23, 08/07/2013 Lecture
More informationORIGINS OF STOCHASTIC PROGRAMMING
ORIGINS OF STOCHASTIC PROGRAMMING Early 1950 s: in applications of Linear Programming unknown values of coefficients: demands, technological coefficients, yields, etc. QUOTATION Dantzig, Interfaces 20,1990
More informationThe Information Bottleneck Revisited or How to Choose a Good Distortion Measure
The Information Bottleneck Revisited or How to Choose a Good Distortion Measure Peter Harremoës Centrum voor Wiskunde en Informatica PO 94079, 1090 GB Amsterdam The Nederlands PHarremoes@cwinl Naftali
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationExpectation Maximization
Expectation Maximization Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr 1 /
More informationA View on Extension of Utility-Based on Links with Information Measures
Communications of the Korean Statistical Society 2009, Vol. 16, No. 5, 813 820 A View on Extension of Utility-Based on Links with Information Measures A.R. Hoseinzadeh a, G.R. Mohtashami Borzadaran 1,b,
More informationECE 275A Homework 7 Solutions
ECE 275A Homework 7 Solutions Solutions 1. For the same specification as in Homework Problem 6.11 we want to determine an estimator for θ using the Method of Moments (MOM). In general, the MOM estimator
More information