Parametric Inference, Maximum Likelihood Inference, Exponential Families, Expectation Maximization (EM), Bayesian Inference, Statistical Decision Theory
1 Statistical Inference: Parametric Inference, Maximum Likelihood Inference, Exponential Families, Expectation Maximization (EM), Bayesian Inference, Statistical Decision Theory. José Bioucas Dias, IST
2 Statistical Inference. Statistics aims at retrieving the causes (e.g., the parameters of a pdf) from the observations (effects): probability reasons from causes to effects, statistics from effects back to causes. Statistical inference problems can thus be seen as inverse problems. As a result of this perspective, in the eighteenth century (at the time of Bayes and Laplace) statistics was often called Inverse Probability.
3 Parametric Inference. Consider the parametric model F = {p(x; f) : f in Θ}, where Θ is the parameter space. The problem of inference reduces to the estimation of the parameter f from the observation x; i.e., computing an estimate f̂ = f̂(x). Parameters of interest and nuisance parameters: let f = (f1, f2). Sometimes we are only interested in some function T(f) that depends only on f1, the parameter of interest; f2 is then a nuisance parameter. Example: f = (μ, σ²), with μ the parameter of interest and σ² a nuisance parameter.
4 Parametric Inference (theoretical limits). The Cramér-Rao Lower Bound (CRLB): under appropriate regularity conditions, the covariance matrix of any unbiased estimator f̂ satisfies cov(f̂) ≥ I(f)^{-1} (the difference is positive semidefinite), where I(f) is the Fisher information matrix given by I(f) = E[ (∂ log p(x; f)/∂f) (∂ log p(x; f)/∂f)^T ]. An unbiased estimator that attains the CRLB may be found iff ∂ log p(x; f)/∂f = I(f) (h(x) − f) for some function h. The estimator is then f̂ = h(x), with covariance I(f)^{-1}.
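As a minimal numerical sketch (not from the original slides), the following Python/NumPy snippet checks that the sample mean of IID N(f, σ²) data attains the CRLB σ²/n; the parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
sigma, f_true, n, trials = 2.0, 1.5, 50, 20000
# CRLB for the mean of N(f, sigma^2) from n IID samples: I(f) = n/sigma^2
crlb = sigma**2 / n
# Monte Carlo variance of the efficient estimator h(x) = sample mean
est = rng.normal(f_true, sigma, size=(trials, n)).mean(axis=1)
print("CRLB            :", crlb)
print("var(sample mean):", est.var())  # matches the bound up to Monte Carlo error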
5 CRLB for the general Gaussian case. For x ~ N(μ(f), C(f)), the Fisher information matrix has entries [I(f)]_{ij} = (∂μ/∂f_i)^T C^{-1} (∂μ/∂f_j) + (1/2) tr( C^{-1} (∂C/∂f_i) C^{-1} (∂C/∂f_j) ). Example: parameter of a signal in white noise, x[n] = s[n; f] + w[n] with w[n] ~ N(0, σ²); then I(f) = (1/σ²) Σ_n (∂s[n; f]/∂f)². Example: known signal in unknown white noise, i.e., estimation of σ² with s[n] known; the CRLB gives var(σ̂²) ≥ 2σ⁴/N.
6 Maximum Likelihood Method. Given an observation x, L(f) = p(x; f) is the likelihood function and the ML estimate is f̂_ML = arg max_f L(f). Since the logarithm is increasing, if L(f) > 0 for all f we can use the log-likelihood l(f) = log L(f). Example (Bernoulli): for x_1, ..., x_n IID Bernoulli(f), l(f) = s log f + (n − s) log(1 − f) with s = Σ_i x_i; setting the derivative to zero gives f̂_ML = s/n.
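A short sketch (my addition, with assumed sample size and success probability) confirming the Bernoulli closed form against a brute-force maximization of the log-likelihood:

import numpy as np

rng = np.random.default_rng(1)
x = rng.binomial(1, 0.3, size=200)            # IID Bernoulli(0.3) sample
s, n = x.sum(), x.size
f = np.linspace(1e-4, 1 - 1e-4, 10001)
loglik = s * np.log(f) + (n - s) * np.log(1 - f)
print("closed form:", s / n)
print("grid argmax:", f[np.argmax(loglik)])   # agrees with s/n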
7 Maximum Likelihood. Example (Uniform): for x_1, ..., x_n IID Uniform(0, f), the likelihood is L(f) = f^{-n} for f ≥ max_i x_i and 0 otherwise. L is decreasing on [max_i x_i, ∞), so f̂_ML = max_i x_i; the maximum is attained at a boundary, not at a stationary point.
8 Maximum Likelihood. Example (Gaussian): for x_1, ..., x_n IID N(μ, σ²), maximizing the log-likelihood gives the sample mean μ̂ = (1/n) Σ_i x_i and the sample variance σ̂² = (1/n) Σ_i (x_i − μ̂)².
9 Maximum Likelihood. Example (multivariate Gaussian): for x_1, ..., x_n IID N(μ, C) in R^d, the MLE is the sample mean μ̂ = (1/n) Σ_i x_i and the sample covariance Ĉ = (1/n) Σ_i (x_i − μ̂)(x_i − μ̂)^T.
10 Maximum Likelihood (linear observation model). Example: linear observation in Gaussian noise, g = Af + w, with w ~ N(0, σ² I) and A full rank. Maximizing the likelihood is equivalent to minimizing ||g − Af||², which gives f̂_ML = (A^T A)^{-1} A^T g.
11 Example: linear observation in Gaussian noise (cont.). The MLE is equivalent to the LSE using the Euclidean (l2) norm. If A is full rank, f̂_ML = A⁺g, where A⁺ = (A^T A)^{-1} A^T is the Moore-Penrose pseudo-inverse; AA⁺ is the projection matrix onto the range of A, and both can be computed from the SVD of A. If the noise is zero-mean white but not Gaussian, the Best Linear Unbiased Estimator (BLUE) is still given by f̂ = A⁺g.
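A minimal sketch (my addition, with an assumed random full-rank A and noise level) showing that the pseudo-inverse solution and a standard least-squares solve coincide:

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(100, 3))                 # full-rank observation matrix
f_true = np.array([1.0, -2.0, 0.5])
g = A @ f_true + 0.1 * rng.normal(size=100)   # g = A f + w
f_pinv = np.linalg.pinv(A) @ g                # Moore-Penrose pseudo-inverse (via SVD)
f_ls, *_ = np.linalg.lstsq(A, g, rcond=None)  # equivalent least-squares solve
print(f_pinv)
print(f_ls)                                   # both coincide with the MLE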
12 Maximum Likelihood: linear observation in Gaussian noise. MLE properties (the MLE is optimal for the linear model): it is the Minimum Variance Unbiased (MVU) estimator [E[f̂] = f, and its covariance is the minimum among all unbiased estimators]; it is efficient (it attains the Cramér-Rao Lower Bound (CRLB)); its PDF is f̂ ~ N(f, σ² (A^T A)^{-1}).
13 Maximum Likelihood (characterization). Appealing properties of the MLE: let x_1, x_2, ... be a sequence of IID vectors and f̂_n the MLE computed from the first n of them. 1. The MLE is consistent: f̂_n converges in probability to the true parameter f*. 2. The MLE is equivariant: if f̂ is the MLE of f, then T(f̂) is the MLE of T(f). 3. The MLE (under appropriate regularity conditions) is asymptotically normal and optimal, or efficient: sqrt(n) (f̂_n − f*) converges in distribution to N(0, I₁(f*)^{-1}), where I₁ is the Fisher information matrix of a single observation.
14 The Exponential Family. Definition: the set F = {p(x; f)} is an exponential family of dimension k if there are functions c(f), h(x), w_i(f), and t_i(x) such that p(x; f) = c(f) h(x) exp( Σ_{i=1..k} w_i(f) t_i(x) ); T(x) = (t_1(x), ..., t_k(x)) is a sufficient statistic for f. Theorem (Neyman-Fisher factorization): T(x) is a sufficient statistic for f iff p(x; f) can be factored as p(x; f) = q(T(x), f) h(x).
15 The Exponential Family: natural (or canonical) form. Given an exponential family, it is always possible to introduce the change of variables η_i = w_i(f) and the reparameterization p(x; η) = h(x) exp( η^T T(x) − A(η) ). Since p(x; η) is a PDF, it must integrate to one, which forces A(η) = log ∫ h(x) exp( η^T T(x) ) dx; A is the log-partition function.
16 The Exponential Family (the partition function). Moments of T(x) follow from the derivatives of the log-partition function: after some calculus, E[T(x)] = ∇A(η) and cov(T(x)) = ∇²A(η).
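A numerical sketch (my addition) of these identities for the Bernoulli family in canonical form, where T(x) = x, A(η) = log(1 + e^η), and the success probability is the sigmoid of η:

import numpy as np

eta = 0.7
p = 1.0 / (1.0 + np.exp(-eta))       # mean parameter of Bernoulli(eta)
d = 1e-5
A = lambda e: np.log1p(np.exp(e))    # log-partition function
dA = (A(eta + d) - A(eta - d)) / (2 * d)              # numerical A'(eta)
d2A = (A(eta + d) - 2 * A(eta) + A(eta - d)) / d**2   # numerical A''(eta)
print(dA, p)              # A'(eta)  = E[T(x)]  = p
print(d2A, p * (1 - p))   # A''(eta) = var(T(x)) = p(1-p)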
17 The Exponential Family (IID sequences). Let p(x; η) = h(x) exp( η^T T(x) − A(η) ) be a member of an exponential family. The density of the IID sequence x = (x_1, ..., x_n) is p(x; η) = [Π_i h(x_i)] exp( η^T Σ_i T(x_i) − n A(η) ), which belongs to the exponential family defined by the sufficient statistic Σ_i T(x_i) and log-partition function n A(η).
18 Examples of exponential families. Many of the most common probabilistic models belong to exponential families; e.g., Gaussian, Poisson, Bernoulli, binomial, exponential, gamma, and beta. Example: any of these can be rewritten in canonical form via the change of variables of the previous slide (the Gaussian case is worked out next).
19 Examples of exponential families (Gaussian). Example: N(μ, σ²) has density p(x; μ, σ²) = (2πσ²)^{-1/2} exp( −(x − μ)²/(2σ²) ). Canonical form: η = (μ/σ², −1/(2σ²)), T(x) = (x, x²), h(x) = 1, and A(η) = μ²/(2σ²) + (1/2) log(2πσ²) = −η₁²/(4η₂) − (1/2) log(−2η₂) + (1/2) log(2π).
20 Computing maximum likelihood estimates. Very often the MLE cannot be found analytically. Commonly used numerical methods: 1. Newton-Raphson; 2. Scoring; 3. Expectation Maximization (EM). Newton-Raphson method: f^{k+1} = f^k − [∇² l(f^k)]^{-1} ∇l(f^k), where l is the log-likelihood. Scoring method: replace the Hessian by its expected value −I(f^k), i.e., f^{k+1} = f^k + I(f^k)^{-1} ∇l(f^k); the Fisher information matrix can be computed off-line.
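As an illustration (my choice of example, not from the slides), a Fisher scoring iteration for the location of a Cauchy density, whose MLE has no closed form; the per-sample score is 2(x−f)/(1+(x−f)²) and the per-sample Fisher information is the constant 1/2, so the step size is known off-line:

import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_cauchy(500) + 4.0   # Cauchy samples with true location 4.0
f = np.median(x)                     # robust starting point
for _ in range(20):
    r = x - f
    score = np.sum(2 * r / (1 + r**2))   # gradient of the log-likelihood
    f = f + score / (0.5 * x.size)       # scoring step with I(f) = n/2
print(f)                                 # close to 4.0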
21 Computing maximum likelihood estimates (EM). Expectation Maximization (EM) [Dempster, Laird, and Rubin, 1977]. Suppose that the log-likelihood log p(g; f) is hard to maximize, but we can find a vector z such that the complete-data log-likelihood log p(g, z; f) is easy to maximize. Idea: iterate between two steps. E-step: fill in z in log p(g, z; f) by taking its expectation given g and the current estimate. M-step: maximize the resulting function over f. Terminology: g is the observed data, z the missing data, and (g, z) the complete data.
22 Expectation Maximization: the EM algorithm. 1. Pick a starting vector f^0 and repeat steps 2 and 3 until convergence. 2. E-step: calculate Q(f; f^k) = E[ log p(g, z; f) | g, f^k ]. 3. M-step: f^{k+1} = arg max_f Q(f; f^k). Alternatively (GEM), it suffices to pick any f^{k+1} such that Q(f^{k+1}; f^k) ≥ Q(f^k; f^k).
23 Expectation Maximization. The EM (GEM) algorithm always increases the likelihood: L(f^{k+1}) ≥ L(f^k). Define the Kullback-Leibler distance D(p ‖ q) = ∫ p(z) log[ p(z)/q(z) ] dz ≥ 0, with equality iff p = q; the monotonicity proof on the next slide rests on this inequality.
24 Expectation Maximization (why does it work?)
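The equations of this slide were lost in transcription; the standard monotonicity argument, written here in LaTeX and consistent with the surrounding notation, is a reconstruction rather than a verbatim copy of the slide:

\log p(g; f) = Q(f; f^k) + H(f; f^k), \qquad
H(f; f^k) = -\,\mathbb{E}\bigl[\log p(z \mid g, f) \,\big|\, g, f^k\bigr]

H(f; f^k) - H(f^k; f^k) = D_{\mathrm{KL}}\bigl(p(z \mid g, f^k)\,\|\,p(z \mid g, f)\bigr) \ge 0

\log p(g; f^{k+1}) - \log p(g; f^k) \ge Q(f^{k+1}; f^k) - Q(f^k; f^k) \ge 0

The first inequality is the Kullback-Leibler inequality of the previous slide; the second holds by the M-step (or the weaker GEM condition), so the likelihood never decreases.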
25 EM: mixtures of densities. Let z in {1, ..., p} be the random variable that selects the active mode: p(x | z = j) = p_j(x; θ_j), so that p(x) = Σ_{j=1..p} α_j p_j(x; θ_j), where α_j = P(z = j) ≥ 0 and Σ_j α_j = 1.
26 EM: mixtures of densities. Consider now that x_1, ..., x_n is a sequence of IID random variables. Let z_1, ..., z_n be IID random variables, where z_i selects the active mode in the sample x_i: p(x_i | z_i = j) = p_j(x_i; θ_j).
27-29 EM: mixtures of densities. With the complete data (x, z), the expected complete-data log-likelihood takes the equivalent form Q(f; f^k) = Σ_i Σ_j w_ij^k [ log α_j + log p_j(x_i; θ_j) ], where f collects (α, θ). E-step: compute the posterior mode probabilities w_ij^k = P(z_i = j | x_i, f^k) = α_j^k p_j(x_i; θ_j^k) / Σ_l α_l^k p_l(x_i; θ_l^k). M-step: maximize Q over f; for the mixing weights the result is the sample mean of the posterior probabilities, α_j^{k+1} = (1/n) Σ_i w_ij^k, while the update of θ_j depends on the particular mode densities.
30 EM: mixtures of Gaussian densities (MOGs). E-step: w_ij^k = α_j^k N(x_i; μ_j^k, C_j^k) / Σ_l α_l^k N(x_i; μ_l^k, C_l^k). M-step: α_j^{k+1} = (1/n) Σ_i w_ij^k; μ_j^{k+1} = Σ_i w_ij^k x_i / Σ_i w_ij^k (weighted sample mean); C_j^{k+1} = Σ_i w_ij^k (x_i − μ_j^{k+1})(x_i − μ_j^{k+1})^T / Σ_i w_ij^k (weighted sample covariance).
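A compact 1D sketch of these updates (my addition; the synthetic data, two components, and initialization are assumptions, not the slide's example):

import numpy as np

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
p = 2
alpha = np.full(p, 1 / p)            # mixing weights
mu = np.percentile(x, [25, 75])      # deterministic initial means
var = np.full(p, x.var())            # initial variances

def normal_pdf(x, mu, var):
    return np.exp(-0.5 * (x - mu)**2 / var) / np.sqrt(2 * np.pi * var)

for _ in range(100):
    # E-step: posterior mode probabilities w[i, j]
    w = alpha * normal_pdf(x[:, None], mu, var)
    w /= w.sum(axis=1, keepdims=True)
    # M-step: weighted sample statistics
    nj = w.sum(axis=0)
    alpha = nj / x.size
    mu = (w * x[:, None]).sum(axis=0) / nj
    var = (w * (x[:, None] - mu)**2).sum(axis=0) / nj

print(alpha, mu, var)  # close to (0.3, 0.7), (-2, 3), (0.25, 1.0)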
31 EM: mixtures of Gaussian densities, 1D example. (Figure: evolution of the log-likelihood L(f^k) over the EM iterations for a mixture with p = 3 modes.)
32 EM: mixtures of Gaussian densities (MOGs), 1D example with p = 3 modes. (Figure: histogram of the data with the estimated vs. true MOG densities and the estimated vs. true modes.)
33 EM: mixtures of Gaussian densities, 2D example. MOG with determination of the number of modes [M. Figueiredo, 2002]. (Figure: evolution of the fitted 2D mixture as the number of components k is adjusted.)
34 Bayesian Estimation
35 The Bayesian Philosophy ([Wasserman, 2004]). Bayesian inference: B1 Probabilities describe degrees of belief, not limiting relative frequency. B2 We can make probability statements about parameters, even though they are fixed constants. B3 We make inferences about a parameter by producing a probability distribution for it. Frequentist or classical inference: F1 Probabilities refer to limiting relative frequencies and are objective properties of the real world. F2 Parameters are fixed unknown constants. F3 The criteria for obtaining statistical procedures are based on long-run frequency properties.
36 The Bayesian Philosophy. (Diagram: the unknown f passes through the observation model to produce the observation g; Bayesian inference combines the observation model with prior knowledge about f, whereas classical inference uses the observation model alone.) The prior describes degrees of belief (subjective), not limiting frequency.
37 The Bayesian method. 1. Choose a prior density p(f), called the prior (or a priori) distribution, that expresses our beliefs about f before we see any data. 2. Choose the observation model p(g | f) that reflects our beliefs about g given f. 3. Calculate the posterior (or a posteriori) distribution using the Bayes law: p(f | g) = p(g | f) p(f) / p(g), where p(g) = ∫ p(g | f) p(f) df is the marginal on g (other names: evidence, unconditional, predictive). 4. Any inference should be based on the posterior.
38 The Bayesian method. Example: let x_1, ..., x_n be IID Bernoulli(f) and take the prior f ~ Beta(α, β). (Figure: symmetric Beta(α, α) densities for α = 0.5, 1, 2, 10; for α > 1 the prior pulls the estimate towards 1/2, the more strongly the larger α.)
39 Example (cont.): Bernoulli observations, Beta prior. Observation model: p(x | f) = f^s (1 − f)^{n−s}, with s = Σ_i x_i. Prior: p(f) ∝ f^{α−1} (1 − f)^{β−1}. Posterior: p(f | x) ∝ f^{s+α−1} (1 − f)^{n−s+β−1}. Thus, f | x ~ Beta(α + s, β + n − s).
40 Example (cont.): Bernoulli observations, Beta prior. Maximum a posteriori (MAP) estimate: f̂_MAP = (s + α − 1) / (n + α + β − 2). Total ignorance, i.e., a flat prior with α = β = 1, gives f̂_MAP = s/n, the ML estimate. Note that for large values of n the MAP estimate approaches the ML estimate regardless of the prior. The von Mises theorem: if the prior is continuous and not zero at the location of the ML estimate, then the posterior concentrates around the ML estimate as n grows.
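A minimal sketch of the conjugate update (my addition; the prior hyperparameters and data-generating probability are assumed for illustration):

import numpy as np

rng = np.random.default_rng(5)
alpha, beta = 2.0, 2.0                      # Beta(2, 2) prior
x = rng.binomial(1, 0.7, size=100)          # Bernoulli(0.7) observations
s, n = x.sum(), x.size
a_post, b_post = alpha + s, beta + n - s    # posterior is Beta(alpha+s, beta+n-s)
print("MLE:", s / n)
print("MAP:", (a_post - 1) / (a_post + b_post - 2))
print("PM :", a_post / (a_post + b_post))   # posterior mean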
41 Conjugate priors. In the previous example, the prior and the posterior are both Beta distributed. We say that the prior is conjugate with respect to the model. Formally, let P = {p(f; γ)} and M = {p(g | f)} be two parametrized families of priors and observation models, respectively: P is a conjugate family for M if the posterior stays in P, i.e., p(f | g) = p(f; γ') for some γ'. Very often, prior information about f is weak, leaving freedom to select conjugate priors. Why conjugate priors? Computing the posterior density simply consists in updating the parameters of the prior.
42 Conjugate priors (Gaussian observation, Gaussian prior). Gaussian observation: g | f ~ N(f, σ²). Gaussian prior: f ~ N(μ₀, σ₀²). The posterior distribution is Gaussian, f | g ~ N(μ₁, σ₁²), and: 1. the mean of f | g lies in the simplex defined by {g, μ₀}: μ₁ = λg + (1 − λ)μ₀ with λ = σ₀² / (σ₀² + σ²); 2. the variance of f | g is the parallel of the variances σ² and σ₀²: σ₁² = (1/σ² + 1/σ₀²)^{-1}.
43 Conjugate priors (Gaussian IID observations, Gaussian prior). Gaussian IID observations: g_i | f ~ N(f, σ²), i = 1, ..., n, with sample mean ḡ. Gaussian prior: f ~ N(μ₀, σ₀²). The posterior distribution is Gaussian, and: 1. the mean of f | g lies in the simplex defined by {ḡ, μ₀}: μ₁ = λḡ + (1 − λ)μ₀ with λ = σ₀² / (σ₀² + σ²/n); 2. the variance of f | g is the parallel of the variances σ²/n and σ₀²: σ₁² = (n/σ² + 1/σ₀²)^{-1}.
44 Conjugate Priors (Gaussian IID observations, Gaussian prior). (Figure: illustration of the posterior update of the previous slide.)
45 Conjugate Priors (multivariate Gaussian: observation and prior). Let (g, f) be jointly Gaussian distributed. Then: a) the posterior p(f | g) is Gaussian; b) its mean is E[f | g] = μ_f + C_fg C_g^{-1} (g − μ_g); c) its covariance is C_{f|g} = C_f − C_fg C_g^{-1} C_gf, which does not depend on g.
46 Conjugate Priors (multivariate Gaussian: observation and prior). Linear observation model: g = Af + w, with f ~ N(μ_f, C_f) and w ~ N(0, C_w) (f and w independent). Posterior: f | g ~ N(μ_{f|g}, C_{f|g}) with C_{f|g} = (A^T C_w^{-1} A + C_f^{-1})^{-1} and μ_{f|g} = C_{f|g} (A^T C_w^{-1} g + C_f^{-1} μ_f).
47 Conjugate Priors (multivariate Gaussian: observation and prior). Linear observation model (f and w independent): using the matrix inversion lemma, the posterior mean can also be written E[f | g] = μ_f + C_f A^T (A C_f A^T + C_w)^{-1} (g − Aμ_f). E[f | g] is the solution of the following regularized LS problem: f̂ = arg min_f ||g − Af||²_{C_w^{-1}} + ||f − μ_f||²_{C_f^{-1}}, where ||x||²_M = x^T M x; choosing C_f^{-1} appropriately allows us, e.g., to penalize oscillatory solutions.
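A numerical sketch (my addition; dimensions, covariances, and a zero prior mean are assumed for brevity) verifying that the information form and the matrix-inversion-lemma form of the posterior mean coincide:

import numpy as np

rng = np.random.default_rng(6)
m, n = 20, 5
A = rng.normal(size=(m, n))
Cf = np.eye(n)                       # prior covariance (prior mean taken as zero)
Cw = 0.1 * np.eye(m)                 # noise covariance
f = rng.normal(size=n)
g = A @ f + rng.multivariate_normal(np.zeros(m), Cw)

# Information form: (A' Cw^-1 A + Cf^-1)^-1 A' Cw^-1 g
P = A.T @ np.linalg.inv(Cw) @ A + np.linalg.inv(Cf)
mean1 = np.linalg.solve(P, A.T @ np.linalg.inv(Cw) @ g)
# Matrix-inversion-lemma form: Cf A' (A Cf A' + Cw)^-1 g
mean2 = Cf @ A.T @ np.linalg.solve(A @ Cf @ A.T + Cw, g)
print(np.allclose(mean1, mean2))     # True: both expressions agree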
48 Improper Priors. Assume that p(f) = k on a given domain. Even if the domain of f is unbounded, so that p(f) is not a proper density, the posterior p(f | g) ∝ p(g | f) may still be well defined. In a sense, improper priors account for a state of total ignorance. This raises no issues in the Bayesian framework, as long as the posterior is proper.
49 Bayes Estimators
50 Bayes estimators. Ingredients of statistical decision theory: the posterior distribution p(f | g), which conveys all knowledge about f given the observation g; a loss function L(f, f̂), which measures the discrepancy between f and f̂; the a posteriori expected loss E[ L(f, f̂) | g ]; and the optimal Bayes estimator f̂(g) = arg min over f̂ of E[ L(f, f̂) | g ].
51 Bayesian framework: nuisance parameters. Let f = (φ, ψ), where ψ is a nuisance parameter, and let the loss depend only on φ. The posterior risk then depends only on the marginal on φ, p(φ | g) = ∫ p(φ, ψ | g) dψ. In a pure Bayesian framework, nuisance parameters are integrated out.
52 Bayes estimators: maximum a posteriori probability (MAP). Zero-one (0/1) loss: L(f, f̂) = 0 if ||f − f̂|| ≤ ε and 1 otherwise. As ε goes to 0, the expected loss is 1 minus the posterior probability of an ε-ball around f̂, which is approximately the volume of the ε-ball times the posterior density at f̂; the optimal estimator is therefore the maximum a posteriori probability estimate f̂_MAP = arg max_f p(f | g). A discrete domain leads to the MAP estimator as well.
53 Bayes Estimators: posterior mean (PM). Quadratic loss: L(f, f̂) = (f − f̂)^T Q (f − f̂), with Q symmetric and positive definite. Expanding the a posteriori expected loss, only the term that depends on f̂ matters, and the minimizer is the posterior mean f̂_PM = E[f | g], valid for any such Q. If Q is diagonal, the loss function is additive. The posterior mean may be hard to compute.
54 Bayes estimators: additive loss. Let L(f, f̂) = Σ_i L_i(f_i, f̂_i). Then the minimization is decoupled: each component f̂_i minimizes the corresponding marginal a posteriori expected loss E[ L_i(f_i, f̂_i) | g ], computed from the posterior marginal p(f_i | g).
55 Bayes Estimators: additive loss. Additive 0/1 loss: f̂_i is the maximizer of the posterior marginal p(f_i | g). Additive quadratic loss: the additive quadratic loss is a quadratic loss with Q = I; therefore, the corresponding Bayes estimator is the posterior mean.
56 Example (Gaussian IID observations, Gaussian prior). Gaussian IID observations with a Gaussian prior: the posterior distribution is Gaussian, so the MAP and PM estimates coincide with the posterior mean, which approaches the sample mean as n grows.
57 Example (Gaussian observation, Laplacian prior). Observation g | f ~ N(f, σ²); prior p(f) ∝ exp(−λ|f|). MAP estimate: f̂_MAP = arg min_f (g − f)²/(2σ²) + λ|f|. The log-posterior is strictly concave, so the maximizer is unique.
58 Example (Gaussian observation, Laplacian prior). MAP estimate: the solution is the soft-threshold rule f̂_MAP = sign(g) max(0, |g| − λσ²).
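A minimal sketch (my addition, with assumed values of g, σ, and λ) checking the soft-threshold rule against a brute-force minimization of the strictly convex MAP objective:

import numpy as np

g, sigma, lam = 1.3, 1.0, 0.8
# closed form: sign(g) * max(0, |g| - lam * sigma^2)
soft = np.sign(g) * max(0.0, abs(g) - lam * sigma**2)
# brute-force minimization of (g - f)^2 / (2 sigma^2) + lam * |f|
f = np.linspace(-5, 5, 200001)
obj = (g - f)**2 / (2 * sigma**2) + lam * np.abs(f)
print(soft, f[np.argmin(obj)])   # both approximately 0.5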
59 Example (Gaussian observation, Laplacian prior). PM estimate: there are no closed-form expressions for the posterior mean in this case; we resort to numerical procedures.
60-61 Example (Gaussian observation, Laplacian prior). (Figures: comparison of the MAP and PM estimates as functions of the observation g.)
62 Example (multivariate Gaussian: observation and prior). Linear observation model (f and w independent): g = Af + w. The posterior mean f̂ = Wg with W = C_f A^T (A C_f A^T + C_w)^{-1} is called the Wiener filter. If all the eigenvalues of C_f approach infinity (the prior becomes flat), then, for white noise and full-rank A, W approaches (A^T A)^{-1} A^T, which is the Moore-Penrose pseudo (or generalized) inverse of A.
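A numerical sketch of this limit (my addition; dimensions and noise level are assumed), letting the prior covariance C_f = cI grow and watching the Wiener filter approach the pseudo-inverse:

import numpy as np

rng = np.random.default_rng(7)
m, n = 30, 4
A = rng.normal(size=(m, n))
Cw = 0.5 * np.eye(m)                     # white noise covariance
for c in [1.0, 1e3, 1e6]:                # flatter and flatter prior Cf = c*I
    Cf = c * np.eye(n)
    W = Cf @ A.T @ np.linalg.inv(A @ Cf @ A.T + Cw)   # Wiener filter
    print(c, np.linalg.norm(W - np.linalg.pinv(A)))   # gap shrinks towards 0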