A732: Exercise #7 Maximum Likelihood

Due: 29 November 2007

1 Analytic computation of some one-dimensional maximum likelihood estimators

(a) Including the normalization, the exponential distribution function is

$$f(x,\theta) = \theta e^{-\theta x}. \tag{1}$$

The likelihood function for $n$ data points is then

$$L(\theta) = \theta^{n} \exp\left(-\theta \sum_{i=1}^{n} x_i\right). \tag{2}$$

The maximum likelihood estimator follows easily by determining the zero of $dL(\theta)/d\theta$ to find

$$\theta_{ML} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\mu_x}. \tag{3}$$

In words, the maximum likelihood estimator $\theta_{ML}$ is equal to the inverse of the sample mean.

(b) In this case one cannot determine $\theta$ by searching for the zero of the first derivative of the likelihood function because the density function is discontinuous,

$$f(x;\theta) = \begin{cases} 1/\theta & \text{if } x < \theta \\ 0 & \text{otherwise,} \end{cases}$$

and the resulting likelihood function,

$$L(\theta) = \theta^{-n}, \tag{4}$$

is monotonic. Note that finding the root of the first derivative of the likelihood is only a mathematical device for finding the extremum, and there is no reason that other arguments cannot be used. In particular, note that $\theta$ must be larger than or equal to $x_{\max} = \max(x_1, x_2, \ldots, x_n)$. Since $x_{\max}$ is the smallest of the possible values of $\theta$ consistent with the data, and $L(\theta) = \theta^{-n}$ decreases with $\theta$, it is the one that maximizes $L(\theta)$. We have therefore argued that the maximum likelihood estimate of $\theta$ is $\theta_{ML} = x_{\max}$.

Suppose that the true value $\theta = \Theta$ is greater than the maximum likelihood estimate ($\theta_{ML} = x_{\max}$). It is straightforward to calculate the cumulative probability $P(x < x_{\max};\Theta)$ and determine the probability that all $n$ values of $x$ are smaller than $x_{\max}$:

$$P(x < x_{\max};\Theta) = \left(\frac{x_{\max}}{\Theta}\right)^{n}. \tag{5}$$

As intuitively expected, the larger $n$, the smaller this probability for a given $\Theta > \theta_{ML}$.

One can also easily see that the maximum likelihood estimate is biased: the distribution function of $x_{\max} = \max(x_1, x_2, \ldots, x_n)$ with $x_i$ drawn from the uniform distribution discussed in this exercise is

$$f(x_{\max}) = \frac{n}{\theta^{n}}\, x_{\max}^{\,n-1}. \tag{6}$$

It is straightforward to show that $E(x_{\max}) = \frac{n}{n+1}\theta$. Figure 1 shows the histogram of $x_{\max}$ obtained from 10000 samples of 5 random numbers drawn from a uniform distribution with $\theta = 2$. As expected for $n = 5$ and $\theta = 2$, the mean value of $x_{\max}$ is in this case close to $n\theta/(n+1)$.

2 Bayes theorem and bias

(a) See next...

(b) Assume that $\{x\}$ is distributed as $f(x;\mu,\sigma^2)$, where $\mu$ and $\sigma^2$ describe the mean and variance of the known distribution $f(\cdot)$. The likelihood function is then

$$L(\mu,\sigma^2) = \prod_{k=1}^{N} f(x_k;\mu,\sigma^2)$$

09Dec07/MDW

and our ML estimates are the values $\hat\mu$, $\hat\sigma^2$ that maximize $L$ or, equivalently, $\log L$.

Figure 1: Histogram of values of $x_{\max}$ from samples of 5 random numbers drawn from a uniform distribution for $0 < x < 2$ (horizontal axis: $x_{\max}$; vertical axis: $N$).

The conditions that determine these parameters are:

$$\frac{\partial \log L}{\partial \mu} = \sum_k \left.\frac{1}{f}\frac{\partial f}{\partial \mu}\right|_{x_k} = 0, \qquad \frac{\partial \log L}{\partial \sigma^2} = \sum_k \left.\frac{1}{f}\frac{\partial f}{\partial \sigma^2}\right|_{x_k} = 0.$$

For a Gaussian, $f(x;\mu,\sigma^2) = \exp\left[-(x-\mu)^2/2\sigma^2\right]/\sqrt{2\pi\sigma^2}$, we find:

$$\hat\mu = \frac{1}{N}\sum_k x_k = 1.75, \qquad \hat\sigma^2 = \frac{1}{N}\sum_k (x_k - \hat\mu)^2.$$

In class, we showed that the sample mean and sample variance are unbiased estimators for the mean and variance. Notice that our ML estimate $\hat\mu$ is the sample mean, but $\hat\sigma^2$ is smaller than the sample variance by the factor $N/(N-1)$ and is, therefore, biased. The difference between 2.19 for the biased variance and 2.92 for the unbiased variance is relatively large for our small data set!

(c) We know that Bayes theorem tells us that the posterior probability of some parameter $\mu$ is the product of the prior distribution of the

parameter and the likelihood function, appropriately normalized:

$$P(\mu|D) = \frac{P(\mu)\,L(D|\mu)}{\int d\mu\, P(\mu)\,L(D|\mu)},$$

where $D$ is the data. If we now assume that the prior distribution of $\mu$ is normal with mean $\mu_0 = 2$ and variance $\sigma_0^2 = 2$, we have

$$P(\mu|D) \propto P(\mu;\mu_0,\sigma_0^2)\,L(D|\mu),$$

where $L(D|\mu) = \prod_{k=1}^{N} f(x_k;\mu,\sigma_e^2)$ with $\sigma_e^2 = 3$. Therefore, $P(\mu|D)$ is the product of two Gaussians which is, again, a Gaussian. After a bit of algebra, one finds that the mean and variance of $P(\mu|D)$ are:

$$\mu = \left(\frac{\mu_0}{\sigma_0^2} + \frac{\hat\mu}{\sigma_e^2/N}\right)\left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma_e^2/N}\right)^{-1} = 1.82, \qquad \sigma^2 = \left(\frac{1}{\sigma_0^2} + \frac{1}{\sigma_e^2/N}\right)^{-1}.$$

In other words, the mean of the distribution for $\mu$ is the average of the prior mean, inversely weighted by the prior variance, and the sample mean, inversely weighted by the variance of the sample mean. For large $N$, this mean will be dominated by the sample mean, and vice versa for small $N$. Similarly, the variance of the distribution for $\mu$ is half the harmonic mean of the prior variance and the variance of the sample mean. Note that the variance of the sample mean follows from the central limit theorem: the true population variance over the number of data points. Therefore, just as in the case for the mean, for large $N$ the variance will be dominated by the variance of the sample mean, and vice versa for small $N$.

3 Estimating a power law with a break

(a) We assume that $x \in [x_{\min}, x_{\max}]$ to prevent divergence as $x \to 0$ or $x \to \infty$. This is typical in physical applications. For example, if $x$ represents the mass of a galaxy, the distribution has a cutoff at some small and large galaxy mass. Further, we can assume that $b \in [x_{\min}, x_{\max}]$; otherwise we would not have a broken power law. We can integrate

$$f(x;p_1,p_2,b) = K \begin{cases} (x/b)^{p_1} & \text{if } x \le b \\ (x/b)^{p_2} & \text{if } x > b \end{cases}$$

and therefore

$$1 = \int_{x_{\min}}^{x_{\max}} dx\, f(x;p_1,p_2,b),$$

$$\frac{1}{K b} = \int_{x_{\min}/b}^{1} dy\, y^{p_1} + \int_{1}^{x_{\max}/b} dy\, y^{p_2} = \frac{1}{p_1+1}\left[1 - \left(\frac{x_{\min}}{b}\right)^{p_1+1}\right] + \frac{1}{p_2+1}\left[\left(\frac{x_{\max}}{b}\right)^{p_2+1} - 1\right]$$

determines $K$. This expression demands $p_1 > -1$ if $x_{\min} \to 0$ and $p_2 < -1$ if $x_{\max} \to \infty$, further illustrating the care that must be taken in choosing the limits on the range for a power law.

(b) Combined with next part...

(c) In this problem, $p_2 = -3/2$, so $f$ is a two-parameter family of functions: $f(x;p_1,-3/2,b)$. The likelihood is then

$$L(p_1,b) = \prod_{k=1}^{N} f(x_k;p_1,-3/2,b)$$

or

$$\log L(p_1,b) = \sum_{k=1}^{N} \log f(x_k;p_1,-3/2,b).$$

My ML solution is plotted along with a histogram of the data in Figure 2. As stated in the problem and discussed in class,

$$\log L = \log L_o - (\theta - \hat\theta)^2/2\sigma_\theta^2 \tag{7}$$

so that the one-sigma error is given by the value $\log L(\hat\theta + \sigma_\theta) = \log L_o - 1/2$. For higher dimensionality, one uses the fact that the likelihood function near its peak behaves like a multidimensional Gaussian in the parameters, whose exponent follows the $\chi^2$ distribution. From equation (7) it is then clear that the quantity $-2(\log L - \log L_o)$ is distributed like $\chi^2$, where $L_o$ is the value of the maximum likelihood. In our case, one sigma for two degrees of freedom is described by the contour in the plot of log likelihood down from the maximum value by $\chi^2/2 = 1.15$. Similarly, the "two sigma" ("three sigma") value is the contour containing 95.4% (99.7%) of the probability. For two degrees of freedom, the values of $\chi^2/2$

are 3.09 (5.91). The value of $\log L$ is shown in Figure 3 along with the three contours corresponding to these one, two, and three sigma probability values. The confidence limits read from this plot are shown in Table 1.

Figure 2: Plot of the broken power law with ML estimated values for $p_1$ and $b$ (red curve) along with the data (green histogram). Horizontal axis: values; vertical axis: bin counts.

Table 1: Parameter confidence limits from likelihood plot (peak at $p_1 = 0.27$; columns: confidence limit, $p_1$ min and max, $b$ min and max).

(d) We have derived that the covariance matrix for the likelihood follows from

$$\sigma^{-2}_{ij} = -\frac{\partial^2 \log L}{\partial\theta_i\,\partial\theta_j}.$$

Because $f(x;p_1,p_2,b)$ has a slightly messy analytic expression, I found it easier to perform numerical partial differentiation of $\log L$ rather than use the equivalent expression:

$$\sigma^{-2}_{ij} = -E\left[\frac{\partial^2 \log L}{\partial\theta_i\,\partial\theta_j}\right].$$

Figure 3: Plot of $\log L$ as a function of power-law exponent $p_1$ and break point $b$ with $\log L_0 = 0$. Top panel: the three curves show the theoretical 68.3%, 95.4% and 99.7% isovalues. Lower panel: blow-up of the density inside of the 68.3% contour.
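The isovalues used for the contours in Figure 3 can be cross-checked with a few lines of Python. This is only a sketch, not part of the original solution: it relies on the closed form of the $\chi^2$ distribution for two degrees of freedom, $P(\chi^2 \le c) = 1 - e^{-c/2}$, and the function name `delta_loglike_2dof` is a hypothetical helper introduced here for illustration.

```python
import math


def delta_loglike_2dof(prob):
    """Drop in log L below its maximum for the contour enclosing `prob`.

    For two degrees of freedom the chi-square CDF is 1 - exp(-c/2),
    so an enclosed probability `prob` corresponds to c/2 = -ln(1 - prob).
    """
    if not 0.0 <= prob < 1.0:
        raise ValueError("prob must lie in [0, 1)")
    return -math.log(1.0 - prob)


# Gaussian-equivalent coverage of 1, 2 and 3 sigma: erf(k / sqrt(2)).
levels = []
for k in (1, 2, 3):
    prob = math.erf(k / math.sqrt(2.0))
    levels.append((k, prob, delta_loglike_2dof(prob)))

for k, prob, drop in levels:
    print(f"{k} sigma: {100 * prob:.1f}% enclosed, log L drop = {drop:.2f}")
```

Under these assumptions the three drops come out close to 1.15, 3.09 and 5.91, the values of $\chi^2/2$ for the 68.3%, 95.4% and 99.7% contours.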

I recursively used the two-point difference formula. For the diagonal and off-diagonal terms one finds:

$$\frac{\partial^2 \log L}{\partial p_1^2} \approx \frac{1}{(\Delta p_1)^2}\left[\log L(p_1+\Delta p_1, b) - 2\log L(p_1,b) + \log L(p_1-\Delta p_1, b)\right]$$

$$\frac{\partial^2 \log L}{\partial b^2} \approx \frac{1}{(\Delta b)^2}\left[\log L(p_1, b+\Delta b) - 2\log L(p_1,b) + \log L(p_1, b-\Delta b)\right]$$

$$\frac{\partial^2 \log L}{\partial p_1\,\partial b} \approx \frac{1}{\Delta p_1\,\Delta b}\left[\log L(p_1+\Delta p_1/2, b+\Delta b/2) - \log L(p_1-\Delta p_1/2, b+\Delta b/2) - \log L(p_1+\Delta p_1/2, b-\Delta b/2) + \log L(p_1-\Delta p_1/2, b-\Delta b/2)\right]$$

I chose $\Delta p_1 = 0.01\,\hat p_1$ and $\Delta b = 0.01\,\hat b$. The eigenvectors of this $2\times 2$ matrix of second derivatives describe the principal components (directions of uncorrelated error), and the inverse of each eigenvalue is the variance in that direction. I find $\sigma_1$ and $\sigma_2$ with corresponding directions:

$$\hat e_1 = (0.08, 1.0), \qquad \hat e_2 = (1.0, 0.08).$$

In other words, the principal axes are nearly along the $p_1$ and $b$ directions with a small tilt. This is consistent with our graphical solution depicted in Figure 3. Similarly, the variance estimate is consistent with the overall scale on which the levels vary but, of course, does not predict the shape of the contours. In particular, it is very important to note that the confidence region is unbounded toward small values of $p_1$; in other words, we cannot rule out a value of nearly zero for $p_1$. Similarly, the high-confidence boundaries are not elliptical. It is nearly always more revealing to study the explicit likelihood distribution rather than rely on the covariance matrix.
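The finite-difference procedure of part (d) can be sketched as follows. This is an illustration under stated assumptions, not the code used for the solution: `hessian_2d`, `principal_errors` and `toy_logl` are hypothetical names, and the toy log-likelihood is an exactly quadratic stand-in (with made-up covariance and peak) chosen so that the recovered errors can be compared against a known answer. For the actual problem, `toy_logl` would be replaced by the broken power law $\log L(p_1, b)$.

```python
import numpy as np


def hessian_2d(logl, p, b, dp, db):
    """Central-difference second derivatives of logl(p, b) at (p, b)."""
    f0 = logl(p, b)
    d2_pp = (logl(p + dp, b) - 2.0 * f0 + logl(p - dp, b)) / dp**2
    d2_bb = (logl(p, b + db) - 2.0 * f0 + logl(p, b - db)) / db**2
    # Mixed derivative from four points offset by half steps.
    d2_pb = (logl(p + dp / 2, b + db / 2) - logl(p - dp / 2, b + db / 2)
             - logl(p + dp / 2, b - db / 2) + logl(p - dp / 2, b - db / 2)
             ) / (dp * db)
    return np.array([[d2_pp, d2_pb], [d2_pb, d2_bb]])


def principal_errors(logl, p_hat, b_hat, rel_step=0.01):
    """One-sigma errors along the principal axes and the axes themselves."""
    # Inverse covariance = minus the matrix of second derivatives at the peak.
    info = -hessian_2d(logl, p_hat, b_hat, rel_step * p_hat, rel_step * b_hat)
    eigvals, eigvecs = np.linalg.eigh(info)
    # Variance along each eigenvector is the inverse of its eigenvalue.
    return 1.0 / np.sqrt(eigvals), eigvecs


# Toy quadratic log-likelihood (assumed values, for illustration only).
TRUE_COV = np.array([[0.04, 0.01], [0.01, 0.09]])
TRUE_INFO = np.linalg.inv(TRUE_COV)
PEAK = np.array([0.3, 2.0])


def toy_logl(p, b):
    d = np.array([p, b]) - PEAK
    return -0.5 * d @ TRUE_INFO @ d


sigmas, directions = principal_errors(toy_logl, PEAK[0], PEAK[1])
print("sigma along principal axes:", sigmas)
print("principal directions (columns):")
print(directions)
```

Because the toy function is exactly quadratic, the squared errors returned here should match the eigenvalues of `TRUE_COV`. For a real likelihood with non-elliptical contours the result is only a local approximation at the peak, which is the caveat raised above.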


More information

26, 24, 26, 28, 23, 23, 25, 24, 26, 25

26, 24, 26, 28, 23, 23, 25, 24, 26, 25 The ormal Distribution Introduction Chapter 5 in the text constitutes the theoretical heart of the subject of error analysis. We start by envisioning a series of experimental measurements of a quantity.

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

STAT 514 Solutions to Assignment #6

STAT 514 Solutions to Assignment #6 STAT 514 Solutions to Assignment #6 Question 1: Suppose that X 1,..., X n are a simple random sample from a Weibull distribution with density function f θ x) = θcx c 1 exp{ θx c }I{x > 0} for some fixed

More information

An Introduction to Expectation-Maximization

An Introduction to Expectation-Maximization An Introduction to Expectation-Maximization Dahua Lin Abstract This notes reviews the basics about the Expectation-Maximization EM) algorithm, a popular approach to perform model estimation of the generative

More information

probability of k samples out of J fall in R.

probability of k samples out of J fall in R. Nonparametric Techniques for Density Estimation (DHS Ch. 4) n Introduction n Estimation Procedure n Parzen Window Estimation n Parzen Window Example n K n -Nearest Neighbor Estimation Introduction Suppose

More information

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam

Representation theory of SU(2), density operators, purification Michael Walter, University of Amsterdam Symmetry and Quantum Information Feruary 6, 018 Representation theory of S(), density operators, purification Lecture 7 Michael Walter, niversity of Amsterdam Last week, we learned the asic concepts of

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

Solving Systems of Linear Equations Symbolically

Solving Systems of Linear Equations Symbolically " Solving Systems of Linear Equations Symolically Every day of the year, thousands of airline flights crisscross the United States to connect large and small cities. Each flight follows a plan filed with

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

Bayesian inference with reliability methods without knowing the maximum of the likelihood function

Bayesian inference with reliability methods without knowing the maximum of the likelihood function Bayesian inference with reliaility methods without knowing the maximum of the likelihood function Wolfgang Betz a,, James L. Beck, Iason Papaioannou a, Daniel Strau a a Engineering Risk Analysis Group,

More information

Problem Selected Scores

Problem Selected Scores Statistics Ph.D. Qualifying Exam: Part II November 20, 2010 Student Name: 1. Answer 8 out of 12 problems. Mark the problems you selected in the following table. Problem 1 2 3 4 5 6 7 8 9 10 11 12 Selected

More information

10. Composite Hypothesis Testing. ECE 830, Spring 2014

10. Composite Hypothesis Testing. ECE 830, Spring 2014 10. Composite Hypothesis Testing ECE 830, Spring 2014 1 / 25 In many real world problems, it is difficult to precisely specify probability distributions. Our models for data may involve unknown parameters

More information

Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger

Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm. by Korbinian Schwinger Exponential Family and Maximum Likelihood, Gaussian Mixture Models and the EM Algorithm by Korbinian Schwinger Overview Exponential Family Maximum Likelihood The EM Algorithm Gaussian Mixture Models Exponential

More information

Statistical Methods in Particle Physics

Statistical Methods in Particle Physics Statistical Methods in Particle Physics Lecture 11 January 7, 2013 Silvia Masciocchi, GSI Darmstadt s.masciocchi@gsi.de Winter Semester 2012 / 13 Outline How to communicate the statistical uncertainty

More information

Problem Set 1. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 20

Problem Set 1. MAS 622J/1.126J: Pattern Recognition and Analysis. Due: 5:00 p.m. on September 20 Problem Set MAS 6J/.6J: Pattern Recognition and Analysis Due: 5:00 p.m. on September 0 [Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis

CS 340 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis CS 3 Lec. 18: Multivariate Gaussian Distributions and Linear Discriminant Analysis AD March 11 AD ( March 11 1 / 17 Multivariate Gaussian Consider data { x i } N i=1 where xi R D and we assume they are

More information

Probability Distributions: Continuous

Probability Distributions: Continuous Probability Distributions: Continuous INFO-2301: Quantitative Reasoning 2 Michael Paul and Jordan Boyd-Graber FEBRUARY 28, 2017 INFO-2301: Quantitative Reasoning 2 Paul and Boyd-Graber Probability Distributions:

More information

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.

More information

Chapter 4: Factor Analysis

Chapter 4: Factor Analysis Chapter 4: Factor Analysis In many studies, we may not be able to measure directly the variables of interest. We can merely collect data on other variables which may be related to the variables of interest.

More information

Advanced Quantitative Methods: maximum likelihood

Advanced Quantitative Methods: maximum likelihood Advanced Quantitative Methods: Maximum Likelihood University College Dublin March 23, 2011 1 Introduction 2 3 4 5 Outline Introduction 1 Introduction 2 3 4 5 Preliminaries Introduction Ordinary least squares

More information

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION

TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION TIGHT BOUNDS FOR THE FIRST ORDER MARCUM Q-FUNCTION Jiangping Wang and Dapeng Wu Department of Electrical and Computer Engineering University of Florida, Gainesville, FL 3611 Correspondence author: Prof.

More information

(Multivariate) Gaussian (Normal) Probability Densities

(Multivariate) Gaussian (Normal) Probability Densities (Multivariate) Gaussian (Normal) Probability Densities Carl Edward Rasmussen, José Miguel Hernández-Lobato & Richard Turner April 20th, 2018 Rasmussen, Hernàndez-Lobato & Turner Gaussian Densities April

More information

Frequentist-Bayesian Model Comparisons: A Simple Example

Frequentist-Bayesian Model Comparisons: A Simple Example Frequentist-Bayesian Model Comparisons: A Simple Example Consider data that consist of a signal y with additive noise: Data vector (N elements): D = y + n The additive noise n has zero mean and diagonal

More information

Math 152. Rumbos Fall Solutions to Assignment #12

Math 152. Rumbos Fall Solutions to Assignment #12 Math 52. umbos Fall 2009 Solutions to Assignment #2. Suppose that you observe n iid Bernoulli(p) random variables, denoted by X, X 2,..., X n. Find the LT rejection region for the test of H o : p p o versus

More information

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2

Expansion formula using properties of dot product (analogous to FOIL in algebra): u v 2 u v u v u u 2u v v v u 2 2u v v 2 Least squares: Mathematical theory Below we provide the "vector space" formulation, and solution, of the least squares prolem. While not strictly necessary until we ring in the machinery of matrix algera,

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Primer on statistics:

Primer on statistics: Primer on statistics: MLE, Confidence Intervals, and Hypothesis Testing ryan.reece@gmail.com http://rreece.github.io/ Insight Data Science - AI Fellows Workshop Feb 16, 018 Outline 1. Maximum likelihood

More information