Inferring from data. Theory of estimators


1 Inferring from data: theory of estimators

2 Estimators. An estimator is any function e(x) of the data used to provide an estimate (a "measurement") of an unknown parameter. Because estimators are functions of the data, which are random variables, estimators are themselves random variables and therefore have their own probability distributions. The performance of an estimator is evaluated through the properties of its distribution.
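To make the "estimators are random variables" point concrete, here is a minimal sketch (not from the slides; Gaussian data with assumed μ = 10, σ = 2, N = 25): repeating the same experiment many times and histogramming the sample mean reveals the estimator's own distribution.

```python
# Minimal sketch: the sample mean, as an estimator, is itself a random
# variable -- repeat the "experiment" many times and look at its spread.
import numpy as np

rng = np.random.default_rng(seed=1)
true_mean, sigma, N = 10.0, 2.0, 25   # assumed values

# 10^4 repetitions of the same experiment: N Gaussian observations each
estimates = np.array([rng.normal(true_mean, sigma, N).mean()
                      for _ in range(10_000)])

print(f"mean of estimates: {estimates.mean():.3f}")  # ~ 10 (unbiased)
print(f"std  of estimates: {estimates.std():.3f}")   # ~ sigma/sqrt(N) = 0.4
```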

3 Classic properties of estimators.
- Consistency (in probability). Desirable that the estimator e(x) of m converges in probability to m: $\forall \epsilon > 0,\ \lim_{N\to\infty} p(|m - e(x)| > \epsilon) = 0$.
- Precision. Desirable that the variance of the estimator, $V(e(x)) = \langle (e(x) - \langle e(x) \rangle)^2 \rangle$, is minimal.
- Bias. Desirable that the estimator is unbiased (b(m) = 0), where $b(m) = \langle e(x) \rangle - m$.
- Distribution. Desirable that the distribution p(e(x); m) of the estimator is simple (possibly Gaussian).

4 What this is all about. [Figure: sketches of estimator distributions around the true value of the parameter we are trying to measure, illustrating consistency, a low-variance unbiased estimator, a high-variance unbiased estimator, and a biased estimator.]

5 Comments: bias. Many estimators suffer from biases, which in general depend on the parameter m being estimated. For an estimator e(x) of m, the bias b(m) is defined from $E[e(x)] = \langle e(x) \rangle = m + b(m)$. Typically biases are small with respect to the variance. Issues arise, however, when combining biased estimates: the variance shrinks but the bias remains, and so weighs more. If the distribution p(x|m) is known, the bias can be calculated explicitly. If the bias is independent of m (b(m) = b), then use another estimator u(x) = e(x) - b, which is unbiased and has the same precision (variance) as e(x). If the bias depends on m, one needs an unbiased estimator B(x) of b to redefine u(x) = e(x) - B(x). The new estimator has greater variance than e(x), but the loss in precision is often smaller than the bias.

6 Example: bias correction with a known distribution. I have N points $x_i$ distributed as a Gaussian and use the following ML estimator of its variance: $\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$. This estimator has a bias $b = -\sigma^2/N$ and a variance $\mathrm{Var}(\hat\sigma^2) = 2\sigma^4 (N-1)/N^2$. So I can rework an alternative estimator which has zero bias and a variance only slightly larger than that of the previous estimator: $s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$, with $\mathrm{Var}(s^2) = 2\sigma^4/(N-1)$, which is larger only by a term of order $1/N^2$.
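A quick Monte Carlo check of the two estimators above (a sketch; σ² = 4, N = 10, and the number of toys are arbitrary choices):

```python
# Sketch: Monte Carlo check of the bias of the ML variance estimator
# (divide by N) against the corrected one (divide by N-1).
import numpy as np

rng = np.random.default_rng(seed=2)
sigma2_true, N, n_toys = 4.0, 10, 100_000   # assumed values

ml, corrected = [], []
for _ in range(n_toys):
    x = rng.normal(0.0, np.sqrt(sigma2_true), N)
    s = np.sum((x - x.mean()) ** 2)
    ml.append(s / N)               # biased: E = sigma^2 (N-1)/N
    corrected.append(s / (N - 1))  # unbiased

print(f"ML estimator mean:        {np.mean(ml):.3f}  "
      f"(expect {sigma2_true * (N - 1) / N:.3f})")
print(f"corrected estimator mean: {np.mean(corrected):.3f}  "
      f"(expect {sigma2_true:.3f})")
```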

7 Example: biases with unknown distributions. In most practical cases p(x|m) is not well known, or the bias is hard to calculate explicitly. Biases are then studied by repeating the measurement on simulated samples and comparing the results with the input true values, or by applying the estimator to control samples for which the results are known. If deviations of the order of the variance occur, correcting the results of the measurement by subtracting the bias is dangerous: one needs confidence that the simulated experiments reproduce all features of the data (but then the source of the bias could probably also be identified and removed). When possible, work harder and suppress the bias. [Figures: estimated mass vs true mass, and bias vs true mass, from the 2007 CDF measurement of the lepton+jets top-quark mass.]

8 Information. A useful point of view for dealing with estimation and data reduction is the theory of Fisher's information (of some data on some parameter). Information should:
- increase linearly with the number of observations (doubling the observations doubles the information);
- be conditional on what we are interested in: data irrelevant for the quantity to be estimated should provide no information;
- connect with precision: the greater the information, the better the precision.
Any quantity with these properties is desirable in data reduction. One could think of pursuing methodologies that maximize reduction while minimizing information loss. [Photo: Ronald A. Fisher]

9 Fisher information. (If it exists) the Fisher information of an observation x on the parameter m, related by the likelihood $p(x|m) = L_x(m)$, is
$$I_x(m) = E\left[\left(\frac{\partial \log L_x(m)}{\partial m}\right)^{2}\right]$$
If the parameter of interest is a vector of parameters, this generalizes to
$$[I_x(m)]_{ij} = E\left[\frac{\partial \log L_x}{\partial m_i}\,\frac{\partial \log L_x}{\partial m_j}\right]$$
If (i) the possible values of x do not depend on m and (ii) the likelihood is twice differentiable and derivatives in m commute with integrals in x, then
$$[I_x(m)]_{ij} = -E\left[\frac{\partial^2 \log L_x}{\partial m_i\,\partial m_j}\right]$$
Information is additive: the information of N independent measurements is $N I_x$.
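As a numerical illustration of the definition (a sketch with assumed values; for N Gaussian observations with known σ, the information on the mean is exactly N/σ²):

```python
# Sketch: Monte Carlo estimate of the Fisher information of N Gaussian
# observations on the mean mu, via the expected squared score.
import numpy as np

rng = np.random.default_rng(seed=3)
mu, sigma, N, n_toys = 1.0, 2.0, 20, 50_000   # assumed values

# score: d/dmu log L = sum_i (x_i - mu) / sigma^2 ;  I = E[score^2]
scores = []
for _ in range(n_toys):
    x = rng.normal(mu, sigma, N)
    scores.append(np.sum(x - mu) / sigma**2)

print(f"MC Fisher information: {np.mean(np.square(scores)):.3f}")
print(f"exact N/sigma^2:       {N / sigma**2:.3f}")
```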

10 Comments: variance. Small variance is good: it implies high precision. Can it be arbitrarily small at a given number of observations N? No. The variance of an estimator m̂ is limited by the Cramér-Rao inequality
$$\mathrm{Var}(\hat m) = E[(\hat m - E[\hat m])^2] \;\ge\; \frac{(1 + db/dm)^2}{I_{\hat m}(m)} \;\ge\; \frac{(1 + db/dm)^2}{I_x(m)}$$
where $b = E[\hat m] - m$ is the bias of the estimator and $I_x(m)$ is the Fisher information of the observation x on the parameter m. Because for N observations the Fisher information is proportional to N, for an increasing number of measurements the variance of the estimator (that is, the precision of the measurement) does not decrease faster than 1/N. [Photos: Harald Cramér, C. R. Rao]
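A sketch of the bound in action, under assumed values: for a Gaussian mean the bound is $1/I_x = \sigma^2/N$; the sample mean attains it, while the sample median (also consistent and unbiased here) has variance about π/2 times larger.

```python
# Sketch: compare two estimators of a Gaussian mean with the Cramer-Rao
# bound sigma^2/N (unbiased case). The mean saturates the bound; the
# median does not.
import numpy as np

rng = np.random.default_rng(seed=4)
mu, sigma, N, n_toys = 0.0, 1.0, 100, 20_000   # assumed values

means = np.empty(n_toys)
medians = np.empty(n_toys)
for i in range(n_toys):
    x = rng.normal(mu, sigma, N)
    means[i] = x.mean()
    medians[i] = np.median(x)

print(f"CR bound sigma^2/N: {sigma**2 / N:.5f}")
print(f"Var(sample mean):   {means.var():.5f}")    # ~ bound
print(f"Var(sample median): {medians.var():.5f}")  # ~ (pi/2) * bound
```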

11 Comments: minimum variance bound. The minimum variance bound is a useful property for deciding which estimator to use in a given measurement. It also provides a practical and convenient way of estimating the best statistical resolution achievable on a quantity, before actually carrying out the measurement: it suffices to have a decent simulation to generate the likelihood of the possible observations and apply the CR inequality, assuming equality.

12 Efficiency and sufficiency. When both inequalities are equalities, the estimator reaches the minimum variance and is called efficient. A condition for this to happen is that, once the value m̂ of the estimator is known, complete knowledge of all the data x provides no further information on the parameter m. If that happens, m̂ is a sufficient statistic for m. Trivial cases: x itself, or any invertible f(x), is a sufficient statistic. More interesting are the cases where the dimensionality of m̂ is smaller than the dimensionality of x: there we have data reduction without information loss.

13 Aside: Darmois theorem. Given a likelihood L(m) = p(x|m), one cannot always find a sufficient statistic with a finite number of dimensions s independent of the number of observations N. For this to happen, the likelihood of a single measurement needs to belong to the exponential family
$$L(m) = p(x|m) = \exp\left[\sum_{i=1}^{s} \alpha_i(x)\, a_i(m) + \beta(x) + c(m)\right]$$
A rather restrictive condition: in most cases, reducing the data observations x into a lower-dimensional estimator leads to a loss of information. Nevertheless, data reduction is still often convenient if the information loss is moderate. [Photo: Georges Darmois]

14 Estimator robustness. The robustness of an estimator expresses the stability of its properties (mainly variance and bias) against variations of the shape of the likelihood p(x|m). Rather important in practice, because in most cases p(x|m) is unknown, or known only approximately. A good approach toward robustness (sketched in code below) is to:
- pick an estimator and a sufficiently broad class of p(x|m);
- evaluate the maximum variance of the estimator over the space of all the p(x|m) as the figure of merit of the estimator's performance;
- repeat for various estimators and choose the one showing the minimum value of the maximum variance.
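A minimal sketch of this minimax recipe, with an assumed family of contaminated Gaussians (the contamination fractions and outlier width are arbitrary choices): the mean wins at zero contamination, but the median has the smaller worst-case variance over the family.

```python
# Sketch of the minimax recipe: evaluate each estimator's variance over a
# family of contaminated Gaussians and compare the worst cases.
import numpy as np

rng = np.random.default_rng(seed=5)
N, n_toys = 50, 5_000   # assumed values

def variance_of(estimator, eps):
    """Toy-MC variance of the estimator under the contaminated model
    (1-eps) N(0,1) + eps N(0,10^2)."""
    est = np.empty(n_toys)
    for i in range(n_toys):
        outlier = rng.random(N) < eps
        x = np.where(outlier, rng.normal(0, 10, N), rng.normal(0, 1, N))
        est[i] = estimator(x)
    return est.var()

family = [0.0, 0.01, 0.05, 0.10]   # contamination fractions considered
for name, est in [("mean", np.mean), ("median", np.median)]:
    worst = max(variance_of(est, eps) for eps in family)
    print(f"max variance of {name:6s} over the family: {worst:.4f}")
```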

15 Inferring from data: maximum likelihood method

16 ML estimator properties. Call m̂ the estimate of the unknown parameter m obtained by finding the value of m that maximizes the likelihood $p(x|m) = L_x(m)$. Under weak hypotheses m̂ is consistent. In addition, if $L_x(m)$ is twice differentiable and the set of possible values for x does not depend on the value of m, then m̂ is:
- asymptotically efficient: for the number of observations N → ∞, the variance of the estimator $E[(\hat m - E[\hat m])^2]$ is the minimum possible;
- asymptotically normal: for N → ∞, the difference m̂ − m is distributed as a Gaussian with variance proportional to 1/N.
There are many estimators, but the ML estimator is the one you will use most often. Not necessarily the right one: it is just simple enough and has useful properties.

17 (Another) ML example: Poisson. Want to study a Poisson process, so assume the pdf
$$p(j|\mu) = \frac{\mu^j}{j!}\, e^{-\mu} = L(\mu)$$
Rather than maximising L, one can minimise −ln L, solving $\frac{d}{d\mu}\ln L(\mu)\big|_{\hat\mu} = 0$:
$$\frac{d}{d\mu}\left(\mu - j\ln\mu + \ln j!\right) = 1 - \frac{j}{\mu}$$
Given the observation j, the ML estimator of the mean rate of success μ is $\hat\mu = j$.
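The same result can be recovered numerically by minimising −ln L (a sketch; scipy is assumed available):

```python
# Sketch: recover the analytic result mu_hat = j by minimising -ln L.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import gammaln

j = 5  # the single observed count

def nll(mu):
    # -ln L(mu) = mu - j ln(mu) + ln(j!)   (ln j! = gammaln(j+1))
    return mu - j * np.log(mu) + gammaln(j + 1)

res = minimize_scalar(nll, bounds=(1e-6, 50), method="bounded")
print(f"mu_hat = {res.x:.4f}  (analytic answer: {j})")
```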

18-22 Poisson illustrated. Assume one measurement of j = 5 from a Poisson distribution with no background. The data are fixed; the parameter μ varies:
$$p(j|\mu) = \frac{\mu^j}{j!}\, e^{-\mu} = L(\mu), \qquad L(\mu \mid j = 5) = \frac{\mu^5}{5!}\, e^{-\mu}$$
[Figures, built up over slides 18-22: the Poisson distribution p(j|μ) for μ = 0.5, 5, and 20 compared with the observation j = 5, and the resulting likelihood L(μ|j = 5) as a function of μ, which peaks at μ = 5.]

23 ML estimator variance. We have seen a few examples of ML estimates: given observations x₀, and assuming L(m) = p(x₀|m) known, the value m̂ that maximizes L offers an estimate of the true value of m with some attractive properties. OK, m̂ is the central value of our measurement: what is the uncertainty? It depends on the estimator's variance, whose determination is also part of the inference. Three options:
- Analytical calculation of $E[(\hat m - E[\hat m])^2]$. Requires knowledge of the analytical form of p(x|m), and the integrals should not be intractable. Rarely used, except for simple textbook examples (Poisson, exponential, Gaussian).
- Approximation by the minimum variance bound. Most commonly used; a good compromise in simple realistic applications.
- Brute force. The ultimate solution: accurate but work intensive.

24 Approximating the variance. The minimum variance bound offers an approximate estimate of the variance from the curvature (2nd derivative) of the log-likelihood at its maximum:
$$\hat V(\hat m) \approx \left(-E\left[\frac{\partial^2 \ln L}{\partial m^2}\right]\right)^{-1} \approx \left(-\frac{\partial^2 \ln L}{\partial m^2}\bigg|_{m=\hat m}\right)^{-1}$$
This is what most common minimisation packages (like MINUIT) will give you as the uncertainty of the ML estimate. There is no guarantee that for finite N the ML estimator has reached the minimum variance, but in many cases it is close enough.
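A sketch of this recipe on the Poisson example of the previous slides (j = 5), using a simple finite difference instead of a minimisation package:

```python
# Sketch: MINUIT-style uncertainty from the curvature of -ln L at its
# minimum, computed by a central finite difference.
import numpy as np

j = 5
nll = lambda mu: mu - j * np.log(mu)   # -ln L up to constants

mu_hat, h = float(j), 1e-4
# second derivative of -ln L at the minimum
d2 = (nll(mu_hat + h) - 2 * nll(mu_hat) + nll(mu_hat - h)) / h**2
sigma_hat = 1.0 / np.sqrt(d2)

print(f"sigma_hat = {sigma_hat:.4f}  (analytic sqrt(j) = {np.sqrt(j):.4f})")
```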

25 Approximating the variance (graphical, 1D). Expanding ln L in a Taylor series at the maximum:
$$\ln L(m) = \ln L(\hat m) + \frac{\partial \ln L}{\partial m}\bigg|_{m=\hat m}(m - \hat m) + \frac{1}{2}\frac{\partial^2 \ln L}{\partial m^2}\bigg|_{m=\hat m}(m - \hat m)^2 + \dots$$
The first term is ln L_max; the second is zero. In the third term, use the minimum variance bound to make the replacement
$$\frac{\partial^2 \ln L}{\partial m^2}\bigg|_{m=\hat m} \to -\frac{1}{\hat\sigma^2}$$
and get
$$\ln L(m) \approx \ln L_{\max} - \frac{(m - \hat m)^2}{2\hat\sigma^2}, \qquad \ln L(\hat m \pm \hat\sigma) \approx \ln L_{\max} - \frac{1}{2}$$
The values of the parameter corresponding to a decrease of ln L by 1/2 units approximate the boundaries of the 1-sigma uncertainties. [Figure: likelihood curve with central value and uncertainty interval.]
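A sketch of this graphical rule for the same Poisson example (j = 5): scanning −ln L for the points where it rises by 1/2 above the minimum gives, here, asymmetric 1-sigma bounds.

```python
# Sketch: find the two points where -ln L rises by 1/2 from its minimum.
import numpy as np
from scipy.optimize import brentq

j = 5
nll = lambda mu: mu - j * np.log(mu)
mu_hat, nll_min = float(j), j - j * np.log(j)

crossing = lambda mu: nll(mu) - (nll_min + 0.5)
lo = brentq(crossing, 1e-6, mu_hat)   # lower crossing
hi = brentq(crossing, mu_hat, 50.0)   # upper crossing

print(f"mu = {mu_hat:.2f}  +{hi - mu_hat:.3f} / -{mu_hat - lo:.3f}")
# asymmetric, unlike the symmetric sqrt(j) ~ 2.24 from the curvature
```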

26 Brute force. The safest and most robust way is to look at the width of the distribution of the ML estimator obtained from repeated measurements on independent samples. This implies generating a large number of simulated experiments and repeating the inference in each, to study the distribution of the estimator. This distribution is usually Gaussian for N large enough, and its variance provides a measure of the dispersion. It requires a lot of work and is often not necessary, but when the previous approximations fail this is the only way of getting your estimates right.

27 ML caveats. The ML properties only hold for infinite observations. For finite N, the ML estimator can show biases* and its distribution is unknown. The number of observations N needed to approach the asymptotic regime depends on the likelihood: low-dimensional, regular likelihoods become asymptotic already with O(10) observations; others need many more. The ML estimator will not tell you how good your fit is, that is, whether the assumed p(x|m) is a reasonable model of the observed data or not.
*This is why it is wrong to take the arithmetic mean of results obtained from multiple ML estimators, each based on a very small sample.

28 Simulation! Simulation! Simulation! The variety of issues and pathologies real likelihoods show is so broad that it is unrealistic to devise specific guidelines to address them. Much of this is art/black magic, based on a clear understanding of the fundamentals and on previous experience. The one general recommendation is to make extensive use of simplified simulated experiments ("toy Monte Carlo") to understand the distribution of the ML estimators and their properties before applying them to data. This is of the utmost importance and is usually done by (see the sketch below):
(1) choosing a plausible true value of the relevant parameter m;
(2) feeding it into L(m) and generating several sets of simulated data x from random numbers distributed according to p(x|m);
(3) maximizing the likelihood in each set and looking at the distribution of the estimator;
(4) repeating for a few other choices of the true value m (important and often overlooked).
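A minimal sketch of the four-step recipe for a toy Gaussian-mean fit (true values, sample size, and number of toys are assumed choices; the ML maximisation reduces to the sample mean in this case):

```python
# Sketch of the four-step toy-MC recipe for a Gaussian-mean fit.
import numpy as np

rng = np.random.default_rng(seed=6)
sigma, N, n_toys = 1.0, 30, 5_000   # assumed values

for m_true in (0.0, 2.0, 10.0):            # step (4): several true values
    estimates = np.empty(n_toys)
    for i in range(n_toys):
        x = rng.normal(m_true, sigma, N)   # steps (1)-(2): generate toys
        estimates[i] = x.mean()            # step (3): maximise L (analytic here)
    print(f"m_true = {m_true:5.1f}:  "
          f"<m_hat> = {estimates.mean():7.3f},  std = {estimates.std():.3f}")
```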

29 A standard example: pulls. Each entry is a simulated experiment, generated with the same set of true parameters. Plotted is the distribution of the difference between the ML estimate and the true value of the parameter, divided by the estimated standard deviation. [Figures: pull distributions (x_fit − x_true)/σ_fit and (y_fit − y_true)/σ_fit over many toys; one ML estimator is unbiased with uncertainty OK (σ = 1.01), the other perhaps biased (μ = 0.08 ± 0.04, σ = 1.00) with uncertainty OK.]
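A sketch of how such a pull distribution is built (assumed Gaussian-mean fit; the per-toy uncertainty is the usual s/√N estimate): an unbiased estimator with correct uncertainties gives pulls with mean ≈ 0 and width ≈ 1.

```python
# Sketch of a pull study: (fit - true)/sigma_fit should be approximately
# standard normal if the estimator is unbiased and its uncertainty correct.
import numpy as np

rng = np.random.default_rng(seed=7)
m_true, sigma, N, n_toys = 0.0, 1.0, 100, 10_000   # assumed values

pulls = np.empty(n_toys)
for i in range(n_toys):
    x = rng.normal(m_true, sigma, N)
    m_hat = x.mean()
    sigma_fit = x.std(ddof=1) / np.sqrt(N)   # per-toy uncertainty estimate
    pulls[i] = (m_hat - m_true) / sigma_fit

print(f"pull mean:  {pulls.mean():+.3f}  (expect ~0)")
print(f"pull width: {pulls.std():.3f}  (expect ~1)")
```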
