Inferring from data. Theory of estimators
Slide 1: Inferring from data. Theory of estimators
Slide 2: Estimators

An estimator is any function e(x) of the data used to provide an estimate (a "measurement") of an unknown parameter. Because estimators are functions of the data, which are random variables, estimators are themselves random variables and therefore have their own probability distributions. The performance of an estimator is evaluated from the properties of its distribution.
Slide 3: Classic properties of estimators

- Consistency (in probability). Desirable that the estimator e(x) of m converges in probability to m: for every $\epsilon > 0$, $\lim_{N\to\infty} P(|m - e(x)| > \epsilon) = 0$.
- Precision. Desirable that the variance of the estimator is minimal: $V(e(x)) = \langle (e(x) - \langle e(x) \rangle)^2 \rangle$.
- Bias. Desirable that the estimator is unbiased, i.e. b(m) = 0, where $b(m) = \langle e(x) - m \rangle$.
- Distribution. Desirable that the distribution p(e(x); m) of the estimator is simple (possibly Gaussian).
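These properties can be checked numerically. Below is a minimal sketch (the exponential model, the sample sizes, and the shrunk estimator e2 are illustrative choices, not from the slides) comparing an unbiased estimator of a mean with a slightly biased one:

```python
import numpy as np

rng = np.random.default_rng(0)
true_m = 3.0
n_toys, N = 20000, 50

# Toy experiments: each row is one dataset of N exponential draws with mean true_m
samples = rng.exponential(true_m, size=(n_toys, N))

# Two estimators of the mean:
# e1 = sample mean (unbiased); e2 = shrunk mean (hypothetical, slightly biased)
e1 = samples.mean(axis=1)
e2 = e1 * N / (N + 1)

bias1 = e1.mean() - true_m   # consistent with zero
bias2 = e2.mean() - true_m   # ~ -true_m/(N+1), does not vanish
```

The shrunk estimator has the smaller variance but a bias that no amount of averaging over experiments removes, which is exactly the trade-off the next slides discuss.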
Slide 4: What this is all about

(Figure: distributions of a low-variance unbiased estimator, a high-variance unbiased estimator, and a biased estimator, compared with the true value of the parameter we are trying to measure.)
Slide 5: Comments on bias

Many estimators suffer from biases, which in general depend on the parameter m being estimated. For an estimator e(x) of m, the bias b(m) is defined from $E[e(x)] = \langle e(x) \rangle = m + b(m)$. Typically biases are small with respect to the variance. Issues arise, however, when combining biased estimates: the variance of the combination shrinks, but the bias remains and carries relatively more weight. If the distribution p(x|m) is known, the bias can be calculated explicitly. If the bias is independent of m (b(m) = b), then use another estimator u(x) = e(x) - b, which is unbiased and has the same precision (variance) as e(x). If the bias depends on m, an unbiased estimator B(x) of b is needed to redefine u(x) = e(x) - B(x). The new estimator has a larger variance than e(x), but the loss in precision is often smaller than the bias removed.
Slide 6: Example, bias correction with a known distribution

Given N points $x_i$ distributed as a Gaussian with variance $\sigma^2$, the ML estimator of the variance is

$\hat\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$

This estimator has a bias $b = -\sigma^2/N$ and a variance $\mathrm{Var}(\hat\sigma^2) = 2\sigma^4 \, \frac{N-1}{N^2}$. One can therefore rework an alternative estimator with zero bias,

$s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2, \qquad \mathrm{Var}(s^2) = \frac{2\sigma^4}{N-1}$

whose variance is larger than that of the previous estimator only by a factor $N^2/(N-1)^2 \approx 1 + 2/N$.
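A quick Monte Carlo check of these formulas (the Gaussian parameters and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2, N, n_toys = 4.0, 10, 50000
x = rng.normal(0.0, np.sqrt(sigma2), size=(n_toys, N))

sig2_ml = x.var(axis=1, ddof=0)   # (1/N)     sum (x_i - xbar)^2, the ML estimator
s2      = x.var(axis=1, ddof=1)   # (1/(N-1)) sum (x_i - xbar)^2, bias-corrected

bias_ml = sig2_ml.mean() - sigma2   # ~ -sigma2/N
ratio   = s2.var() / sig2_ml.var()  # ~ N^2/(N-1)^2
```

With N = 10 the ML estimator underestimates the variance by about 10%, while the unbiased s² pays only a ~23% variance penalty, matching the factor above.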
Slide 7: Example, biases with unknown distributions

In most practical cases, p(x|m) is not well known, or the bias is hard to calculate explicitly. Biases are then studied by repeating the measurement on simulated samples and comparing the results with the input true values, or by applying the estimator to control samples for which the result is known. If deviations of the order of the variance occur, correcting the result of the measurement by subtracting the bias is dangerous: one needs confidence that the simulated experiments reproduce all features of the data (but then the source of the bias could probably also be identified and removed). When possible, work harder and suppress the biases. (Figures: estimated mass vs true mass, and bias vs true mass, from the 2007 measurement of the lepton+jets top-quark mass by CDF.)
Slide 8: Information

A useful point of view for dealing with estimation and data reduction is the theory of Fisher's information (of some data on some parameter), due to Ronald A. Fisher. Information should:
- increase linearly with the number of observations (doubling the observations doubles the information);
- be conditional on what we are interested in: data irrelevant to the quantity being estimated should provide no information;
- connect with precision: the greater the information, the better the precision.
Any quantity with these properties is desirable in data reduction. One could then pursue methodologies that maximize the reduction while minimizing the information loss.
Slide 9: Fisher information

(If it exists) the Fisher information of an observation x on the parameter m, related by the likelihood $p(x|m) = L_x(m)$, is

$I_x(m) = E\left[ \left( \frac{\partial \log L_x(m)}{\partial m} \right)^2 \right]$

If the parameter of interest is a vector of parameters, this generalizes to

$[I_x(m)]_{ij} = E\left[ \frac{\partial \log L_x}{\partial m_i} \, \frac{\partial \log L_x}{\partial m_j} \right]$

If (i) the possible values of x do not depend on m, and (ii) the likelihood is twice differentiable and derivatives in m commute with integrals in x, then

$[I_x(m)]_{ij} = -E\left[ \frac{\partial^2 \log L_x}{\partial m_i \, \partial m_j} \right]$

Information is additive: the information of N independent measurements is $N I_x$.
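For a concrete case, the Fisher information of a single Poisson observation on its mean is $I(\mu) = 1/\mu$; a Monte Carlo sketch (the parameter value is arbitrary) recovers this directly from the definition above:

```python
import numpy as np

rng = np.random.default_rng(2)
mu = 4.0
j = rng.poisson(mu, size=200000)

# Score of the Poisson likelihood: d/dmu log p(j|mu) = j/mu - 1
score = j / mu - 1.0

# Monte Carlo estimate of I(mu) = E[score^2]; expected value is 1/mu
I_mc = np.mean(score**2)
```

Additivity then says N independent counts carry information $N/\mu$, which is why the precision on $\mu$ improves as $1/\sqrt{N}$.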
Slide 10: Comments on variance

Small variance is good: it implies high precision. Can it be arbitrarily small at a given number of observations N? No. The variance of an estimator $\hat m$ is limited by the Cramér-Rao inequality (Harald Cramér, C. R. Rao):

$\mathrm{Var}(\hat m) = E[(\hat m - E[\hat m])^2] \;\geq\; \frac{(1 + db/dm)^2}{I_{\hat m}(m)} \;\geq\; \frac{(1 + db/dm)^2}{I_x(m)}$

where $b = E[\hat m] - m$ is the bias of the estimator and $I_x(m)$ is the Fisher information of the observation x on the parameter m. Because for N observations the Fisher information is proportional to N, for an increasing number of measurements the variance of the estimator (that is, the precision of the measurement) does not decrease faster than 1/N.
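The bound can be verified in the Poisson case, where the sample mean is an unbiased ML estimator that actually saturates it (a sketch with arbitrary parameter values):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, N, n_toys = 4.0, 25, 100000

# Each toy experiment: N Poisson counts; estimator is their sample mean (unbiased)
j = rng.poisson(mu, size=(n_toys, N))
mu_hat = j.mean(axis=1)

var_mc = mu_hat.var()
# Cramér-Rao bound: (1 + db/dmu)^2 / (N * I(mu)) with b = 0 and I(mu) = 1/mu
cr_bound = mu / N
```

Here `var_mc` and `cr_bound` agree (both are μ/N), so the sample mean is efficient in the sense of slide 12.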
Slide 11: Comments on the minimum variance bound

The minimum variance bound is a useful property for deciding which estimator to use in a given measurement. It also provides a practical and convenient way of estimating the best statistical resolution achievable on a quantity before actually carrying out the measurement: it suffices to have a decent simulation to generate the likelihood of the possible observations and apply the Cramér-Rao inequality, assuming it holds as an equality.
Slide 12: Efficiency and sufficiency

When both inequalities are equalities, the estimator reaches the minimum variance and is called efficient. A condition for this to happen is that, once the value $\hat m$ of the estimator is known, complete knowledge of all the data x provides no further information on the parameter m. If that happens, $\hat m$ is a sufficient statistic for m. Trivial cases: x itself, or any invertible f(x), is a sufficient statistic. More interesting are the cases in which the dimensionality of $\hat m$ is smaller than the dimensionality of x: there we have data reduction without information loss.
Slide 13: Aside, Darmois' theorem

Given a likelihood L(m) = p(x|m), one cannot always find a sufficient statistic with a finite number of dimensions s independent of the number of observations N. For this to happen, the likelihood of a single measurement must belong to the exponential family (Georges Darmois):

$L(m) = p(x|m) = \exp\left[ \sum_{i=1}^{s} \alpha_i(x) \, a_i(m) + \beta(x) + c(m) \right]$

This is a rather restrictive condition: in most cases, reducing the data observations x into a lower-dimensional estimator leads to a loss of information. Nevertheless, data reduction is still often convenient if the information loss is moderate.
Slide 14: Estimator robustness

The robustness of an estimator expresses the stability of its properties (mainly variance and bias) against variations of the shape of the likelihood p(x|m). This is rather important in practice because in most cases p(x|m) is unknown, or known only approximately. A good approach to robustness is:
- pick an estimator and a sufficiently broad class of p(x|m);
- evaluate the maximum variance of the estimator over the space of all the p(x|m) as the figure of merit of the estimator's performance;
- repeat for various estimators and choose the one showing the minimum value of the maximum variance.
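A toy illustration of this min-max idea (the contamination model, a Gaussian core with a fraction of wide-tail outliers, is a hypothetical choice): the sample mean has the smaller variance on the pure Gaussian, but the median is far more stable once the shape of p(x|m) changes:

```python
import numpy as np

rng = np.random.default_rng(4)
N, n_toys = 100, 20000

def toy_variances(contamination):
    # Mix a unit-Gaussian core with a fraction of sigma=10 outliers
    n_out = int(contamination * N)
    core = rng.normal(0.0, 1.0, size=(n_toys, N - n_out))
    tails = rng.normal(0.0, 10.0, size=(n_toys, n_out))
    x = np.concatenate([core, tails], axis=1)
    # Variance over toys of two location estimators: mean and median
    return x.mean(axis=1).var(), np.median(x, axis=1).var()

v_mean_clean, v_med_clean = toy_variances(0.0)
v_mean_cont,  v_med_cont  = toy_variances(0.1)
```

On the clean sample the mean wins; with 10% contamination its variance blows up by an order of magnitude while the median barely moves, so the median has the better worst-case (max-variance) figure of merit over this class of shapes.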
Slide 15: Inferring from data. Maximum likelihood method
Slide 16: ML estimator properties

Call $\hat m$ the estimate of the unknown parameter m obtained by finding the value of m that maximizes the likelihood $p(x|m) = L_x(m)$. Under weak hypotheses $\hat m$ is consistent. In addition, if $L_x(m)$ is twice differentiable and the set of possible values of x does not depend on the value of m, then for the number of observations $N \to \infty$ the estimator $\hat m$ is:
- asymptotically efficient: the variance of the estimator, $E[(\hat m - E[\hat m])^2]$, is the minimum possible;
- asymptotically normal: the difference $\hat m - m$ is distributed as a Gaussian with variance proportional to 1/N.
There are many estimators, but the ML estimator is the one you will use most often. It is not necessarily the right one; it is just simple enough and has useful properties.
Slide 17: (Another) ML example, Poisson

We want to study a Poisson process and assume the pdf

$p(j|\mu) = \frac{\mu^j}{j!} \, e^{-\mu} = L(\mu)$

Rather than maximizing L, one can minimize $-\ln L$:

$\frac{d}{d\mu}\left(-\ln L(\mu)\right) = \frac{d}{d\mu}\left( \mu - j \ln\mu + \ln j! \right) = 1 - \frac{j}{\mu} = 0 \quad \text{at } \mu = \hat\mu$

Given the observation j, the ML estimator of the mean rate of success μ is $\hat\mu = j$.
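The same result can be obtained numerically by scanning $-\ln L$, which is how one proceeds when no closed-form solution exists (the grid range and step are arbitrary choices):

```python
import math

def neg_log_L(mu, j):
    # -ln L(mu) = mu - j*ln(mu) + ln(j!) for the Poisson likelihood
    return mu - j * math.log(mu) + math.lgamma(j + 1)

j = 5
grid = [0.01 * k for k in range(1, 3001)]           # scan mu over (0, 30]
mu_hat = min(grid, key=lambda mu: neg_log_L(mu, j))  # minimize -ln L
```

The scan lands on $\hat\mu = j = 5$, as the analytic derivative predicts.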
Slides 18-22: Poisson illustrated

Assume one measurement of j = 5 from a Poisson distribution with no background. The data are fixed and the parameter μ varies:

$p(j|\mu) = \frac{\mu^j}{j!} \, e^{-\mu} = L(\mu), \qquad L(\mu \,|\, j=5) = \frac{\mu^5}{5!} \, e^{-\mu}$

(Figures, built up over the slides: the Poisson pdf evaluated for j = 5 at μ = 0.5, μ = 5, and μ = 20, and the likelihood as a function of μ, which is maximized at $\hat\mu = j = 5$.)
Slide 23: ML estimator variance

We have seen a few examples of ML estimates: given the observations $x_0$, and assuming $L(m) = p(x_0|m)$ known, the value $\hat m$ that maximizes L offers an estimate of the true value of m with some attractive properties. So $\hat m$ is the central value of our measurement; what is the uncertainty? It depends on the estimator's variance, which is also part of the inference. Options:
- Analytical calculation of $E[(\hat m - E[\hat m])^2]$. Requires knowledge of the analytical form of p(x|m), and the integrals should be tractable. Rarely used except for simple textbook examples (Poisson, exponential, Gaussian).
- Approximation by the minimum variance bound. Most commonly used; a good compromise in simple realistic applications.
- Brute force. The ultimate solution: accurate but work intensive.
Slide 24: Approximating the variance

The minimum variance bound offers an approximate estimate of the variance from the curvature (second derivative) of the log-likelihood at its maximum:

$\hat V(\hat m) \approx \left( -E\left[ \frac{\partial^2 \ln L}{\partial m^2} \right] \right)^{-1} \approx \left( -\frac{\partial^2 \ln L}{\partial m^2} \right)^{-1} \Bigg|_{m=\hat m}$

This is what most common minimization packages (like MINUIT) will give you as the uncertainty of the ML estimate. There is no guarantee that for finite N the ML estimator has reached the minimum variance, but in many cases it is close enough.
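A sketch of this recipe for the Poisson example of slide 17, where the curvature is known exactly ($\partial^2 \ln L / \partial\mu^2 = -j/\mu^2$, so the approximate variance at $\hat\mu = j$ is simply $\hat\mu$):

```python
import math

j, mu_hat, h = 5, 5.0, 1e-4

def ln_L(mu):
    # ln L(mu) = j*ln(mu) - mu - ln(j!) for the Poisson likelihood
    return j * math.log(mu) - mu - math.lgamma(j + 1)

# Second derivative of ln L at the maximum via central finite differences
d2 = (ln_L(mu_hat + h) - 2 * ln_L(mu_hat) + ln_L(mu_hat - h)) / h**2

# Curvature-based variance estimate: (-d2 ln L / d mu^2)^(-1)
var_hat = -1.0 / d2
```

The numerical curvature gives $\hat V(\hat\mu) \approx \hat\mu = 5$, i.e. the familiar $\sigma \approx \sqrt{j}$ for a Poisson count.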
Slide 25: Approximating the variance (graphical, 1D)

Expanding $\ln L$ in a Taylor series at the maximum:

$\ln L(m) = \ln L(\hat m) + \frac{\partial \ln L}{\partial m}\Big|_{m=\hat m} (m - \hat m) + \frac{1}{2} \frac{\partial^2 \ln L}{\partial m^2}\Big|_{m=\hat m} (m - \hat m)^2 + \dots$

The first term is $\ln L_{\max}$; the second is zero at the maximum. In the third term, use the minimum variance bound to make the replacement

$\frac{\partial^2 \ln L}{\partial m^2}\Big|_{m=\hat m} \to -\frac{1}{\hat\sigma^2}$

and get

$\ln L(m) \approx \ln L_{\max} - \frac{(m - \hat m)^2}{2 \hat\sigma^2}, \qquad \ln L(\hat m \pm \hat\sigma) \approx \ln L_{\max} - \frac{1}{2}$

The values of the parameter corresponding to a decrease of ln L by 1/2 units approximate the boundaries of the 1-sigma uncertainty around the central value.
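For the Poisson example with j = 5, the Δln L = 1/2 prescription can be applied numerically (the bisection helper is an illustrative implementation); note the interval comes out asymmetric, unlike the Gaussian approximation $\hat\mu \pm \sqrt{j}$:

```python
import math

j = 5

def ln_L(mu):
    # ln L(mu) = j*ln(mu) - mu - ln(j!) for the Poisson likelihood
    return j * math.log(mu) - mu - math.lgamma(j + 1)

mu_hat = float(j)              # ML estimate
target = ln_L(mu_hat) - 0.5    # log-likelihood dropped by 1/2 units

def solve(lo, hi):
    # Bisection for ln_L(mu) = target on [lo, hi] (sign change assumed)
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if (ln_L(mid) - target) * (ln_L(lo) - target) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

lo_edge = solve(0.1, mu_hat)   # lower crossing, ~3.1
hi_edge = solve(mu_hat, 20.0)  # upper crossing, ~7.6
```

The resulting interval, roughly [3.1, 7.6], is shifted with respect to the symmetric $5 \pm 2.24$, reflecting the skewness of the Poisson likelihood at small counts.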
Slide 26: Brute force

The safest and most robust way of estimating the variance is to look at the width of the ML estimator's distribution obtained from repeated measurements on independent samples. This implies generating a large number of simulated experiments and repeating the inference in each to study the distribution of the estimator. This distribution is usually Gaussian for N large enough, and its variance provides the measure of dispersion. This requires a lot of work and is often not needed; but when the previous approximations fail, it is the only way of getting your estimates right.
Slide 27: ML caveats

The ML properties only hold for infinite observations. For finite N, the ML estimator can show biases* and its distribution is unknown. The number of observations N needed to approach the asymptotic regime depends on the likelihood: low-dimensional, regular likelihoods become asymptotic already with O(10) observations; others need many more. The ML estimator will not tell you how good your fit is, that is, whether the assumed p(x|m) is a reasonable model of the observed data or not.

* This is why it is wrong to take the arithmetic mean of results obtained from multiple ML estimators each based on a very small sample.
Slide 28: Simulation! Simulation! Simulation!

The variety of issues and pathologies that real likelihoods show is so broad that it is unrealistic to devise specific guidelines to address them all. Much of this is art/black magic based on a clear understanding of the fundamentals and on previous experience. The one general recommendation is to use simplified simulated experiments ("toy Monte Carlo") extensively, to understand the distribution of the ML estimators and their properties prior to applying them to data. This is of utmost importance and is usually done by:
(1) choosing a plausible true value of the relevant parameter m;
(2) feeding it into L(m) and generating several sets of simulated data x from random numbers distributed according to p(x|m);
(3) maximizing the likelihood in each set and looking at the distribution of the estimator;
(4) repeating for a few other choices of the true value m (important and often overlooked).
Slide 29: A standard example, pulls

Each entry is a simulated experiment, generated with the same set of true parameters. The histograms show the distribution of the difference between the ML estimate and the true value of the parameter, divided by the estimated standard deviation: the pulls $(x_{fit} - x_{true})/\sigma_{fit}$ and $(y_{fit} - y_{true})/\sigma_{fit}$. (Figures: one pull distribution with σ = 1.01, labeled "ML estimator of x unbiased, uncertainty seems OK"; the other with μ = 0.08 ± 0.04 and σ = 1.00, labeled "ML estimator of y perhaps biased, uncertainty seems OK".)
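A pull study of this kind can be sketched in a few lines (a Gaussian-mean fit with illustrative parameter values; a real study would use the full fit model):

```python
import numpy as np

rng = np.random.default_rng(5)
true_mu, true_sigma = 10.0, 2.0
N, n_toys = 100, 5000

pulls = []
for _ in range(n_toys):
    x = rng.normal(true_mu, true_sigma, size=N)
    mu_fit = x.mean()                          # ML estimate of the mean
    sigma_fit = x.std(ddof=0) / np.sqrt(N)     # ML estimate of its uncertainty
    pulls.append((mu_fit - true_mu) / sigma_fit)

pulls = np.array(pulls)
# A healthy pull distribution has mean ~ 0 (no bias) and width ~ 1 (correct errors)
```

A pull mean significantly away from zero signals a bias; a width away from one signals over- or under-estimated uncertainties, exactly the two diagnostics in the figure.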
Statistics for the LHC Lecture 1: Introduction Academic Training Lectures CERN, 14 17 June, 2010 indico.cern.ch/conferencedisplay.py?confid=77830 Glen Cowan Physics Department Royal Holloway, University
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More information10-704: Information Processing and Learning Fall Lecture 24: Dec 7
0-704: Information Processing and Learning Fall 206 Lecturer: Aarti Singh Lecture 24: Dec 7 Note: These notes are based on scribed notes from Spring5 offering of this course. LaTeX template courtesy of
More informationHypothesis Testing: The Generalized Likelihood Ratio Test
Hypothesis Testing: The Generalized Likelihood Ratio Test Consider testing the hypotheses H 0 : θ Θ 0 H 1 : θ Θ \ Θ 0 Definition: The Generalized Likelihood Ratio (GLR Let L(θ be a likelihood for a random
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationA Very Brief Summary of Statistical Inference, and Examples
A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2008 Prof. Gesine Reinert 1 Data x = x 1, x 2,..., x n, realisations of random variables X 1, X 2,..., X n with distribution (model)
More informationMATH4427 Notebook 2 Fall Semester 2017/2018
MATH4427 Notebook 2 Fall Semester 2017/2018 prepared by Professor Jenny Baglivo c Copyright 2009-2018 by Jenny A. Baglivo. All Rights Reserved. 2 MATH4427 Notebook 2 3 2.1 Definitions and Examples...................................
More informationLecture 18: Learning probabilistic models
Lecture 8: Learning probabilistic models Roger Grosse Overview In the first half of the course, we introduced backpropagation, a technique we used to train neural nets to minimize a variety of cost functions.
More informationBasic concepts in estimation
Basic concepts in estimation Random and nonrandom parameters Definitions of estimates ML Maimum Lielihood MAP Maimum A Posteriori LS Least Squares MMS Minimum Mean square rror Measures of quality of estimates
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationPerformance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project
Performance Comparison of K-Means and Expectation Maximization with Gaussian Mixture Models for Clustering EE6540 Final Project Devin Cornell & Sushruth Sastry May 2015 1 Abstract In this article, we explore
More informationPhysics 403. Segev BenZvi. Credible Intervals, Confidence Intervals, and Limits. Department of Physics and Astronomy University of Rochester
Physics 403 Credible Intervals, Confidence Intervals, and Limits Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Summarizing Parameters with a Range Bayesian
More informationSGN Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection
SG 21006 Advanced Signal Processing: Lecture 8 Parameter estimation for AR and MA models. Model order selection Ioan Tabus Department of Signal Processing Tampere University of Technology Finland 1 / 28
More information[y i α βx i ] 2 (2) Q = i=1
Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationPhysics 403. Segev BenZvi. Parameter Estimation, Correlations, and Error Bars. Department of Physics and Astronomy University of Rochester
Physics 403 Parameter Estimation, Correlations, and Error Bars Segev BenZvi Department of Physics and Astronomy University of Rochester Table of Contents 1 Review of Last Class Best Estimates and Reliability
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 00 MODULE : Statistical Inference Time Allowed: Three Hours Candidates should answer FIVE questions. All questions carry equal marks. The
More information1 Degree distributions and data
1 Degree distributions and data A great deal of effort is often spent trying to identify what functional form best describes the degree distribution of a network, particularly the upper tail of that distribution.
More informationParameter estimation Conditional risk
Parameter estimation Conditional risk Formalizing the problem Specify random variables we care about e.g., Commute Time e.g., Heights of buildings in a city We might then pick a particular distribution
More informationHT Introduction. P(X i = x i ) = e λ λ x i
MODS STATISTICS Introduction. HT 2012 Simon Myers, Department of Statistics (and The Wellcome Trust Centre for Human Genetics) myers@stats.ox.ac.uk We will be concerned with the mathematical framework
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationIrr. Statistical Methods in Experimental Physics. 2nd Edition. Frederick James. World Scientific. CERN, Switzerland
Frederick James CERN, Switzerland Statistical Methods in Experimental Physics 2nd Edition r i Irr 1- r ri Ibn World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI CONTENTS
More informationPhysics 509: Bootstrap and Robust Parameter Estimation
Physics 509: Bootstrap and Robust Parameter Estimation Scott Oser Lecture #20 Physics 509 1 Nonparametric parameter estimation Question: what error estimate should you assign to the slope and intercept
More informationInterval Estimation III: Fisher's Information & Bootstrapping
Interval Estimation III: Fisher's Information & Bootstrapping Frequentist Confidence Interval Will consider four approaches to estimating confidence interval Standard Error (+/- 1.96 se) Likelihood Profile
More informationCSC321 Lecture 18: Learning Probabilistic Models
CSC321 Lecture 18: Learning Probabilistic Models Roger Grosse Roger Grosse CSC321 Lecture 18: Learning Probabilistic Models 1 / 25 Overview So far in this course: mainly supervised learning Language modeling
More informationECE 275B Homework # 1 Solutions Version Winter 2015
ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2
More information