Methods of Estimation
MIT 18.655, Dr. Kempthorne, Spring 2016
Outline

1. Methods of Estimation I
Minimum Contrast Estimates

Setup: $X \sim P$, $P \in \mathcal{P} = \{P_\theta, \theta \in \Theta\}$.

Problem: finding a function $\hat{\theta}(x)$ which is close to $\theta$.

Consider $\rho: \mathcal{X} \times \Theta \to \mathbb{R}$ and define
$$D(\theta_0, \theta) = E_{\theta_0}\, \rho(X, \theta)$$
to measure the discrepancy between $\theta$ and the true value $\theta_0$.

As a discrepancy measure, $D$ makes sense if the value of $\theta$ minimizing the function is $\theta = \theta_0$. If $P_{\theta_0}$ were true and we knew $D(\theta_0, \theta)$, we could obtain $\theta_0$ as the minimizer. Instead of observing $D(\theta_0, \theta)$, we observe $\rho(x, \theta)$:

- $\rho(\cdot, \cdot)$ is a contrast function.
- $\hat{\theta}(X)$, the minimizer of $\rho(X, \cdot)$, is a minimum-contrast estimate.
The definition extends to Euclidean $\Theta \subset \mathbb{R}^d$, with $\theta_0$ an interior point of $\Theta$. Assume the mapping $\theta \mapsto D(\theta_0, \theta)$ is smooth. Then $\theta = \theta_0$ solves
$$\nabla_\theta\, D(\theta_0, \theta) = 0, \quad \text{where } \nabla_\theta = \left(\frac{\partial}{\partial \theta_1}, \ldots, \frac{\partial}{\partial \theta_d}\right)^T.$$
Substitute $\rho(x, \theta)$ for $D(\theta_0, \theta)$ and solve $\nabla_\theta\, \rho(x, \theta) = 0$ at $\theta = \hat{\theta}$.
Estimating Equations: $\Psi: \mathcal{X} \times \mathbb{R}^d \to \mathbb{R}^d$, where $\Psi = (\psi_1, \ldots, \psi_d)^T$, such that for every $\theta_0 \in \Theta$ the expectation under $P_{\theta_0}$,
$$V(\theta_0, \theta) = E_{\theta_0}[\Psi(X, \theta)],$$
satisfies $V(\theta_0, \theta) = 0$ with unique solution $\theta = \theta_0$.
Example: Least Squares. $\mu(z) = g(\beta, z)$, $\beta \in \mathbb{R}^d$; $x = \{(z_i, Y_i) : 1 \le i \le n\}$, where $Y_1, \ldots, Y_n$ are independent. Define
$$\rho(x, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2.$$
Consider $Y_i = \mu(z_i) + \epsilon_i$, where $\mu(z_i) = g(\beta, z_i)$ and the $\epsilon_i$ are iid $N(0, \sigma_0^2)$. Then $\beta$ parametrizes the model and we can write
$$D(\beta_0, \beta) = E_{\beta_0}\, \rho(X, \beta) = n\sigma_0^2 + \sum_{i=1}^n [g(\beta_0, z_i) - g(\beta, z_i)]^2.$$
This is minimized by $\beta = \beta_0$, and uniquely so iff $\beta$ is identifiable.
The least-squares estimate $\hat{\beta}$ minimizes $\rho(x, \beta)$. Conditions that guarantee the existence of $\hat{\beta}$:

- Continuity of $g(\cdot, z_i)$.
- The minimum of $\rho(x, \cdot)$ is attained on a compact subset of $\{\beta\}$, e.g., when $\lim_{|\beta| \to \infty} g(\beta, z_i) = \infty$.

If $g(\beta, z_i)$ is differentiable in $\beta$, then $\hat{\beta}$ satisfies the Normal Equations, obtained by taking partial derivatives of $\rho(x, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2$ and solving
$$\frac{\partial \rho(x, \beta)}{\partial \beta_j} = 0, \quad j = 1, \ldots, d.$$
Starting from $\rho(x, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2$, solve $\partial \rho(x, \beta)/\partial \beta_j = 0$:
$$\sum_{i=1}^n 2\,[Y_i - g(\beta, z_i)] \left(-\frac{\partial g(\beta, z_i)}{\partial \beta_j}\right) = 0$$
$$\Longrightarrow \quad \sum_{i=1}^n \frac{\partial g(\beta, z_i)}{\partial \beta_j}\, Y_i - \sum_{i=1}^n \frac{\partial g(\beta, z_i)}{\partial \beta_j}\, g(\beta, z_i) = 0.$$
Linear case: $g(\beta, z_i) = \sum_{j=1}^d z_{ij} \beta_j = z_i^T \beta$, so $\partial g(\beta, z_i)/\partial \beta_j = z_{ij}$ and the normal equations become
$$\sum_{i=1}^n z_{ij}\, Y_i - \sum_{i=1}^n z_{ij}\, (z_i^T \beta) = 0,$$
i.e.,
$$\sum_{i=1}^n z_{ij}\, Y_i - \sum_{k=1}^d \left(\sum_{i=1}^n z_{ij}\, z_{ik}\right) \beta_k = 0, \quad j = 1, \ldots, d,$$
or, in matrix form,
$$Z_D^T Y - Z_D^T Z_D\, \beta = 0,$$
where $Z_D$ is the $(n \times d)$ design matrix with $(i, j)$ element $z_{ij}$.
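As an illustration, here is a minimal numerical sketch of solving the normal equations $Z_D^T Z_D\, \beta = Z_D^T Y$ (assuming NumPy; the simulated design and coefficients are illustrative, not from the lecture):

```python
import numpy as np

# Simulate an illustrative linear model Y = Z_D beta + noise.
rng = np.random.default_rng(0)
n, d = 100, 3
Z = rng.normal(size=(n, d))            # design matrix Z_D (n x d)
beta_true = np.array([1.0, -2.0, 0.5])
Y = Z @ beta_true + rng.normal(scale=0.3, size=n)

# Normal equations: Z_D^T Z_D beta = Z_D^T Y. Solving the linear system
# directly is preferable to forming the explicit inverse.
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)
print(beta_hat)                        # should be close to beta_true
```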
Note: Least squares exemplifies both the minimum-contrast and the estimating-equation methodology. Distributional assumptions are not necessary to motivate the estimate as a mathematical approximation.
Method of Moments

$X_1, \ldots, X_n$ iid $X \sim P_\theta$, $\theta \in \mathbb{R}^d$, with moments $\mu_1(\theta), \mu_2(\theta), \ldots, \mu_d(\theta)$:
$$\mu_j(\theta) = \mu_j = E[X^j \mid \theta], \quad \text{the } j\text{th moment of } X.$$
Sample moments:
$$\hat{\mu}_j = \frac{1}{n} \sum_{i=1}^n X_i^j, \quad j = 1, \ldots, d.$$
Method of Moments: solve for $\theta$ in the system of equations
$$\mu_1(\theta) = \hat{\mu}_1, \quad \mu_2(\theta) = \hat{\mu}_2, \quad \ldots, \quad \mu_d(\theta) = \hat{\mu}_d.$$
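A small sketch of the method in code (a standard textbook case, assuming NumPy): for $X \sim N(\mu, \sigma^2)$ with $\theta = (\mu, \sigma^2)$, the population moments are $\mu_1 = \mu$ and $\mu_2 = \mu^2 + \sigma^2$, so matching sample moments gives $\hat{\mu} = \hat{\mu}_1$ and $\hat{\sigma}^2 = \hat{\mu}_2 - \hat{\mu}_1^2$:

```python
import numpy as np

# Method-of-moments sketch for X ~ N(mu, sigma^2); sample is illustrative.
rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

m1 = np.mean(x)         # first sample moment
m2 = np.mean(x**2)      # second sample moment
mu_hat = m1             # from mu_1(theta) = m1
sigma2_hat = m2 - m1**2 # from mu_2(theta) = m2
print(mu_hat, sigma2_hat)
```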
Notes:

- $\theta$ must be identifiable.
- Existence of the moments $\mu_j$: $\lim_{n \to \infty} \hat{\mu}_j = \mu_j$ with $|\mu_j| < \infty$.
- If $q(\theta) = h(\mu_1, \ldots, \mu_d)$, then the Method-of-Moments estimate of $q(\theta)$ is $\widehat{q(\theta)} = h(\hat{\mu}_1, \ldots, \hat{\mu}_d)$.
- The MOM estimate of $\theta$ may not be unique! (See Problem )
Plug-In and Extension Principles

Frequency Plug-In. Multinomial sample: $X_1, \ldots, X_n$ with $K$ values $v_1, \ldots, v_K$ and
$$P(X_i = v_j) = p_j, \quad j = 1, \ldots, K.$$
Plug-in estimates: $\hat{p}_j = N_j / n$, where $N_j = \#\{i : X_i = v_j\}$. Apply to any function $q(p_1, \ldots, p_K)$:
$$\hat{q} = q(\hat{p}_1, \ldots, \hat{p}_K).$$
This is equivalent to substituting the empirical distribution function
$$\hat{P}(t) = \frac{1}{n} \sum_{i=1}^n 1\{x_i \le t\}$$
for the true distribution function $P_\theta(t) = P(X \le t \mid \theta)$ underlying the iid sample. $\hat{P}$ is an estimate of $P$, and $\nu(\hat{P})$ is an estimate of $\nu(P)$.
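A frequency plug-in sketch (assuming NumPy; the distribution and the function $q$ below are illustrative, not from the lecture):

```python
import numpy as np

# Multinomial sample on values v_1..v_K; estimate q(p) by q(p_hat),
# where p_hat_j = N_j / n.
values = np.array([1, 2, 3, 4])                      # v_1, ..., v_K
rng = np.random.default_rng(2)
x = rng.choice(values, size=500, p=[0.1, 0.2, 0.3, 0.4])

counts = np.array([(x == v).sum() for v in values])  # N_j
p_hat = counts / len(x)                              # plug-in estimates

def q(p):
    return p[0] + p[2]   # e.g., q(p) = P(X is odd) = p_1 + p_3

print(q(p_hat))
```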
Example: $\alpha$th population quantile, $0 < \alpha < 1$:
$$\nu_\alpha(P) = \tfrac{1}{2}\left[F^{-1}(\alpha) + F_U^{-1}(\alpha)\right],$$
where
$$F^{-1}(\alpha) = \inf\{x : F(x) \ge \alpha\} \quad \text{and} \quad F_U^{-1}(\alpha) = \sup\{x : F(x) \le \alpha\}.$$
The plug-in estimate is $\hat{\nu}_\alpha = \nu_\alpha(\hat{P}) = \tfrac{1}{2}[\hat{F}^{-1}(\alpha) + \hat{F}_U^{-1}(\alpha)]$; see the sketch below.

Example: Method-of-Moments estimate of the $j$th moment: $\nu(P) = \mu_j = E(X^j)$, with plug-in estimate
$$\hat{\nu}(P) = \hat{\mu}_j = \nu(\hat{P}) = \frac{1}{n} \sum_{i=1}^n x_i^j.$$

Extension Principle. Objective: estimate $q(\theta)$, a function of $\theta$. Assume $q(\theta) = h(p_1(\theta), \ldots, p_K(\theta))$, where $h(\cdot)$ is continuous. The extension principle estimates $q(\theta)$ with
$$\widehat{q(\theta)} = h(\hat{p}_1, \ldots, \hat{p}_K).$$
$h(\cdot)$ may not be unique: which $h(\cdot)$ is optimal?
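Here is a sketch of the plug-in quantile applied to the empirical CDF (assuming NumPy; the handling of the sup-inverse when $n\alpha$ is an integer follows the $\inf$/$\sup$ definitions above, for samples with distinct values):

```python
import numpy as np

def plugin_quantile(x, alpha):
    """0.5 * (F_hat^{-1}(alpha) + F_hat_U^{-1}(alpha)) for the empirical CDF."""
    xs = np.sort(x)
    n = len(xs)
    k = int(np.ceil(n * alpha))      # smallest k with k/n >= alpha
    f_inv = xs[k - 1]                # inf{x : F_hat(x) >= alpha}
    # sup{x : F_hat(x) <= alpha} is one order statistic higher when n*alpha
    # is an integer (F_hat equals alpha on a whole interval), else the same.
    if np.isclose(n * alpha, round(n * alpha)) and k < n:
        f_inv_u = xs[k]
    else:
        f_inv_u = xs[k - 1]
    return 0.5 * (f_inv + f_inv_u)

rng = np.random.default_rng(3)
print(plugin_quantile(rng.normal(size=101), 0.5))  # sample median
```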
Notes on Method-of-Moments / Frequency Plug-In Estimates

- Easy to compute.
- Valuable as initial estimates in iterative algorithms.
- Consistent estimates (close to the true parameter in large samples).
- Best frequency plug-in estimates are maximum-likelihood estimates.
- In some cases, MOM estimators are foolish (see Example 2.1.7).
Least Squares

General Model: Only Y Random. $X = \{(z_i, Y_i) : 1 \le i \le n\}$, where $Y_1, \ldots, Y_n$ are independent and $z_1, \ldots, z_n \in \mathbb{R}^d$ are fixed, non-random. For cases $i = 1, \ldots, n$,
$$Y_i = \mu(z_i) + \epsilon_i, \quad \text{where } \mu(z) = g(\beta, z),\ \beta \in \mathbb{R}^d,$$
and the $\epsilon_i$ are independent with $E[\epsilon_i] = 0$. The least-squares contrast function is
$$\rho(X, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2;$$
$\beta$ parametrizes the model, and we can write the discrepancy function $D(\beta_0, \beta) = E_{\beta_0}\, \rho(X, \beta)$.
Least Squares: Only Y Random

Contrast function:
$$\rho(X, \beta) = \|Y - \mu\|^2 = \sum_{i=1}^n [Y_i - g(\beta, z_i)]^2.$$
Discrepancy function:
$$D(\beta_0, \beta) = E_{\beta_0}\, \rho(X, \beta) = \sum_{i=1}^n \mathrm{Var}(\epsilon_i) + \sum_{i=1}^n [g(\beta_0, z_i) - g(\beta, z_i)]^2.$$
The model is semiparametric, with unknown parameter $\beta$ and unknown (joint) distribution $P_e$ of $E = (\epsilon_1, \ldots, \epsilon_n)$.

Gauss-Markov Assumptions. Assume that the distribution of $E$ satisfies:

- $E(\epsilon_i) = 0$
- $\mathrm{Var}(\epsilon_i) = \sigma^2$
- $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \ne j$
General Model: (Y, Z) Both Random

$(Y_1, Z_1), \ldots, (Y_n, Z_n)$ are i.i.d. as $X = (Y, Z) \sim P$. Define
$$\mu(z) = E[Y \mid Z = z] = g(\beta, z),$$
where $g(\cdot, \cdot)$ is a known function and $\beta \in \mathbb{R}^d$ is an unknown parameter. Given $Z_i = z_i$, define $\epsilon_i = Y_i - \mu(z_i)$ for $i = 1, \ldots, n$. Conditioning on the $z_i$ we can write
$$Y_i = g(\beta, z_i) + \epsilon_i, \quad i = 1, 2, \ldots, n,$$
where $E = (\epsilon_1, \ldots, \epsilon_n)$ has (joint) distribution $P_e$.

The least-squares estimate $\hat{\beta}$ is the plug-in estimate $\beta(\hat{P})$, where $\hat{P}$ is the empirical distribution of the sample $\{(Z_i, Y_i), i = 1, \ldots, n\}$. The function $g(\beta, z)$ can be linear or nonlinear in $\beta$ and $z$; closed-form solutions for $\hat{\beta}$ exist when $g$ is linear in $\beta$.
Gauss-Markov Theorem: Assumptions

Data
$$y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \quad \text{and} \quad X = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}$$
follow a linear model satisfying the Gauss-Markov assumptions if $y$ is an observation of the random vector $Y = (Y_1, Y_2, \ldots, Y_n)^T$ and

- $E(Y \mid X, \beta) = X\beta$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the $p$-vector of regression parameters;
- $\mathrm{Cov}(Y \mid X, \beta) = \sigma^2 I_n$, for some $\sigma^2 > 0$.

I.e., the random variables generating the observations are uncorrelated and have constant variance $\sigma^2$ (conditional on $X$ and $\beta$).
For known constants $c_1, c_2, \ldots, c_p, c_{p+1}$, consider the problem of estimating
$$\theta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_p \beta_p + c_{p+1}.$$
Under the Gauss-Markov assumptions, the estimator
$$\hat{\theta} = c_1 \hat{\beta}_1 + c_2 \hat{\beta}_2 + \cdots + c_p \hat{\beta}_p + c_{p+1},$$
where $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_p$ are the least-squares estimates, is

1) an Unbiased Estimator of $\theta$;
2) a Linear Estimator of $\theta$, that is, $\hat{\theta} = \sum_{i=1}^n b_i y_i$ for some constants $b_i$ that are known given $X$.

Theorem: Under the Gauss-Markov assumptions, the estimator $\hat{\theta}$ has the smallest ("Best") variance among all Linear Unbiased Estimators of $\theta$; i.e., $\hat{\theta}$ is BLUE.
Gauss-Markov Theorem: Proof

Proof: Without loss of generality, assume $c_{p+1} = 0$ and define $c = (c_1, c_2, \ldots, c_p)^T$. The least-squares estimate of $\theta = c^T \beta$ is
$$\hat{\theta} = c^T \hat{\beta} = c^T (X^T X)^{-1} X^T y \equiv d^T y,$$
a linear estimate in $y$ with coefficients $d = (d_1, d_2, \ldots, d_n)^T$. Consider an alternative linear estimate of $\theta$: $\theta^* = b^T y$ with fixed coefficients $b = (b_1, \ldots, b_n)^T$. Define $f = b - d$ and note that
$$\theta^* = b^T y = (d + f)^T y = \hat{\theta} + f^T y.$$
If $\theta^*$ is unbiased, then because $\hat{\theta}$ is unbiased,
$$0 = E(f^T y) = f^T E(y) = f^T (X\beta) \quad \text{for all } \beta \in \mathbb{R}^p$$
$$\Longrightarrow f \text{ is orthogonal to the column space of } X$$
$$\Longrightarrow f \text{ is orthogonal to } d = X (X^T X)^{-1} c.$$
If $\theta^*$ is unbiased, then the orthogonality of $f$ to $d$ implies
$$\begin{aligned}
\mathrm{Var}(\theta^*) &= \mathrm{Var}(b^T y) = \mathrm{Var}(d^T y + f^T y) \\
&= \mathrm{Var}(d^T y) + \mathrm{Var}(f^T y) + 2\,\mathrm{Cov}(d^T y, f^T y) \\
&= \mathrm{Var}(\hat{\theta}) + \mathrm{Var}(f^T y) + 2\, d^T \mathrm{Cov}(y)\, f \\
&= \mathrm{Var}(\hat{\theta}) + \mathrm{Var}(f^T y) + 2\, d^T (\sigma^2 I_n) f \\
&= \mathrm{Var}(\hat{\theta}) + \mathrm{Var}(f^T y) + 2\sigma^2\, d^T f \\
&= \mathrm{Var}(\hat{\theta}) + \mathrm{Var}(f^T y) + 2\sigma^2 \cdot 0 \\
&\ge \mathrm{Var}(\hat{\theta}).
\end{aligned}$$
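The proof's decomposition can be checked by simulation (a sketch assuming NumPy; the design, $c$, and the perturbation $f$ are illustrative): a competing linear unbiased estimator $b^T y = (d + f)^T y$, with $f$ orthogonal to the column space of $X$, shows a strictly larger sampling variance:

```python
import numpy as np

rng = np.random.default_rng(4)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([1.0, 2.0])
c = np.array([0.0, 1.0])                # theta = c^T beta = beta_2

d = X @ np.linalg.solve(X.T @ X, c)     # least-squares coefficients d
f = rng.normal(size=n)
f -= X @ np.linalg.solve(X.T @ X, X.T @ f)  # project f out of col(X)
b = d + f                               # alternative linear unbiased estimator

est_ls, est_alt = [], []
for _ in range(5000):
    y = X @ beta + rng.normal(size=n)   # Gauss-Markov errors, sigma^2 = 1
    est_ls.append(d @ y)
    est_alt.append(b @ y)
print(np.var(est_ls), np.var(est_alt))  # Var(d^T y) < Var(b^T y)
```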
Generalized Least Squares (GLS) Estimates

Consider generalizing the Gauss-Markov assumptions for the linear regression model to
$$Y = X\beta + E,$$
where the random $n$-vector $E$ satisfies $E[E] = 0_n$ and $E[E E^T] = \sigma^2 \Sigma$:

- $\sigma^2$ is an unknown scale parameter;
- $\Sigma$ is a known $(n \times n)$ positive definite matrix specifying the relative variances and correlations of the component observations.

Transform the data $(Y, X)$ to $Y^* = \Sigma^{-1/2} Y$ and $X^* = \Sigma^{-1/2} X$, and the model becomes
$$Y^* = X^* \beta + E^*, \quad \text{where } E[E^*] = 0_n \text{ and } E[E^* (E^*)^T] = \sigma^2 I_n.$$
By the Gauss-Markov Theorem, the BLUE ("GLS" estimate) of $\beta$ is
$$\hat{\beta} = [(X^*)^T (X^*)]^{-1} (X^*)^T Y^* = [X^T \Sigma^{-1} X]^{-1} X^T \Sigma^{-1} Y.$$
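A numerical sketch of the GLS formula (assuming NumPy; the AR(1)-type $\Sigma$ and the coefficients are illustrative, not from the lecture):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta = np.array([0.5, -1.0])

# Illustrative known correlation matrix: Sigma_{ij} = 0.7^{|i-j|}.
Sigma = 0.7 ** np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
L = np.linalg.cholesky(Sigma)           # Sigma = L L^T
Y = X @ beta + L @ rng.normal(size=n)   # errors with covariance Sigma

# GLS: beta_hat = (X^T Sigma^{-1} X)^{-1} X^T Sigma^{-1} Y.
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ Y)
print(beta_gls)                         # close to beta
```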
Maximum Likelihood Estimation

$X \sim P_\theta$, $\theta \in \Theta$, with density or pmf function $p(x \mid \theta)$. Given an observation $X = x$, define the likelihood function
$$L_x(\theta) = p(x \mid \theta), \quad \text{a mapping } \Theta \to \mathbb{R}.$$
$\hat{\theta}_{ML} = \hat{\theta}_{ML}(x)$, the Maximum-Likelihood Estimate of $\theta$, is the value making $L_x(\cdot)$ a maximum: $\hat{\theta}$ is the MLE if
$$L_x(\hat{\theta}) = \max_{\theta \in \Theta} L_x(\theta).$$

- The MLE $\hat{\theta}_{ML}(x)$ identifies the distribution making $x$ most likely.
- The MLE coincides with the mode of the posterior distribution if the prior distribution on $\Theta$ is uniform: $\pi(\theta \mid x) \propto p(x \mid \theta)\, \pi(\theta) \propto p(x \mid \theta)$.
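In practice the maximization is often done numerically. A minimal sketch (assuming NumPy and SciPy; the normal model and simulated data are illustrative) minimizes the negative log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(6)
x = rng.normal(loc=1.0, scale=2.0, size=200)  # illustrative sample

def neg_log_lik(theta):
    # -l_x(mu, sigma) up to an additive constant; sigma on the log scale
    # keeps the optimization unconstrained.
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    return 0.5 * np.sum(((x - mu) / sigma) ** 2) + len(x) * log_sigma

res = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)   # close to (x.mean(), x.std())
```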
Examples

Example 2.2.4: Normal distribution with known variance.

Example 2.2.5: Size of a population. $X_1, \ldots, X_n$ are iid $U\{1, 2, \ldots, \theta\}$, with $\theta \in \{1, 2, \ldots\}$. For $x = (x_1, \ldots, x_n)$,
$$L_x(\theta) = \prod_{i=1}^n \theta^{-1}\, 1(1 \le x_i \le \theta) = \theta^{-n}\, 1(\max(x_1, \ldots, x_n) \le \theta)
= \begin{cases} 0, & \text{if } \theta = 0, 1, \ldots, \max(x_i) - 1 \\ \theta^{-n}, & \text{if } \theta \ge \max(x_i), \end{cases}$$
and since $\theta^{-n}$ is decreasing in $\theta$, the MLE is $\hat{\theta}_{ML} = \max(x_1, \ldots, x_n)$.
MLE as a Minimum Contrast Method

Define $\ell_x(\theta) = \log L_x(\theta) = \log p(x \mid \theta)$. Because $-\log(\cdot)$ is monotone decreasing, $\hat{\theta}_{ML}(x)$ minimizes $-\ell_x(\theta)$. For an iid sample $X = (X_1, \ldots, X_n)$ with densities $p(x_i \mid \theta)$,
$$\ell_X(\theta) = \log p(x_1, \ldots, x_n \mid \theta) = \log \prod_{i=1}^n p(x_i \mid \theta) = \sum_{i=1}^n \log p(x_i \mid \theta).$$
As a minimum contrast function, $\rho(X, \theta) = -\ell_X(\theta)$ yields the MLE $\hat{\theta}_{ML}(x)$. The discrepancy function corresponding to this contrast function is
$$D(\theta_0, \theta) = E[\rho(X, \theta) \mid \theta_0] = E[-\log p(X \mid \theta) \mid \theta_0].$$
Suppose that $\theta = \theta_0$ uniquely minimizes $D(\theta_0, \cdot)$. Then
$$D(\theta_0, \theta) - D(\theta_0, \theta_0) = E[-\log p(X \mid \theta) \mid \theta_0] - E[-\log p(X \mid \theta_0) \mid \theta_0] = E\!\left[\log \frac{p(X \mid \theta_0)}{p(X \mid \theta)} \,\Big|\, \theta_0\right] > 0, \quad \text{unless } \theta = \theta_0.$$
This difference is the Kullback-Leibler Information Divergence between the distributions $P_{\theta_0}$ and $P_\theta$:
$$K(P_{\theta_0}, P_\theta) = E\!\left[\log \frac{p(X \mid \theta_0)}{p(X \mid \theta)} \,\Big|\, \theta_0\right].$$

Lemma (Shannon, 1948). The divergence $K(P_{\theta_0}, P_\theta)$ is always well defined and $K(P_{\theta_0}, P_\theta) \ge 0$. Equality holds if and only if $\{x : p(x \mid \theta) = p(x \mid \theta_0)\}$ has probability 1 under both $P_{\theta_0}$ and $P_\theta$.

Proof: apply Jensen's inequality (B.9.3).
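A quick numerical check of the lemma for discrete distributions (a sketch assuming NumPy; the probability vectors are illustrative):

```python
import numpy as np

# K(P0, P) = E_{P0}[ log( p0(X) / p(X) ) ] for discrete distributions on a
# common support; nonnegative, and zero iff the distributions coincide.
def kl(p0, p):
    p0, p = np.asarray(p0, dtype=float), np.asarray(p, dtype=float)
    return float(np.sum(p0 * np.log(p0 / p)))

p0 = np.array([0.5, 0.3, 0.2])
p = np.array([0.4, 0.4, 0.2])
print(kl(p0, p))   # strictly positive
print(kl(p0, p0))  # exactly 0.0
```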
Likelihood Equations

Suppose: $X \sim P_\theta$ with $\theta \in \Theta$, an open parameter space; the log-likelihood $\ell_X(\theta)$ is differentiable in $\theta$; and $\hat{\theta}_{ML}(x)$ exists. Then $\hat{\theta}_{ML}(x)$ must satisfy the likelihood equation(s)
$$\nabla_\theta\, \ell_X(\theta) = 0.$$
Important cases: for independent $X_i$ with densities/pmfs $p_i(x_i \mid \theta)$,
$$\nabla_\theta\, \ell_X(\theta) = \sum_{i=1}^n \nabla_\theta \log p_i(x_i \mid \theta) = 0.$$
NOTE: $p_i(\cdot \mid \theta)$ may vary with $i$.
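For example (a standard computation, consistent with the Poisson counts of Example 2.2.7 below): if $X_1, \ldots, X_n$ are iid Poisson($\lambda$), then
$$\ell_X(\lambda) = \sum_{i=1}^n \left[x_i \log \lambda - \lambda - \log(x_i!)\right],$$
and the likelihood equation
$$\frac{\partial \ell_X(\lambda)}{\partial \lambda} = \frac{\sum_{i=1}^n x_i}{\lambda} - n = 0$$
gives $\hat{\lambda}_{ML} = \bar{x}$.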
Examples

- Hardy-Weinberg proportions (Example 2.2.6)
- Queues: Poisson process models (exponential arrival times and Poisson counts) (Example 2.2.7)
- Multinomial trials (Example 2.2.8)
- Normal regression models (Example 2.2.9)
MIT OpenCourseWare. Mathematical Statistics, Spring 2016. For information about citing these materials or our Terms of Use, visit: