The loss function and estimating equations

Chapter 6

The loss function and estimating equations

6.1 Loss functions

Up until now our main focus has been on parameter estimation via maximum likelihood. However, the negative log-likelihood is a criterion belonging to a group of criteria known as loss functions. Loss functions are usually distances, such as the $\ell_1$ and $\ell_2$ distances. Typically we estimate a parameter by minimising a loss function, using as the estimator the parameter which minimises the loss. Usually (but not always) the way to minimise the loss function is to differentiate it and equate the derivative to zero. This motivates another means of estimating parameters: finding the solution of an equation (such equations are usually called estimating equations). For many examples there is an estimating equation which corresponds to a loss function, and a loss function which corresponds to an estimating equation, but this will not always be the case.

Example 6.1.1 (Examples where the loss function does not have a derivative) Consider the Laplacian (also known as the double exponential distribution), which is defined as

$$f(y;\theta,\mu) = \frac{1}{2\theta}\exp\big(-|y-\mu|/\theta\big) = \begin{cases} \frac{1}{2\theta}\exp\big((y-\mu)/\theta\big) & y < \mu \\ \frac{1}{2\theta}\exp\big(-(y-\mu)/\theta\big) & y \geq \mu. \end{cases}$$

We observe $\{Y_t\}$ and our objective is to estimate the location parameter $\mu$; for now the scale parameter $\theta$ is not of interest. The log-likelihood is

$$\mathcal{L}_T(\mu) = -T\log 2\theta - \frac{1}{\theta}\sum_{i=1}^{T}|Y_i - \mu|.$$

We maximise the above to estimate $\mu$. We see that this is equivalent to minimising the loss function

$$L_T(\mu) = \sum_{i=1}^{T}|Y_i-\mu| = \sum_{Y_i > \mu}(Y_i - \mu) + \sum_{Y_i \leq \mu}(\mu - Y_i).$$

If we make a plot of $L_T$ over $\mu$, and consider how $L_T$ behaves at the ordered observations $\{Y_{(t)}\}$, we see that it is piecewise linear (that is, it is continuous but non-differentiable at the points $Y_{(t)}$). On closer inspection (if $T$ is odd) we see that $L_T$ has its minimum at $\mu = Y_{((T+1)/2)}$, which is the sample median (to prove this to yourself, take an example of four observations and construct $L_T$; you can then extend this argument via a pseudo-induction method). Heuristically, the derivative of the loss function at $\mu$ is (number of $Y_i$ less than $\mu$) minus (number of $Y_i$ greater than or equal to $\mu$); this reasoning gives another argument as to why the sample median minimises the loss. Note that it is a little ambiguous what the minimum is when $T$ is even.

In summary, the normal distribution gives rise to the $\ell_2$-loss function and the sample mean, whereas the Laplacian gives rise to the $\ell_1$-loss function and the sample median.

Consider the generalisation of the Laplacian, usually called the asymmetric Laplacian, which is defined as

$$f(y;\theta,\mu) = \begin{cases} \frac{p}{\theta}\exp\big(p(y-\mu)/\theta\big) & y < \mu \\ \frac{1-p}{\theta}\exp\big(-(1-p)(y-\mu)/\theta\big) & y \geq \mu, \end{cases}$$

where $0 < p < 1$. The corresponding loss function (the negative log-likelihood, up to constants) is

$$L_T(\mu) = \sum_{Y_i>\mu}(1-p)(Y_i-\mu) + \sum_{Y_i\leq\mu} p(\mu-Y_i).$$

Using similar arguments to those in part (i), it can be shown that the minimum of $L_T$ is approximately the $p$th quantile.
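As a quick numerical check of the claims above, the following sketch (our own illustration; it assumes Python with numpy, and all variable names and simulation settings are ours) evaluates the $\ell_1$ loss and the asymmetric loss on a grid and confirms that their minimisers sit close to the sample median and the $p$th sample quantile, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.laplace(loc=2.0, scale=1.0, size=101)   # T odd, so the minimiser is unique

def l1_loss(theta):
    # L_T(mu) = sum_i |Y_i - mu|
    return np.sum(np.abs(y - theta))

def check_loss(theta, p):
    # asymmetric loss: weight (1-p) on observations above mu, p on those below
    d = y - theta
    return np.sum((1 - p) * d[d > 0]) + np.sum(p * (-d[d <= 0]))

grid = np.linspace(y.min(), y.max(), 20001)
print(grid[np.argmin([l1_loss(g) for g in grid])], np.median(y))             # ~ equal
p = 0.9
print(grid[np.argmin([check_loss(g, p) for g in grid])], np.quantile(y, p))  # ~ equal
```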

6.2 Estimating Functions

See also Section 7.2 in Davison (2002) and Lars Peter Hansen (1982, Econometrica).

Estimating functions are a unification and generalisation of maximum likelihood methods and the method of moments. It should be noted that they are close cousins of the generalised method of moments and the generalised estimating equation. We first consider a few examples and will later describe a feature common to all these examples.

Example 6.2.1

(i) Let us suppose that $\{Y_t\}$ are iid random variables with $Y_t \sim \mathcal{N}(\mu, \sigma^2)$. The log-likelihood is proportional to

$$\mathcal{L}_T(\mu,\sigma^2) = -\frac{T}{2}\log\sigma^2 - \frac{1}{2\sigma^2}\sum_{t=1}^{T}(Y_t-\mu)^2.$$

We know that to estimate $\mu$ and $\sigma^2$ we use the $\mu$ and $\sigma^2$ which are the solution of

$$\frac{1}{\sigma^2}\sum_{t=1}^{T}(Y_t-\mu) = 0, \qquad -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{t=1}^{T}(Y_t-\mu)^2 = 0. \tag{6.1}$$

(ii) In general, suppose $\{Y_t\}$ are iid random variables with $Y_t \sim f(\cdot;\theta)$. The log-likelihood is $\mathcal{L}_T(\theta) = \sum_t \log f(Y_t;\theta)$. If the regularity conditions are satisfied, then to estimate $\theta$ we use the solution of

$$\frac{\partial \mathcal{L}_T(\theta)}{\partial\theta} = 0. \tag{6.2}$$

(iii) Let us suppose that $\{Y_t\}$ are iid random variables with a Weibull distribution

$$f(y;\alpha,\phi) = \Big(\frac{\alpha}{\phi}\Big)\Big(\frac{y}{\phi}\Big)^{\alpha-1}\exp\big(-(y/\phi)^{\alpha}\big),$$

where $\alpha, \phi > 0$. We know that (see Section 9, Example 9.2) $\mathbb{E}(Y) = \phi\Gamma(1+\alpha^{-1})$ and $\mathbb{E}(Y^2) = \phi^2\Gamma(1+2\alpha^{-1})$. Therefore $\mathbb{E}(Y) - \phi\Gamma(1+\alpha^{-1}) = 0$ and $\mathbb{E}(Y^2) - \phi^2\Gamma(1+2\alpha^{-1}) = 0$. Hence by solving

$$\frac{1}{T}\sum_{t=1}^{T} Y_t - \phi\Gamma(1+\alpha^{-1}) = 0, \qquad \frac{1}{T}\sum_{t=1}^{T} Y_t^2 - \phi^2\Gamma(1+2\alpha^{-1}) = 0, \tag{6.3}$$

we obtain estimators of $\alpha$ and $\phi$. This is essentially a method of moments estimator of the parameters of a Weibull distribution.

(iv) We can generalise the above. It can be shown that $\mathbb{E}(Y^r) = \phi^r\Gamma(1+r\alpha^{-1})$. Therefore, for any distinct $s$ and $r$, we can estimate $\alpha$ and $\phi$ using the solution of

$$\frac{1}{T}\sum_{t=1}^{T} Y_t^r - \phi^r\Gamma(1+r\alpha^{-1}) = 0, \qquad \frac{1}{T}\sum_{t=1}^{T} Y_t^s - \phi^s\Gamma(1+s\alpha^{-1}) = 0. \tag{6.4}$$

(v) Consider the simple linear regression $Y_t = \alpha x_t + \varepsilon_t$, with $\mathbb{E}(\varepsilon_t) = 0$ and $\operatorname{var}(\varepsilon_t) < \infty$. The least squares estimator of $\alpha$ is the solution of

$$\sum_{t=1}^{T}(Y_t - a x_t)x_t = 0. \tag{6.5}$$
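To make Example 6.2.1(iii) concrete, here is a minimal sketch (our own illustration, assuming numpy/scipy) that solves the two moment equations (6.3) numerically with a standard root-finder.

```python
import numpy as np
from scipy.optimize import fsolve
from scipy.special import gamma

rng = np.random.default_rng(1)
alpha0, phi0 = 1.5, 2.0
y = phi0 * rng.weibull(alpha0, size=5000)   # numpy's weibull has scale 1

def moment_equations(params):
    # G_T(alpha, phi) from (6.3): sample moments minus model moments
    alpha, phi = params
    eq1 = np.mean(y)    - phi * gamma(1 + 1 / alpha)
    eq2 = np.mean(y**2) - phi**2 * gamma(1 + 2 / alpha)
    return [eq1, eq2]

alpha_hat, phi_hat = fsolve(moment_equations, x0=[1.0, 1.0])
print(alpha_hat, phi_hat)   # should be close to (1.5, 2.0)
```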

We observe that all the estimators in Example 6.2.1 can be written as the solution of a homogeneous equation; see equations (6.1), (6.2), (6.3), (6.4) and (6.5). In other words, in each case we can define a random function $G_T(\theta)$ such that the estimator is the solution of $G_T(\hat\theta_T) = 0$. In the case that $\{Y_t\}$ are iid, $G_T(\theta) = \sum_t g(Y_t;\theta)$ for some function $g(\cdot;\theta)$. The function $G_T(\theta)$ is called an estimating function. All the functions $G_T$ defined above satisfy the unbiasedness property, which we define below.

Definition 6.2.1 An estimating function $G_T$ is called unbiased if at the true parameter $\theta_0$ it satisfies

$$\mathbb{E}\big[G_T(\theta_0)\big] = 0.$$

Hence the estimating function gives an alternative way of viewing a parameter estimator. Until now, parameter estimators have been defined in terms of the maximum of the likelihood. However, an alternative method for defining an estimator is as the solution of an equation. For example, suppose that $\{Y_t\}$ are random variables whose distribution depends in some way on the parameter $\theta_0$. We want to estimate $\theta_0$, and we know that there exists a function $G$ such that $G(\theta_0) = 0$. Therefore, using the data $\{Y_t\}$, we can define a random function $G_T$ where $\mathbb{E}[G_T(\theta)] = G(\theta)$. Hence we can use the parameter $\hat\theta_T$ which satisfies $G_T(\hat\theta_T) = 0$ as an estimator of $\theta_0$. We observe that such estimators include most maximum likelihood estimators and method of moments estimators.

Example 6.2.2 Based on the examples above we see that:

(i) The estimating function is

$$G_T(\mu,\sigma^2) = \begin{pmatrix} \frac{1}{\sigma^2}\sum_t (Y_t-\mu) \\ -\frac{T}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_t (Y_t-\mu)^2 \end{pmatrix}.$$

(ii) The estimating function is $G_T(\theta) = \dfrac{\partial \mathcal{L}_T(\theta)}{\partial\theta}$.

(iii) The estimating function is

$$G_T(\alpha,\phi) = \begin{pmatrix} \frac{1}{T}\sum_t Y_t - \phi\Gamma(1+\alpha^{-1}) \\ \frac{1}{T}\sum_t Y_t^2 - \phi^2\Gamma(1+2\alpha^{-1}) \end{pmatrix}.$$

(iv) The estimating function is

$$G_T(\alpha,\phi) = \begin{pmatrix} \frac{1}{T}\sum_t Y_t^r - \phi^r\Gamma(1+r\alpha^{-1}) \\ \frac{1}{T}\sum_t Y_t^s - \phi^s\Gamma(1+s\alpha^{-1}) \end{pmatrix}.$$

(v) The estimating function is $G_T(a) = \sum_t (Y_t - a x_t)x_t$.

The advantage of this approach is that sometimes the solution of an estimating equation will have a smaller finite sample variance than the MLE, even though asymptotically (under certain conditions) the MLE attains the Cramér-Rao bound (the smallest asymptotic variance). Moreover, MLE estimators are based on the assumption that the distribution is known (else the estimator is misspecified; see Section 9), whereas an estimating equation can sometimes be free of such assumptions. Recall from Example 6.2.1(v) that

$$\mathbb{E}\big[(Y_t - \alpha x_t)x_t\big] = 0 \tag{6.6}$$

is true regardless of the distribution of $\{\varepsilon_t\}$, and is also true if $\{Y_t\}$ are dependent random variables. We recall that Example 6.2.1(v) is the least squares estimator, which is a consistent estimator of the parameters in a variety of settings (see Rao (1973), Linear Statistical Inference and its Applications, and your SA62 notes).

Example 6.2.3 In many statistical situations it is relatively straightforward to find a suitable estimating function, whereas finding the likelihood can be much harder. Consider the time series $\{X_t\}$ which satisfies

$$X_t = a_1 X_{t-1} + a_2 X_{t-2} + \sigma(a_1,a_2)\varepsilon_t,$$

where $\{\varepsilon_t\}$ are iid random variables. We do not know the distribution of $\varepsilon_t$, but because $\varepsilon_t$ is independent of $X_{t-1}$ and $X_{t-2}$, by multiplying the above equation by $X_{t-1}$ (and then by $X_{t-2}$) and taking expectations we have

$$\mathbb{E}(X_t X_{t-1}) = a_1\mathbb{E}(X_{t-1}^2) + a_2\mathbb{E}(X_{t-1}X_{t-2}),$$
$$\mathbb{E}(X_t X_{t-2}) = a_1\mathbb{E}(X_{t-1}X_{t-2}) + a_2\mathbb{E}(X_{t-2}^2).$$

Since the above time series is stationary (we have not formally defined this, but basically it means the properties of $\{X_t\}$ do not vary over time), it can be shown that $\frac{1}{T}\sum_t X_t X_{t-r}$ is an estimator of $\mathbb{E}(X_t X_{t-r})$ and that $\mathbb{E}\big(\frac{1}{T}\sum_t X_t X_{t-r}\big) \approx \mathbb{E}(X_t X_{t-r})$. Hence, replacing the expectations above with their estimators, we obtain the estimating equations

$$G_T(a_1,a_2) = \begin{pmatrix} \frac{1}{T}\sum_t X_tX_{t-1} - a_1 \frac{1}{T}\sum_t X_{t-1}^2 - a_2 \frac{1}{T}\sum_t X_{t-1}X_{t-2} \\ \frac{1}{T}\sum_t X_tX_{t-2} - a_1 \frac{1}{T}\sum_t X_{t-1}X_{t-2} - a_2 \frac{1}{T}\sum_t X_{t-2}^2 \end{pmatrix}.$$
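Since $G_T(a_1,a_2)$ is linear in $(a_1,a_2)$, solving $G_T = 0$ amounts to a $2\times 2$ linear system in the sample autocovariances (these are the Yule-Walker equations). A minimal sketch, with simulated data and variable names of our own choosing:

```python
import numpy as np

rng = np.random.default_rng(2)
a1, a2, T = 0.5, -0.3, 10000          # a stationary AR(2) configuration
x = np.zeros(T)
for t in range(2, T):                 # simulate the AR(2) recursion
    x[t] = a1 * x[t-1] + a2 * x[t-2] + rng.standard_normal()

def scov(r, s):
    # (1/T) * sum_t X_{t-r} X_{t-s}, summed over the valid range of t
    m = max(r, s)
    return np.sum(x[m - r: T - r] * x[m - s: T - s]) / T

# G_T(a1, a2) = 0  is equivalent to  A @ (a1, a2) = b
A = np.array([[scov(1, 1), scov(1, 2)],
              [scov(1, 2), scov(2, 2)]])
b = np.array([scov(0, 1), scov(0, 2)])
print(np.linalg.solve(A, b))          # close to (0.5, -0.3)
```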

We now show that, under certain conditions, $\hat\theta_T$ is a consistent estimator of $\theta_0$.

Theorem 6.2.1 Suppose that $G_T(\theta)$ is an unbiased estimating function, where $G_T(\hat\theta_T) = 0$ and $\mathbb{E}[G_T(\theta_0)] = 0$.

(i) If $\theta$ is a scalar, for every $T$ the function $G_T(\theta)$ is continuous and monotonically decreasing in $\theta$, and for each fixed $\theta$ we have $G_T(\theta) \overset{P}{\to} \mathbb{E}[G_T(\theta)]$ with $\mathbb{E}[G_T(\theta)] \neq 0$ for $\theta \neq \theta_0$, then $\hat\theta_T \overset{P}{\to} \theta_0$.

(ii) If $\sup_\theta |G_T(\theta) - \mathbb{E}[G_T(\theta)]| \overset{P}{\to} 0$ and $\mathbb{E}[G_T(\theta)]$ is uniquely zero at $\theta_0$, then $\hat\theta_T \overset{P}{\to} \theta_0$.

PROOF. The proof of case (i) is relatively straightforward (see also page 38 in Davison (2002)), and is best understood by sketching $G_T(\theta)$ and marking the points $\theta_0 - \varepsilon < \theta_0 < \theta_0 + \varepsilon$. We first note that since $\mathbb{E}[G_T(\theta_0)] = 0$ and $G_T$ is monotonically decreasing for all $T$, for any fixed $\varepsilon > 0$ we have

$$G_T(\theta_0-\varepsilon) \overset{P}{\to} \mathbb{E}\big[G_T(\theta_0-\varepsilon)\big] > 0. \tag{6.7}$$

Now, since $G_T(\theta)$ is monotonically decreasing, $\hat\theta_T \leq \theta_0-\varepsilon$ implies $G_T(\hat\theta_T) \geq G_T(\theta_0-\varepsilon)$ (and vice versa), hence

$$P\big(\hat\theta_T - (\theta_0-\varepsilon) \leq 0\big) = P\big(G_T(\hat\theta_T) \geq G_T(\theta_0-\varepsilon)\big) = P\big(0 \geq G_T(\theta_0-\varepsilon)\big),$$

using $G_T(\hat\theta_T) = 0$. But from (6.7), $G_T(\theta_0-\varepsilon)$ converges in probability to a strictly positive limit, thus $P\big(0 \geq G_T(\theta_0-\varepsilon)\big) \to 0$, and so $P\big(\hat\theta_T \leq \theta_0-\varepsilon\big) \to 0$ as $T \to \infty$. A similar argument can be used to show that $P\big(\hat\theta_T \geq \theta_0+\varepsilon\big) \to 0$ as $T \to \infty$. As the above is true for every $\varepsilon > 0$, together they imply $\hat\theta_T \overset{P}{\to} \theta_0$. The proof of (ii) is more involved, but essentially follows the lines of the corresponding uniform-convergence consistency proof for the MLE (an exercise for those wanting to do some more theory). $\square$

We now show asymptotic normality, which gives us the variance of the limiting distribution of $\hat\theta_T$.

Theorem 6.2.2 Let us suppose that $\{Y_t\}$ are iid random variables and $G_T(\theta) = \frac{1}{T}\sum_t g(Y_t;\theta)$ (normalising by $1/T$ does not change the solution of $G_T(\theta)=0$). Suppose that $\hat\theta_T \overset{P}{\to} \theta_0$ and the first and second derivatives of $g(\cdot;\theta)$ have finite expectation (we will assume that $\theta$ is a scalar to simplify notation). Then, as $T \to \infty$, we have

$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{D}{\to} \mathcal{N}\left(0,\; \frac{\operatorname{var}\big(g(Y_t;\theta_0)\big)}{\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big|_{\theta_0}\Big)\Big]^2}\right).$$

PROOF. We use the standard Taylor expansion to prove the result (which you should be expert in by now). Using a Taylor expansion we have

$$G_T(\hat\theta_T) = G_T(\theta_0) + (\hat\theta_T - \theta_0)\,\frac{\partial G_T(\theta)}{\partial\theta}\Big|_{\bar\theta_T}, \tag{6.8}$$

and since $G_T(\hat\theta_T) = 0$,

$$(\hat\theta_T - \theta_0) = -\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big|_{\theta_0}\Big)\Big]^{-1} G_T(\theta_0) + o_p(T^{-1/2}),$$

where the above is due to $\frac{\partial G_T(\theta)}{\partial\theta}\big|_{\bar\theta_T} \overset{P}{\to} \mathbb{E}\big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\big|_{\theta_0}\big)$ as $T \to \infty$. Now, since $G_T(\theta_0) = \frac{1}{T}\sum_t g(Y_t;\theta_0)$ is the average of iid random variables with $\mathbb{E}(g(Y_t;\theta_0)) = 0$, the central limit theorem gives

$$\sqrt{T}\,G_T(\theta_0) \overset{D}{\to} \mathcal{N}\big(0, \operatorname{var}(g(Y_t;\theta_0))\big). \tag{6.9}$$

Therefore (6.8) and (6.9) together give

$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{D}{\to} \mathcal{N}\left(0,\; \frac{\operatorname{var}\big(g(Y_t;\theta_0)\big)}{\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big|_{\theta_0}\Big)\Big]^2}\right),$$

as required. $\square$

Remark 6.2.1 In general, if $\{Y_t\}$ are not iid random variables then (under certain conditions) we have

$$\sqrt{T}\,(\hat\theta_T - \theta_0) \overset{D}{\to} \mathcal{N}\left(0,\; \frac{\operatorname{var}\big(\sqrt{T}\,G_T(\theta_0)\big)}{\Big[\mathbb{E}\Big(\frac{\partial G_T(\theta)}{\partial\theta}\Big|_{\theta_0}\Big)\Big]^2}\right).$$

Example 6.2.4 (The Huber estimator) We describe the Huber estimator, a well-known estimator of the mean which is robust to outliers; the estimator can be written in terms of an estimating function. Let us suppose that $\{Y_t\}$ are iid random variables with mean $\theta$, and a density function which is symmetric about the mean $\theta$. So that outliers do not affect the estimator, a robust method of estimation is to truncate the outliers and define the function

$$g^{(c)}(Y_t;\theta) = \begin{cases} -c & Y_t < -c+\theta \\ Y_t - \theta & -c+\theta \leq Y_t \leq c+\theta \\ c & Y_t > c+\theta. \end{cases}$$

The estimating equation is

$$G_{T,c}(\theta) = \sum_{t=1}^{T} g^{(c)}(Y_t;\theta),$$

and we use as an estimator of $\theta$ the $\hat\theta_T$ which solves $G_{T,c}(\hat\theta_T) = 0$.

(i) In the case that $c = \infty$, we observe that $G_T(\theta) = \sum_t (Y_t - \theta)$, and the estimator is $\hat\theta_T = \bar{Y}$. Hence without truncation, the estimator of the mean is the sample mean.

(ii) In the case that $c$ is small, we have truncated many of the observations.

It is clear that

$$\frac{\operatorname{var}\big(g(Y_t;\theta_0)\big)}{\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big|_{\theta_0}\Big)\Big]^2} \;\geq\; I(\theta_0)^{-1}$$

(where $I(\theta)$ is the Fisher information). Hence for extremely large sample sizes, the MLE is the best estimator.
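A small sketch of the Huber estimating equation (our own illustration, assuming numpy/scipy): $g^{(c)}$ clips $Y_t - \theta$ at $\pm c$, and $G_{T,c}$ is monotonically decreasing in $\theta$, so a bracketing root-finder such as `brentq` applies directly. The value $c = 1.345$ used below is a conventional tuning constant from the robustness literature, not something fixed by these notes.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(1.0, 1.0, 200), [50.0, 80.0]])  # two gross outliers

def G(theta, c):
    # Huber estimating function: sum of (Y_t - theta) clipped to [-c, c]
    return np.sum(np.clip(y - theta, -c, c))

# G is decreasing, positive at y.min() and negative at y.max(): bracket the root
theta_hat = brentq(lambda th: G(th, c=1.345), y.min(), y.max())
print(theta_hat, np.mean(y))   # the Huber estimate resists the outliers; the mean does not
```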

6.2.1 Optimal estimating functions

As illustrated in Example 6.2.2(iii, iv), there are several different estimators of the same parameters. A natural question is which estimator one should use. We can answer this question by using the result in Theorem 6.2.2. Roughly speaking, Theorem 6.2.2 implies (though this is not true in the strictest sense) that

$$\operatorname{var}(\hat\theta_T) \approx \frac{1}{T}\,\frac{\operatorname{var}\big(g(Y_t;\theta_0)\big)}{\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big|_{\theta_0}\Big)\Big]^2}. \tag{6.10}$$

In the case that $\theta$ is a multivariate vector, the above generalises to

$$\operatorname{var}(\hat\theta_T) \approx \frac{1}{T}\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big)\Big]^{-1}\operatorname{var}\big(g(Y_t;\theta_0)\big)\Big[\mathbb{E}\Big(\frac{\partial g(Y_t;\theta)}{\partial\theta}\Big)^{\top}\Big]^{-1}. \tag{6.11}$$

Hence, we should choose the estimating function which leads to the smallest variance.

Example 6.2.5 Consider Example 6.2.2(iv), where we are estimating the Weibull parameters $\phi$ and $\alpha$. Different $s$ and $r$ give different equations, but all are estimating the same parameters. To decide which $s$ and $r$ to use, we calculate the variances for different $s$ and $r$ and choose the pair with the smallest variance. Substituting the Jacobian

$$\mathbb{E}\Big(\frac{\partial g(Y_t;\phi,\alpha)}{\partial(\phi,\alpha)}\Big) = -\begin{pmatrix} r\phi^{r-1}\Gamma(1+r\alpha^{-1}) & -r\alpha^{-2}\phi^{r}\Gamma'(1+r\alpha^{-1}) \\ s\phi^{s-1}\Gamma(1+s\alpha^{-1}) & -s\alpha^{-2}\phi^{s}\Gamma'(1+s\alpha^{-1}) \end{pmatrix}$$

and (using that $\mathbb{E}(Y^r) = \phi^r\Gamma(1+r\alpha^{-1})$) the covariance matrix

$$\operatorname{var}\big(g(Y_t;\phi,\alpha)\big) = \begin{pmatrix} \operatorname{var}(Y_t^r) & \operatorname{cov}(Y_t^r, Y_t^s) \\ \operatorname{cov}(Y_t^r, Y_t^s) & \operatorname{var}(Y_t^s) \end{pmatrix}$$

into (6.11) for different $r$ and $s$, we can calculate the variance and choose the estimating function with the smallest variance.
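The comparison in Example 6.2.5 can be carried out numerically. The sketch below (our own illustration, assuming numpy/scipy) approximates the sandwich covariance (6.11): the Jacobian is obtained by finite differences of the model moments (which avoids writing out $\Gamma'$), and $\operatorname{var}(g)$ is approximated from a large simulated sample.

```python
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(4)
alpha0, phi0 = 1.5, 2.0
y = phi0 * rng.weibull(alpha0, size=200000)   # large sample to approximate moments

def sandwich_cov(r, s, eps=1e-6):
    # model moments m(phi, alpha) = (phi^r Gamma(1+r/alpha), phi^s Gamma(1+s/alpha))
    def m(phi, alpha):
        return np.array([phi**r * gamma(1 + r / alpha),
                         phi**s * gamma(1 + s / alpha)])
    # D = E dg/d(phi, alpha) = -dm/d(phi, alpha), via central finite differences
    D = -np.column_stack([
        (m(phi0 + eps, alpha0) - m(phi0 - eps, alpha0)) / (2 * eps),
        (m(phi0, alpha0 + eps) - m(phi0, alpha0 - eps)) / (2 * eps)])
    V = np.cov(np.vstack([y**r, y**s]))        # var of g = var of (Y^r, Y^s)
    Dinv = np.linalg.inv(D)
    return Dinv @ V @ Dinv.T                   # asymptotic cov of sqrt(T)(theta_hat - theta0)

for r, s in [(1, 2), (1, 3), (0.5, 1)]:
    print((r, s), np.diag(sandwich_cov(r, s)))  # variances of (phi_hat, alpha_hat); smaller is better
```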

We now consider an example which is a close precursor to GEE (generalised estimating equations). Usually GEE is a method for estimating the parameters in generalised linear models where there is dependence in the data.

Example 6.2.6 Suppose that $\{Y_t\}$ are independent random variables with means $\{\mu_t(\theta_0)\}$ and variances $\{V_t(\theta_0)\}$, where the parametric forms of $\mu_t(\cdot)$ and $V_t(\cdot)$ are known, but $\theta_0$ is unknown. A possible estimating function is

$$G_T(\theta) = \sum_{t=1}^{T}\big(Y_t - \mu_t(\theta)\big),$$

where we note that $\mathbb{E}[G_T(\theta_0)] = 0$. However, this function does not take into account the variances; instead let us consider the general weighted function

$$G_T^{(W)}(\theta) = \sum_{t=1}^{T} w_t(\theta)\big(Y_t - \mu_t(\theta)\big).$$

Again we observe that $\mathbb{E}[G_T^{(W)}(\theta_0)] = 0$. But we need to know which weights $w_t(\theta)$ to use; hence we choose the weights which minimise the variance in (6.10). Since $\{Y_t\}$ are independent, we observe that

$$\operatorname{var}\big(G_T^{(W)}(\theta_0)\big) = \sum_{t=1}^{T} w_t(\theta_0)^2 V_t(\theta_0)$$

and

$$\mathbb{E}\Big(\frac{\partial G_T^{(W)}(\theta)}{\partial\theta}\Big|_{\theta_0}\Big) = \mathbb{E}\Big(\sum_t w_t'(\theta_0)\big(Y_t - \mu_t(\theta_0)\big) - \sum_t w_t(\theta_0)\mu_t'(\theta_0)\Big) = -\sum_{t=1}^{T} w_t(\theta_0)\mu_t'(\theta_0).$$

Hence we have

$$\operatorname{var}(\hat\theta_T) \approx \frac{\sum_t w_t(\theta_0)^2 V_t(\theta_0)}{\big(\sum_t w_t(\theta_0)\mu_t'(\theta_0)\big)^2}.$$

Now we want to choose the weights, and thus the estimating function, with the smallest variance; therefore we look for weights which minimise the above. Since the above is a ratio, and making $w_t(\theta)$ small shrinks the numerator and the denominator together, we do the minimisation using Lagrange multipliers. That is, we set $\big(\sum_t w_t(\theta_0)\mu_t'(\theta_0)\big)^2 = 1$ (which means $\sum_t w_t(\theta_0)\mu_t'(\theta_0) = 1$) and minimise

$$\sum_{t=1}^{T} w_t(\theta_0)^2 V_t(\theta_0) + \lambda\Big(\sum_{t=1}^{T} w_t(\theta_0)\mu_t'(\theta_0) - 1\Big)$$

with respect to $\{w_t(\theta_0)\}$ and $\lambda$. Partially differentiating the above with respect to $\{w_t(\theta_0)\}$ and $\lambda$ and setting the derivatives to zero leads to

$$w_t(\theta) = \frac{\mu_t'(\theta)/V_t(\theta)}{\sum_s \mu_s'(\theta)^2/V_s(\theta)}.$$

Since multiplying an estimating function by a constant does not change its solution, the optimal estimating function is

$$G_T^{(\mu,V)}(\theta) = \sum_{t=1}^{T} \frac{\mu_t'(\theta)}{V_t(\theta)}\big(Y_t - \mu_t(\theta)\big).$$
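To see the optimal weights in action, here is a minimal sketch (our own toy model, assuming numpy/scipy) with Poisson-type structure $\mu_t(\theta) = \exp(\theta x_t)$ and $V_t(\theta) = \mu_t(\theta)$; the optimally weighted estimating function is then solved with a bracketing root-finder.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
theta0 = 0.4
x = rng.uniform(0.0, 2.0, 500)
mu = np.exp(theta0 * x)            # mu_t(theta) = exp(theta * x_t)
y = rng.poisson(mu)                # Poisson, so V_t(theta) = mu_t(theta)

def G_opt(theta):
    mu_t = np.exp(theta * x)       # mean
    muprime_t = x * mu_t           # d mu_t / d theta
    V_t = mu_t                     # variance equals the mean
    w_t = muprime_t / V_t          # optimal weights (here they simplify to x_t)
    return np.sum(w_t * (y - mu_t))

theta_hat = brentq(G_opt, -5.0, 5.0)
print(theta_hat)                   # close to 0.4
```

Note that in this toy model the optimal estimating function reduces to the Poisson score equation $\sum_t x_t(Y_t - e^{\theta x_t}) = 0$, which is exactly the kind of GLM estimating equation that GEE generalises.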

Observe that $G_T^{(\mu,V)}(\theta)$ resembles the derivative of the weighted least squares criterion

$$\sum_{t=1}^{T}\frac{1}{V_t(\theta)}\big(Y_t - \mu_t(\theta)\big)^2,$$

but with an important difference: the term involving the derivative of $V_t(\theta)$ has been ignored.

Remark 6.2.2 We conclude this section by mentioning that one generalisation of estimating equations is the generalised method of moments. We observe the random vectors $\{Y_t\}$ and it is known that there exists a function $g(\cdot;\theta)$ such that $\mathbb{E}[g(Y_t;\theta_0)] = 0$. To estimate $\theta_0$, rather than find the solution of $\sum_t g(Y_t;\theta) = 0$, a weighting matrix $M$ is defined and the parameter which minimises

$$\Big(\sum_{t=1}^{T} g(Y_t;\theta)\Big)^{\top} M \Big(\sum_{t=1}^{T} g(Y_t;\theta)\Big)$$

is used as an estimator of $\theta_0$.
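Finally, a minimal sketch of the GMM idea in Remark 6.2.2 (our own illustration, assuming numpy/scipy; $M$ is taken to be the identity for simplicity): with the two Weibull moment conditions from (6.3) we minimise the quadratic form rather than solving the equations directly. With two moment conditions and two parameters the problem is exactly identified, so the minimiser coincides with the root of the estimating equations.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gamma

rng = np.random.default_rng(6)
alpha0, phi0 = 1.5, 2.0
y = phi0 * rng.weibull(alpha0, size=5000)

def gbar(params):
    # sample average of g(Y_t; theta): the two moment conditions of (6.3)
    alpha, phi = params
    return np.array([np.mean(y)    - phi * gamma(1 + 1 / alpha),
                     np.mean(y**2) - phi**2 * gamma(1 + 2 / alpha)])

M = np.eye(2)   # weighting matrix; efficient GMM would use an estimated optimal M

def objective(params):
    g = gbar(params)
    return g @ M @ g

res = minimize(objective, x0=[1.0, 1.0], method="Nelder-Mead")
print(res.x)    # close to (1.5, 2.0)
```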
