Regression Estimation: Least Squares and Maximum Likelihood


1 Regression Estimation: Least Squares and Maximum Likelihood
Dr. Frank Wood, fwood@stat.columbia.edu
Linear Regression Models, Lecture 3

2 Least Squares Max(min)imization
Function to minimize w.r.t. $\beta_0$, $\beta_1$:
$$Q = \sum_{i=1}^n (Y_i - (\beta_0 + \beta_1 X_i))^2$$
Minimize $Q$ (equivalently, maximize $-Q$) by finding the partial derivatives w.r.t. $\beta_0$ and $\beta_1$ and setting both equal to zero (derivation on board).

3 Normal Equations
The result of this minimization step is the pair of normal equations:
$$\sum Y_i = n b_0 + b_1 \sum X_i$$
$$\sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2$$
$b_0$ and $b_1$ are called point estimators of $\beta_0$ and $\beta_1$ respectively. This is a system of two equations in two unknowns; the solution is given on the next slide.

4 Solution to Normal Equations
After a lot of algebra one arrives at
$$b_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}$$
where $\bar{X} = \frac{1}{n}\sum X_i$ and $\bar{Y} = \frac{1}{n}\sum Y_i$.
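To make the closed-form solution concrete, here is a minimal numerical sketch (in Python/NumPy rather than the MATLAB scripts referenced elsewhere in the slides; the simulated data follow the true line y = 2x + 9 used on the next slides, with a noise level chosen to roughly match their mse):

```python
import numpy as np

# Minimal sketch (not from the slides): compute the least squares point
# estimates b0 and b1 from the closed-form solution above.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
Y = 9 + 2 * X + rng.normal(0, 2, 100)   # true line y = 2x + 9, sigma = 2

b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()
print(b1, b0)   # should land near 2 and 9
```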

5 Least Squares Fit?
[Figure: data plotted as Predictor/Input vs. Response/Output with the least squares estimate, y = 2.09x + (intercept not recovered), mse 4.15, and the true line, y = 2x + 9, mse 4.22.]

6 Guess #
[Figure: the same data with a guessed flat line, y = 0x + (intercept not recovered), mse 37.1, versus the true line y = 2x + 9, mse 4.22.]

7 Guess #
[Figure: the same data with a second guess, y = 1.5x + 13, mse 7.84, versus the true line y = 2x + 9, mse 4.22.]

8 Looking Ahead: Matrix Least Squares
$$\begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} X_1 & 1 \\ X_2 & 1 \\ \vdots & \vdots \\ X_n & 1 \end{bmatrix} \begin{bmatrix} \beta_1 \\ \beta_0 \end{bmatrix}$$
The solution to this equation is the least squares solution for linear regression (and the maximum likelihood solution under the normal error distribution assumption).
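In code, the matrix formulation reduces to a single linear algebra call. A sketch in NumPy (data simulated as in the earlier snippet; the column order [X, 1] matches the slide, so the solution vector is [b1, b0]):

```python
import numpy as np

# Sketch: least squares via the matrix formulation.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
Y = 9 + 2 * X + rng.normal(0, 2, 100)

A = np.column_stack([X, np.ones_like(X)])   # design matrix with columns [X, 1]
(b1, b0), *_ = np.linalg.lstsq(A, Y, rcond=None)
print(b1, b0)   # same estimates as the closed-form solution
```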

9 Questions to Ask
Is the relationship really linear?
What is the distribution of the errors?
Is the fit good?
How much of the variability of the response is accounted for by including the predictor variable?
Is the chosen predictor variable the best one?

10 Is This Better?
[Figure: a 7th-order polynomial fit to the same data (mse value not recovered), axes Predictor/Input vs. Response/Output.]

11 Goals for First Half of Course
How to do linear regression
Self-familiarization with software tools
How to interpret standard linear regression results
How to derive tests
How to assess and address deficiencies in regression models

12 Properties of Solution
The $i$th residual is defined to be $e_i = Y_i - \hat{Y}_i$.
The sum of the residuals is zero:
$$\sum_i e_i = \sum_i (Y_i - b_0 - b_1 X_i) = \sum_i Y_i - n b_0 - b_1 \sum_i X_i = 0$$
by the first normal equation.

13 Properties of Solution
The sum of the observed values $Y_i$ equals the sum of the fitted values $\hat{Y}_i$:
$$\sum_i \hat{Y}_i = \sum_i (b_1 X_i + b_0) = \sum_i (b_1 X_i + \bar{Y} - b_1 \bar{X}) = b_1 \sum_i X_i + n\bar{Y} - b_1 n \bar{X} = b_1 n \bar{X} + \sum_i Y_i - b_1 n \bar{X} = \sum_i Y_i$$

14 Properties of Solution
The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the level of the predictor variable in the $i$th trial:
$$\sum_i X_i e_i = \sum_i X_i (Y_i - b_0 - b_1 X_i) = \sum_i X_i Y_i - b_0 \sum_i X_i - b_1 \sum_i X_i^2 = 0$$
by the second normal equation.

15 Properties of Solution
The sum of the weighted residuals is zero when the residual in the $i$th trial is weighted by the fitted value of the response variable for the $i$th trial:
$$\sum_i \hat{Y}_i e_i = \sum_i (b_0 + b_1 X_i) e_i = b_0 \sum_i e_i + b_1 \sum_i e_i X_i = 0$$
by the previous two properties.
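All of these properties are easy to confirm numerically; a quick sketch on simulated data, fit via the closed-form solution from slide 4:

```python
import numpy as np

# Numerical check (sketch) of the residual properties on slides 12-15.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
Y = 9 + 2 * X + rng.normal(0, 2, 100)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

Yhat = b0 + b1 * X
e = Y - Yhat
print(np.isclose(e.sum(), 0.0))            # residuals sum to zero
print(np.isclose(Yhat.sum(), Y.sum()))     # fitted sum equals observed sum
print(np.isclose((X * e).sum(), 0.0))      # X-weighted residuals sum to zero
print(np.isclose((Yhat * e).sum(), 0.0))   # Yhat-weighted residuals sum to zero
```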

16 Properties of Solution
The regression line always goes through the point $(\bar{X}, \bar{Y})$.

17 Estimating Error Term Variance $\sigma^2$
Review estimation in the non-regression setting.
Show estimation results for the regression setting.

18 Estimation Review
An estimator is a rule that tells how to calculate the value of an estimate based on the measurements contained in a sample, e.g. the sample mean
$$\bar{Y} = \frac{1}{n} \sum_{i=1}^n Y_i$$

19 Point Estimators and Bias
Point estimator: $\hat{\theta} = f(\{Y_1, \ldots, Y_n\})$
Unknown quantity / parameter: $\theta$
Definition: the bias of the estimator $\hat{\theta}$ is $B(\hat{\theta}) = E(\hat{\theta}) - \theta$.

20 One Sample Example
[Figure: samples drawn with $\mu = 5$, $\sigma = 0.75$, showing the true $\theta$ against the estimate $\hat{\theta}$; generated by bias_example_plot.m.]
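The MATLAB script itself is not reproduced in the slides; a hypothetical Python sketch of what such a bias demo might look like (sample size and plotting details are assumptions):

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical reconstruction of a bias_example_plot.m-style demo: draw one
# sample and compare the estimate theta_hat = Ybar against the true theta = mu.
mu, sigma, n = 5.0, 0.75, 25
samples = np.random.default_rng(1).normal(mu, sigma, n)
theta_hat = samples.mean()

plt.hist(samples, bins=10, alpha=0.5)
plt.axvline(mu, color="k", label="true theta")
plt.axvline(theta_hat, color="r", label="estimated theta")
plt.legend()
plt.show()
```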

21 Distribution of Estimator
If the estimator is a function of the samples and the distribution of the samples is known, then the distribution of the estimator can (often) be determined.
Methods:
Distribution (CDF) functions
Transformations
Moment generating functions
Jacobians (change of variable)

22 Example
Samples from a Normal$(\mu, \sigma^2)$ distribution: $Y_i \sim \mathrm{Normal}(\mu, \sigma^2)$
Estimate the population mean:
$$\theta = \mu, \qquad \hat{\theta} = \bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i$$

23 Sampling Distribution of the Estimator
First moment:
$$E(\hat{\theta}) = E\Big(\frac{1}{n}\sum_{i=1}^n Y_i\Big) = \frac{1}{n}\sum_{i=1}^n E(Y_i) = \frac{n\mu}{n} = \theta$$
This is an example of an unbiased estimator: $B(\hat{\theta}) = E(\hat{\theta}) - \theta = 0$.

24 Variance of Estimator
Definition: the variance of an estimator is $V(\hat{\theta}) = E([\hat{\theta} - E(\hat{\theta})]^2)$.
Remember: $V(cY) = c^2 V(Y)$, and $V(\sum_{i=1}^n Y_i) = \sum_{i=1}^n V(Y_i)$ only if the $Y_i$ are independent with finite variance.

25 Example Estimator Variance
For the N(0,1) mean estimator,
$$V(\hat{\theta}) = V\Big(\frac{1}{n}\sum_{i=1}^n Y_i\Big) = \frac{1}{n^2}\sum_{i=1}^n V(Y_i) = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$$
Note the assumptions (independent $Y_i$ with finite variance).
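A short simulation makes the $\sigma^2/n$ result tangible (a sketch; the parameters are illustrative):

```python
import numpy as np

# Sketch: empirically check Var(Ybar) = sigma^2 / n for the sample mean.
rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 1.0, 50, 100_000
means = rng.normal(mu, sigma, (reps, n)).mean(axis=1)
print(means.var(), sigma**2 / n)   # both approximately 0.02
```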

26 Distribution of Sample Mean Estimator
[Figure: histogram of the sample mean estimator across many samples, illustrating its sampling distribution.]

27 Bias Variance Trade-off
The mean squared error of an estimator is
$$MSE(\hat{\theta}) = E([\hat{\theta} - \theta]^2)$$
It can be re-expressed as
$$MSE(\hat{\theta}) = V(\hat{\theta}) + B(\hat{\theta})^2$$

28 MSE = VAR + BIAS^2 Proof
$$\begin{aligned}
MSE(\hat{\theta}) &= E((\hat{\theta} - \theta)^2) \\
&= E(([\hat{\theta} - E(\hat{\theta})] + [E(\hat{\theta}) - \theta])^2) \\
&= E([\hat{\theta} - E(\hat{\theta})]^2) + 2E([E(\hat{\theta}) - \theta][\hat{\theta} - E(\hat{\theta})]) + E([E(\hat{\theta}) - \theta]^2) \\
&= V(\hat{\theta}) + 2E(E(\hat{\theta})[\hat{\theta} - E(\hat{\theta})] - \theta[\hat{\theta} - E(\hat{\theta})]) + B(\hat{\theta})^2 \\
&= V(\hat{\theta}) + 2(0 + 0) + B(\hat{\theta})^2 \\
&= V(\hat{\theta}) + B(\hat{\theta})^2
\end{aligned}$$
The cross term vanishes because $E(\hat{\theta})$ and $\theta$ are constants and $E(\hat{\theta} - E(\hat{\theta})) = 0$.
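The decomposition can also be sanity-checked by simulation. A sketch using a deliberately biased estimator (the 0.9 shrinkage factor is an arbitrary illustration, not from the slides):

```python
import numpy as np

# Sketch: verify MSE = Var + Bias^2 for theta_hat = 0.9 * Ybar.
rng = np.random.default_rng(3)
theta, sigma, n, reps = 5.0, 1.0, 20, 200_000
theta_hats = 0.9 * rng.normal(theta, sigma, (reps, n)).mean(axis=1)

mse = np.mean((theta_hats - theta) ** 2)
var = theta_hats.var()
bias = theta_hats.mean() - theta
print(mse, var + bias**2)   # the two should agree closely
```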

29 Trade-off
Think of variance as confidence and bias as correctness; the intuitions (largely) apply.
Sometimes a biased estimator can produce a lower MSE if it lowers the variance enough.

30 Estimating Error Term Variance $\sigma^2$
In the regression model $Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$:
The variance of each observation $Y_i$ is $\sigma^2$ (the same as for the error term $\epsilon_i$).
Each $Y_i$ comes from a different probability distribution, with a mean that depends on the level $X_i$.
The deviation of an observation $Y_i$ must therefore be calculated around its own estimated mean.

31 $s^2$ Estimator for $\sigma^2$
$$s^2 = MSE = \frac{SSE}{n-2} = \frac{\sum (Y_i - \hat{Y}_i)^2}{n-2} = \frac{\sum e_i^2}{n-2}$$
MSE is an unbiased estimator of $\sigma^2$: $E(MSE) = \sigma^2$.
The sum of squares SSE has $n - 2$ degrees of freedom associated with it.
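On the simulated data from the earlier snippets (true error variance $\sigma^2 = 4$), the estimator is one line; a sketch:

```python
import numpy as np

# Sketch: s^2 = SSE / (n - 2) on the simulated regression data.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, 100)
Y = 9 + 2 * X + rng.normal(0, 2, 100)
b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
b0 = Y.mean() - b1 * X.mean()

e = Y - (b0 + b1 * X)
s2 = np.sum(e ** 2) / (len(Y) - 2)   # n - 2 degrees of freedom
print(s2)                            # should be near the true sigma^2 = 4
```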

32 Normal Error Regression Model
No matter how the error terms $\epsilon_i$ are distributed, the least squares method provides unbiased point estimators of $\beta_0$ and $\beta_1$ that also have minimum variance among all unbiased linear estimators.
To set up interval estimates and make tests, we need to specify the distribution of the $\epsilon_i$.
We will assume that the $\epsilon_i$ are normally distributed.

33 Normal Error Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$
$Y_i$ is the value of the response variable in the $i$th trial
$\beta_0$ and $\beta_1$ are parameters
$X_i$ is a known constant, the value of the predictor variable in the $i$th trial
$\epsilon_i \sim_{\mathrm{iid}} N(0, \sigma^2)$, $i = 1, \ldots, n$

34 Notational Convention
When you see $\epsilon_i \sim_{\mathrm{iid}} N(0, \sigma^2)$, it is read as: the $\epsilon_i$ are distributed independently and identically according to a normal distribution with mean 0 and variance $\sigma^2$.
Examples: $\theta \sim \mathrm{Poisson}(\lambda)$, $z \sim G(\theta)$

35 Maximum Likelihood Principle
The method of maximum likelihood chooses as estimates those values of the parameters that are most consistent with the sample data.

36 Likelihood Function
If $X_i \sim F(\Theta)$, $i = 1 \ldots n$, then the likelihood function is
$$L(\{X_i\}_{i=1}^n, \Theta) = \prod_{i=1}^n F(X_i; \Theta)$$

37 Example, N(10,3) Density, Single Obs.
[Figure: samples plotted against the N(10, 3) density; N = 10, with the negative log likelihood of a single observation reported (value not recovered).]

38 Example, N(10,3) Density, Single Obs. Again
[Figure: the same setup with a different single observation and its negative log likelihood (value not recovered).]

39 Example, N(10,3) Density, Multiple Obs.
[Figure: the same density with multiple observations and their joint negative log likelihood (value not recovered).]
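A sketch of the computation behind these figures (the exact sample values on the slides are not recoverable, and whether the 3 in N(10, 3) denotes the variance or the standard deviation is not stated; variance is assumed here):

```python
import numpy as np
from scipy.stats import norm

# Sketch: negative log likelihood of N = 10 observations under a N(10, 3)
# density (3 assumed to be the variance).
rng = np.random.default_rng(4)
x = rng.normal(10, np.sqrt(3), 10)
neg_log_lik = -norm.logpdf(x, loc=10, scale=np.sqrt(3)).sum()
print(neg_log_lik)
```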

40 Maximum Likelihood Estimation
The likelihood function can be maximized w.r.t. the parameter(s) $\Theta$; doing this, one can arrive at estimators for the parameters as well.
$$L(\{X_i\}_{i=1}^n, \Theta) = \prod_{i=1}^n F(X_i; \Theta)$$
To do this, find solutions (analytically or by following the gradient) to
$$\frac{dL(\{X_i\}_{i=1}^n, \Theta)}{d\Theta} = 0$$

41 Important Trick
(Almost) never maximize the likelihood function; maximize the log likelihood function instead.
$$\log(L(\{X_i\}_{i=1}^n, \Theta)) = \log\Big(\prod_{i=1}^n F(X_i; \Theta)\Big) = \sum_{i=1}^n \log(F(X_i; \Theta))$$
Quite often the log of the density is easier to work with mathematically.
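In practice the maximization is usually done numerically on the log scale. A sketch with SciPy (the data, the starting values, and the log-parameterization of $\sigma$ are assumptions for illustration):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

# Sketch: ML for a normal mean and variance by minimizing the negative
# log likelihood numerically.
rng = np.random.default_rng(5)
x = rng.normal(10, np.sqrt(3), 200)

def neg_log_lik(params):
    mu, log_sigma = params                  # log-parameterize so sigma > 0
    return -norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)).sum()

result = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
print(mu_hat, sigma_hat)   # close to 10 and sqrt(3) ~ 1.73
```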

42 ML Normal Regression
Likelihood function:
$$L(\beta_0, \beta_1, \sigma^2) = \prod_{i=1}^n \frac{1}{(2\pi\sigma^2)^{1/2}} e^{-\frac{1}{2\sigma^2}(Y_i - \beta_0 - \beta_1 X_i)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}} e^{-\frac{1}{2\sigma^2}\sum_{i=1}^n (Y_i - \beta_0 - \beta_1 X_i)^2}$$
Maximizing this (how?) w.r.t. the parameters yields the estimators on the next slide.

43 Maximum Likelihood Estimator(s)
$\beta_0$: $b_0$, the same as in the least squares case
$\beta_1$: $b_1$, the same as in the least squares case
$\sigma^2$: $\hat{\sigma}^2 = \frac{\sum_i (Y_i - \hat{Y}_i)^2}{n}$
Note that the ML estimator of $\sigma^2$ is biased, since $s^2$ is unbiased and $s^2 = MSE = \frac{n}{n-2}\hat{\sigma}^2$.
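The bias is visible directly in simulation: averaging the ML estimate over many datasets undershoots $\sigma^2$, while $s^2$ does not. A sketch (sample size and repetition count are illustrative):

```python
import numpy as np

# Sketch: compare E(sigma2_ml) = E(SSE)/n with E(s^2) = E(SSE)/(n-2);
# true sigma^2 = 4, so with n = 10 the ML average is about (8/10)*4 = 3.2.
rng = np.random.default_rng(6)
n, reps, ml, unbiased = 10, 20_000, [], []
for _ in range(reps):
    X = rng.uniform(0, 10, n)
    Y = 9 + 2 * X + rng.normal(0, 2, n)
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    sse = np.sum((Y - (b0 + b1 * X)) ** 2)
    ml.append(sse / n)
    unbiased.append(sse / (n - 2))
print(np.mean(ml), np.mean(unbiased))   # roughly 3.2 vs. 4.0
```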

44 Comments
Least squares minimizes the squared error between the prediction and the true output.
The normal distribution is fully characterized by its first two moments (mean and variance).
Food for thought: what does the bias in the ML estimator of the error variance mean, and where does it come from?
