The Ordinary Least Squares (OLS) Estimator
Regression Analysis
Regression analysis: a statistical technique for investigating and modeling the relationship between variables.
Applications: engineering, the physical and chemical sciences, economics, management, the life and biological sciences, and the social sciences.
Regression analysis may be the most widely used statistical technique.
Example 1: delivery time vs. delivery volume
Suspect that the time required by a route deliveryman to load and service a machine is related to the number of cases of product delivered.
25 randomly chosen retail outlets.
Data: the in-outlet delivery time and the volume of product delivered.
Scatter diagram: displays the relationship between delivery time and delivery volume.
Y: delivery time, x: delivery volume
Y = β0 + β1 x + ε
Error ε: the difference between y and β0 + β1 x; a statistical error, i.e., a random variable.
It captures the effects of the other variables on delivery time, measurement errors, etc.
Simple linear regression model: Y = β0 + β1 x + ε
x: independent (predictor, regressor) variable
Y: dependent (response) variable
ε: error
If x is fixed, Y is determined by ε.
Suppose that E(ε) = 0 and Var(ε) = σ². Then
E(Y|x) = E(β0 + β1 x + ε) = β0 + β1 x
Var(Y|x) = Var(β0 + β1 x + ε) = σ²
The true regression line is a line of mean values: the height of the regression line at any x is the expected value of Y for that x.
The slope β1: the change in the mean of Y for a unit change in x.
The variability of Y at x is determined by the error variance σ².
Example: E(Y|x) = 3.5 + 2x and Var(Y|x) = 2
Y|x ~ N(β0 + β1 x, σ²)
σ² small: the observed values will fall close to the line.
σ² large: the observed values may deviate considerably from the line.
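The role of σ² can be checked with a quick simulation. The sketch below (a minimal illustration, using an arbitrary fixed value x = 4) draws many observations of Y|x ~ N(3.5 + 2x, 2) and recovers the stated conditional mean and variance.

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1, sigma2 = 3.5, 2.0, 2.0   # parameters from the example above
x = 4.0                                 # an arbitrary fixed regressor value
n = 200_000

# Draw Y | x ~ N(beta0 + beta1*x, sigma2)
y = beta0 + beta1 * x + rng.normal(0.0, np.sqrt(sigma2), size=n)

print(y.mean())   # ≈ beta0 + beta1*x = 11.5
print(y.var())    # ≈ sigma2 = 2
```

A smaller σ² tightens the spread of the draws around the conditional mean 11.5, as the slide describes.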
The regression equation is only an approximation to the true functional relationship between the variables.
Regression model: an empirical model.
Valid only over the region of the regressor variables contained in the observed data!
Multiple linear regression model: Y = β0 + β1 x1 + ⋯ + βk xk + ε
Linear: the model is linear in the parameters β0, β1, …, βk, not because Y is a linear function of the x's.
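For the multiple model, least squares has the standard matrix-form solution β̂ = (XᵀX)⁻¹Xᵀy (not derived on these slides). A minimal sketch with hypothetical data and assumed true parameters:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: k = 2 regressors plus an intercept column of ones
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta = np.array([1.0, 2.0, -0.5])              # assumed true parameters
y = X @ beta + rng.normal(0.0, 0.1, size=n)    # response with small noise

# Least squares: solve the normal equations (X'X) beta_hat = X'y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)   # close to [1.0, 2.0, -0.5]
```

Solving the normal equations directly (rather than inverting XᵀX) is the numerically preferred route.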
Two important objectives:
Estimate the unknown parameters (fitting the model to the data): the method of least squares.
Model adequacy checking: an iterative procedure to choose an appropriate regression model to describe the data.
Remarks:
Regression does not imply a cause-and-effect relationship between the variables.
It can aid in confirming a cause-and-effect relationship, but it is not the sole basis!
It is part of a broader data-analysis approach.
The Least Squares Estimator
Y = β0 + β1 x + ε
x: regressor variable
Y: response variable
β0: the intercept, unknown
β1: the slope, unknown
ε: error with E(ε) = 0 and Var(ε) = σ² (unknown)
The errors are uncorrelated.
Given x,
E(Y|x) = E(β0 + β1 x + ε) = β0 + β1 x
Var(Y|x) = Var(β0 + β1 x + ε) = σ²
The responses are also uncorrelated.
Regression coefficients: β0, β1
β1: the change in E(Y|x) for a unit change in x
β0: E(Y|x = 0)
Least-Squares Estimation of the Parameters
Estimation of β0 and β1
Data: n pairs (yi, xi), i = 1, …, n
Method of least squares: minimize
S(β0, β1) = Σi [yi − (β0 + β1 xi)]², the sum running over i = 1, …, n
Least-squares normal equations:
n β̂0 + β̂1 Σ xi = Σ yi
β̂0 Σ xi + β̂1 Σ xi² = Σ xi yi
The least-squares estimators:
β̂1 = Sxy / Sxx, where Sxy = Σ (xi − x̄)(yi − ȳ) and Sxx = Σ (xi − x̄)²
β̂0 = ȳ − β̂1 x̄
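The corrected-sums formulas for β̂1 and β̂0 can be computed directly in a few lines. The sketch below uses toy data simulated from a hypothetical line and cross-checks the result against NumPy's built-in least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data from a hypothetical line y = 3.5 + 2x plus noise
x = np.linspace(0.0, 10.0, 50)
y = 3.5 + 2.0 * x + rng.normal(0.0, 1.0, size=x.size)

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)            # corrected sum of squares of x
Sxy = np.sum((x - xbar) * (y - ybar))    # corrected sum of cross products

beta1_hat = Sxy / Sxx                    # slope estimate
beta0_hat = ybar - beta1_hat * xbar      # intercept estimate

# Cross-check against NumPy's built-in least-squares fit
slope, intercept = np.polyfit(x, y, deg=1)
print(beta1_hat, beta0_hat)
```

Both routes solve the same normal equations, so the two answers agree to machine precision.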
The fitted simple regression model: ŷ = β̂0 + β̂1 x
A point estimate of the mean of y for a particular x.
Residual: ei = yi − ŷi
Residuals play an important role in investigating the adequacy of the fitted regression model and in detecting departures from the underlying assumptions!
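Two algebraic consequences of the normal equations are worth verifying numerically: the residuals of a least-squares fit sum to zero and are orthogonal to the regressor. A small sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
y = 1.0 + 0.5 * x + rng.normal(0.0, 0.3, size=x.size)

slope, intercept = np.polyfit(x, y, deg=1)  # least-squares fit
y_hat = intercept + slope * x               # fitted values
e = y - y_hat                               # residuals

# Consequences of the least-squares normal equations:
print(np.sum(e))       # ≈ 0: residuals sum to zero
print(np.sum(x * e))   # ≈ 0: residuals are orthogonal to x
```

Systematic patterns left in the residuals (which these identities do not rule out) are exactly what model adequacy checking looks for.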
Example 2: The Rocket Propellant Data
Shear strength is related to the age in weeks of the batch of sustainer propellant.
20 observations.
From the scatter diagram, there is a strong relationship between shear strength (Y) and propellant age (x).
Assumption: Y = β0 + β1 x + ε
Sxx = Σ xi² − n x̄² = 1106.56
Sxy = Σ xi yi − n x̄ ȳ = −41112.65
β̂1 = Sxy / Sxx = −41112.65 / 1106.56 = −37.15
β̂0 = ȳ − β̂1 x̄ = 2627.82
The least-squares fit: ŷ = 2627.82 − 37.15 x
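As a quick arithmetic check, the fitted slope follows directly from the two summary statistics quoted above:

```python
# Summary statistics quoted on the slide for the rocket propellant data
Sxx = 1106.56
Sxy = -41112.65

beta1_hat = Sxy / Sxx
print(round(beta1_hat, 2))   # -37.15, matching the fitted slope
```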
How well does this equation fit the data?
Is the model likely to be useful as a predictor?
Are any of the basic assumptions violated, and if so, how serious is it?
Properties of the Least-Squares Estimators and the Fitted Regression Model
β̂0 and β̂1 are linear combinations of the yi:
β̂1 = Σ ci yi, where ci = (xi − x̄) / Sxx
β̂0 = ȳ − β̂1 x̄
β̂0 and β̂1 are unbiased estimators.
E(β̂1) = E(Σ ci yi) = Σ ci E(yi) = Σ ci (β0 + β1 xi) = β1
E(β̂0) = E(ȳ − β̂1 x̄) = β0 + β1 x̄ − β1 x̄ = β0
Var(β̂1) = Var(Σ ci yi) = Σ ci² Var(yi) = σ² Σ ci² = σ² / Sxx
Var(β̂0) = σ² (1/n + x̄² / Sxx)
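Both the unbiasedness E(β̂1) = β1 and the variance formula Var(β̂1) = σ²/Sxx can be verified by Monte Carlo. The sketch below (with hypothetical true parameters) uses the slide's ci weights to compute β̂1 = Σ ci yi on many simulated datasets with the same fixed design points.

```python
import numpy as np

rng = np.random.default_rng(3)

beta0, beta1, sigma2 = 2.0, -1.5, 4.0    # hypothetical true parameters
x = np.arange(1, 21, dtype=float)        # fixed design points, n = 20
Sxx = np.sum((x - x.mean()) ** 2)
c = (x - x.mean()) / Sxx                 # the c_i weights from the slide

# Simulate many datasets; compute beta1_hat = sum(c_i * y_i) for each
reps = 200_000
noise = rng.normal(0.0, np.sqrt(sigma2), size=(reps, x.size))
y = beta0 + beta1 * x + noise
b1_hats = y @ c

print(b1_hats.mean())    # ≈ beta1 = -1.5 (unbiasedness)
print(b1_hats.var())     # ≈ sigma2 / Sxx
print(sigma2 / Sxx)
```

The simulated sampling variance matches σ²/Sxx, confirming the derivation above.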
Classical Linear Regression Assumptions
1. The regression is linear in the parameters.
2. The error term has zero population mean.
3. The error term is not correlated with the X's.
4. No serial correlation.
5. No heteroskedasticity.
6. No perfect multicollinearity.
And we usually add:
7. The error term is normally distributed.
(*We did not use this in deriving the OLS estimator, so it is a non-parametric estimator: a good property.)
Gauss-Markov Theorem
Given OLS assumptions 1 through 6, the OLS estimator of βk is the minimum-variance estimator among all linear unbiased estimators of βk, for k = 0, 1, 2, …, K. That is, OLS is BLUE (the Best Linear Unbiased Estimator).
* Furthermore, by adding assumption 7 (normality), one can show that OLS = MLE and is also the BUE (Best Unbiased Estimator), also called the UMVUE.
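The BLUE property can be illustrated numerically. Below, the OLS slope is compared with another linear unbiased estimator, the hypothetical two-endpoint slope (yn − y1)/(xn − x1); both are unbiased, but OLS shows the smaller sampling variance, as the theorem guarantees.

```python
import numpy as np

rng = np.random.default_rng(5)

beta0, beta1, sigma = 1.0, 2.0, 3.0   # hypothetical true parameters
x = np.linspace(1.0, 10.0, 10)
c = (x - x.mean()) / np.sum((x - x.mean()) ** 2)   # OLS slope weights

reps = 100_000
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=(reps, x.size))

ols = y @ c                                         # OLS slope estimates
endpoint = (y[:, -1] - y[:, 0]) / (x[-1] - x[0])    # competing linear unbiased estimator

print(ols.mean(), endpoint.mean())   # both ≈ beta1 = 2 (both unbiased)
print(ols.var() < endpoint.var())    # True: OLS has smaller variance (BLUE)
```

The theorem says this ordering holds not just for this competitor but for every linear unbiased estimator of the slope.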
Gauss-Markov Theorem
Can you prove this theorem? This is your Quiz 2.
Last but not least, we thank the colleagues who have uploaded their lecture notes on the internet!