University of Belgrade. Faculty of Mathematics. Master thesis Regression and Correlation

Size: px
Start display at page:

Download "University of Belgrade. Faculty of Mathematics. Master thesis Regression and Correlation"

Transcription

1 Uversty of Belgrade Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Faculty of Mathematcs Master thess Regresso ad Correlato The caddate Supervsor Karma Ibrahm Soufya Vesa Jevremovć Jue 014 Cotas

2 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Item Theme Page Itroducto Regresso aalyss Chapter oe 1 Correlato 1.1 Pearso`s coeffcet correlato Scatter Dagram Spearma coeffcet correlato Regresso 7 1. Regresso Equato Curve of Regresso Types of Regresso Lear Regresso Equato of Y o X Assumptos ecessary about the regresso model Estmatg the populato slope ad tercept 9 Chapter two The Method of least squares The Method of least squares 10.1 Resduals 1. Propertes of estmator of the slope 14.3 Propertes of estmator of the tercept 16.4 Testg the valdty of the model 17.5 Predcto tervals for the actual value of Y 18.6 Coeffcet of determato 19 Chapter three Nolear Models 3.1 Polyomal regresso 0 3. Expoetal regresso 3.3 Other curvlear models 4 Chapter four Dagostcs for smple lear regresso 4.1 Vald ad Ivald Regresso Models: Ascombe s Four Data Sets 4. Plots of resduals Regresso Dagostcs: Tools for Checkg the Valdty of a Model 4.3 Leverage Pots Stadardzed Resduals Assessg the Ifluece of Certa Cases Normalty of the Errors 39 Chapter fve Cocluso 5

3 Refereces Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Itroducto Regresso s a statstcal tool for the vestgato of relatoshps betwee varables. Usually, the vestgator seeks to ascerta the casual effect of oe varable upo aother. Regresso methods are meat to determe the best fuctoal relatoshp betwee a depedet varable Y wth oe or more depedet varables X. The earlest form of regresso was the method of least squares, whch was publshed by Legedre 1805 ad by Gauss Legedre ad Gauss both appled the method to the problem of determg, from astroomcal observatos, the orbts of bodes about the Su. Gauss publshed a further developmet of the theory of least squares 181. The term regresso was coed by Sr Fracs Galto, whle studyg the lear relatoshp betwee heghts of sos ad heghts of ther fathers. Ths thess focuses o tools ad techques for buldg regresso models usg real data ad assessg ther valdty. A key theme throughout the thess s that t makes sese to base fereces or coclusos oly o vald models. We wll show that plots are mportat tool for buldg regresso models ad assessg ther valdty through approprate dagostc procedures. The regresso output ad plots that appear thess have bee geerated usg statstcal software R. I addto, real data sets that have appeared lterature are also cosdered. The thess s dvded to fve chapters. I the frst chapter we troduce Pearso`s correlato coeffcet, dscuss mportace of scatter plots ad defe regresso. The secod chapter focuses o method of least squares ad o statstcal propertes of regresso coeffcets. Ad we show how to test valdty of the method ad we defe predcto tervals. I the thrd chapter we gve examples of olear regresso models. The fourth chapter dscusses dagostc procedures for smple lear regresso. I the ffth chapter we preset our coclusos. 1- Regresso Aalyss Frst we wll troduce Pearso`s ad Spearma`s correlato coeffcets as two dfferet measures of correlato or assocato betwee two varables.

4 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade 1-1 Correlato We wll ote that the correlato betwee the two varables the populato wth the Greek letter ρ, ad Pearso`s correlato coeffcet, as a estmate of ρ, wth the letter r,pearso`s correlato coeffcet ca assume ay value the terval (-1,1).The absolute value of r (.e. r ) dcate the stregth of the relatoshp betwee the two varables.as the absolute value of r approaches 1, the degree of lear relatoshp betwee the varables becomes stroger, achevg the maxmum whe r =1 (.e. whe r equals +1 or -1 ). The closer the absolute value of r s to 0, the weaker the lear relatoshp s betwee the two varables. Pearso s correlato coeffcet determes the degree to whch a lear relatoshp exsts betwee two varables. The sg of r dcates the ature or drecto of the lear relatoshp whch betwee two varables, the postve sg dcates a drect lear relatoshp, ad the egatve sg dcates a drect lear relatoshp. A drect lear relatoshp s oe whch a chage o oe varable s assocated wth a chage o the other varable the same drecto (.e., a crease o oe varable s assocated wth a crease o the other varable, ad a decrease o oe varable s assocated wth a decrease o the other varable). A drect relatoshp s oe whch a chage o oe varable s assocated wth a chage o the other varable the opposte drecto (.e., a crease o oe varable s assocated wth a decrease o the other varable, ad a decrease o oe varable s assocated wth a crease o the other varable). The use of the Pearso s correlato coeffcet assumes that a lear fucto best descrbes the relatoshp betwee the two varables, ad these two varables have ormal dstrbuto. If, however, the relatoshp betwee the varables s better descrbed by a curvlear fucto, Pearso`s correlato coeffcet s ot approprate measure of correlato. Calculato of Pearso s correlato coeffcet r Let ( x 1, y 1 ),( coeffcet s equal to x, y ( x, y ) be pared observatos, the Pearso s correlato r= (x x ) ( y ý) ( x x ) ( y ý) or smply

5 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade where x= x, ý= y r= x y x ý s x s y are sample meas ad = s 1 y ( y ý) are stadard sample devatos. If we use x =x x y = y ý, the we get smplfed for Pearso s correlato coeffcet Scatter Dagram Let us have pars of values ( r= x ' ' y x ' x, y x 1, y 1, ) ( y ' s x = 1 x, y. (x x ), I scatter dagram the varable X s show alog the x-axs ad the varable Y s show alog the y-axs ad all the pars of values of X ad Y are show by pots (or dots) o the graph. The scatter dagram of these pots reveals the ature ad stregth of correlato betwee these varable X ad Y. Degrees of correlato betwee two varables are show o Fgure 1. As we ca see, whe there s o correlato, pots o the scatter plot are dstrbuted radomly. Also, we observe the followg: If the pots le o a straght le rsg from lower left to upper rght, the there s a perfect postve correlato betwee the varables X ad Y. If all the pots do ot le o a straght le, but ther tedecy s to rse from lower left to upper rght

6 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade the there s a postve correlato betwee the varable X ad Y. I these cases the two varables X ad Y are the same drecto ad the assocato betwee the varables s drect. If the movemets of the varables X ad Y are opposte drecto ad the scatter dagram s a straght le, the correlato s sad to be egatve, assocato betwee the varables s sad to be drect. A scatter plot of the data lke that gve Fgure1 should always be draw to obta a dea of the sort of relatoshp ( f ay) that exsts betwee two varables (e.g., lear, quadratc, expoetal, etc.). Fgure 1. The degrees of correlato Example 1: For the followg data draw scatter dagram ad calculate Pearso s correlato coeffcet. X Y

7 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Fgure. Scatter plot for gve data Frst we calculate sample meas, sample stadard devatos. x 9, y 1.57, 7 1 x y 90 s, X Ad the Pearso s correlato coeffcet s equal to r 7 1 x 7 s X y s xy Y x x 4, s Y y y We ca see that correlato betwee varables X ad Y s very strog. By lookg at the scatter plot t seems that Y s a lear fucto of X. It s mportat to ote that strog correlato does ot mea that there s cause-effect relatoshp betwee two varables. By examg the value of correlato coeffcet, we may coclude that two varables are related, but we ca`t draw cocluso that oe varable causes the other. Varable (or varables ) whch have ot bee cosdered ca be resposble for the observed correlato betwee the two varables.

8 1..3-Spearma`s correlato coeffcet Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Spearma s correlato coeffcet measures degree of mootoc relatoshp betwee two radom varables. Mootoc relatoshp ca be creasg (postve correlato - crease values of varable X s followed by crease values of varable Y) or decreasg (egatve correlato - crease values of varable X s followed by decrease values of varable Y). Mootocally creasg, mootocally decreasg ad o-mootoc relatoshp betwee varables are show o Fgure 3 Fgure 3. Mootoc ad o-mootoc relatoshps betwee varables Spearma s populato correlato coeffcet s deoted wth r S S ad Spearma s sample correlato coeffcet wth. Whe ths coeffcet s equal to -1 or +1, perfect mootoc relatoshp exsts betwee radom varables X ad Y. We wll descrbe how to calculate Spearma s correlato coeffcet three steps. We ( x1, y1),( x, y ),..., ( x, y ) have pars of sample values of radom varables X ad Y. 1. Rak the values of varable X ad the values of varable Y, separately. For each elemet of sample we wll have par of raks R, R X Y.. Form dffereces of these raks d R X R Y.

9 3. Calculate Spearma s correlato coeffcet by the formula r S 6 1 d 1 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade 1.- Regresso Whe we are terested what value of radom varable Y s expected whe X=x, we come to codtoal mathematcal expectato E(Y X =x), the expected value of Y whe X takes the specfc value x. ad f If (X,Y) s b-dmesoal cotuous radom varable wth desty fucto f x (x) s desty fucto of radom varable X the Y /= X =x E y f (x, y ) f x ( x ) Set of all values E(Y X =x) s set of values of radom varable E(Y X ), so to calculate E(Y X ), we frst eed to calculate E(Y X =x) for all x. f (x, y) Radom varable E(Y X ) = R(X ) s called regresso. For example, f varable X represets day of the week ad varable Y sales at a gve compay, the the regresso of Y o X represets the mea (or average) sales o a gve day If (X,Y) s b-dmesoal ormally dstrbuted radom varable the E(Y X ) = β + β X Regresso Equato The fuctoal relatoshp of a depedet varable wth oe or more depedet varables s called a regresso equato: It s also called predcto equato (or estmatg equato) Curve of Regresso The graph of the regresso equato s called the curve of regresso: If the curve s a straght le; the t s called the le of regresso Types of Regresso

10 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade If there are oly two varables uder cosderato, the the regresso s called smple regresso. If the relatoshp betwee X ad Y s o-lear, the the regresso s curvlear. If there are more tha two varables uder cosderato the the regresso s called multple regresso. For example, Multple regresso ca be used to model relatoshp betwee sugar blood ad weght, age ad blood pressure of dabetes patets Lear Regresso Equato of Y o X x, y Data are collected pars ( x 1, y 1, ) ( x, y, where x 1 deotes the frst value of y 1 the X -varable ad deotes the frst value of the Y -varable. The X varable s called the predctor varable, whle the Y -varable s called the depedet varable. The regresso of Y o X s lear f X Y /= β 0 +β 1 E 0 1 where the ukow parameters ad determe the tercept ad the slope of a specfc Y 1, Y,..., Y straght le, respectvely. Suppose that are depedet realzatos of the radom x 1 x,..., x varable Y that are observed at the values of a radom varable X. If the regresso of Y o X s lear, the for = 1,,, Y E( Y X x ) e x e where e s the radom error 0 1 Y, wth zero expectato Assumptos ecessary about the regresso model 1. I what follows we shall make the followg assumptos: Y s related to x by the smple lear regresso model ( y =β 0 +β 1 x +e.. The errors e 1, e,,e are depedet of each other. 3. The errors e 1,e,,e have a commo varace σ.

11 4. The errors are ormally dstrbuted wth a mea of 0 ad varace e X N (0, S) e ~ N(0, ). σ, that s, Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Because the errors e are ormally dstrbuted, we also have that Y 1,Y,,Y β 0 + β 1 x,σ : ) the ý N = 1 y : N ( β +β x, σ 0 1 =1 ) Methods for checkg these four assumptos wll be cosdered the fourth chapter Estmatg the populato slope ad tercept Suppose for example that varable X represets heght ad varable Y weght of a perso. For a smple regresso model the mea weght of dvduals of a gve heght would be a lear fucto of that heght. I practce, we usually have a sample of data stead of the whole populato. The slope b 1 1 ad tercept 0 are ukow, sce these are the values for the whole populato. Thus, we wsh to use the gve data to estmate the slope ad the tercept. Ths ca be acheved by fdg the equato of the le whch best fts our data, that s, choose b 0 ad b 1 ŷ ˆ such that y b0 b1 x ^y=b 0 +b 1 x the th shall refer to as predcted value or the ftted value of ukow parameters we wll use the method of least squares. -The Method of Least Squares s as close as possble to y Y y. We Y. For estmatg these Ths method of curve fttg was suggested early the eteeth cetury by the Frech mathematca Adra Legedre. The method of least squares assumes that the best fttg le s oe for whch the sum of the squares of the vertcal dstaces of the pot (x,y) from the le s mmal. For the smple lear regresso we ca use least squares method to fd the estmators b 0 ad b 1 such that the sum of the squared dstaces betwee value y ad predcted value ^y

12 S 1 ( y x 0 1 ) Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade reaches the mmum amog all possble choces of regresso coeffcets 1. Vertcal dstaces of pots from regresso le are show o Fgure 4 Fgure 4. Vertcal dstaces of pots from regresso le 0 b o ad b 1 Mathematcally, the least squares estmates of the smple lear regresso could be obtaed by solvg the followg system: S =0, S =0 β 0 β 1 It s more coveet to solve ths system usg the ftted lear model:

13 x ^y =β o +β 1, Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade where Now sum of squared dstaces equals to β o =β o β 1 x S= [ y ( β 0 +β 1 (x x ))] =1 ad we eed to solve the followg system S β 0 =0, S β 1 =0 Takg the partal dervatves wth respect to ormal equatos). x ) [ y ( β o +β 1 (x ]=0 [ y ( β o +β 1 ( x x ) ] ( x x )=0 Note that y =β 0 + β 1 ( x x )=β 0. β 0 β 1 we get system of equatos (called Therefore, we have b 0 =^β 0 = 1 y = ý. Substtutg b 0 by ý we obta

14 x ) [ y ( ý +β 1 (x ] ( x x )=0 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Now t s easy to see ad We have b 1 = Estmator varables b 1 = b (x x) ² = b o ( y ý ) ( x x) - = S XY S XX b 1 x= ý b₁ x The ftted value of the smple regresso s defed as y 1, y,, y : N ( β 0 +β 1 x,σ ) the 1 ( y ý ) (x x ) 1 = ( x x ) y (x x ) (x x ) ^y =b o +b 1 x. ý = 1 y : N ( β 0 +β 1 x, σ ) b 1 s ormally dstrbuted, as lear combato of ormally dstrbuted radom Y.1-.Resduals. The, t also follows that estmator The dfferece betwee a observed s referred to as the th squares le to model for that partcular pot. b 0 has ormal dstrbuto. y ad the ftted value of ^y, e = y ^y regresso resdual. Its magtude reflects the falure of the least

15 We ca use these resduals to estmate ukow varace σ of radom errors ( σ =var(e) wth estmator ^σ = 1 ^e. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Example : A regresso model for the tmg of producto rus We shall cosder the followg data: varable Y represets the tme take ( mutes) for a producto ru (ru tme) ad varable X the umber of tems (ru sze) produced for 0 radomly selected orders. We wsh to develop a equato to model the relatoshp betwee varables Y ad X. The data are gve Table 1 ad correspodg scatter plot Fgure 5. Table 1. The producto data Case Ru tme Ru sze Case Ru tme Ru sze

16 Fgure 5. A scatter plot of the producto data Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Scatter plot of the producto data shows us that there mght be lear depedece betwee ru Y tme ad ru sze. We wll cosder the lear regresso model calculate least squares estmators of the ukow parameters We have that 0 1 b b ( x x) 0 ( x x 01.75, y 0. 05, , so the estmators x)( y y) ( x x) y b1 x ( x b x)( y ad b ad y) are equal to 0 x e. Now we wll yˆ x The ftted regresso le s Now we shall gve some propertes of estmators smple lear regresso model, always havg md the assumptos gve o page 9..- Propertes of estmator of the slope Theorem 1: The least squares estmator b 1 s a ubased estmator of Proof Here we take x = 1,,, as costats,whle E (b 1 )=E ( S xy S xx ) = 1 E 1 S xx (Y Ý ) (x X ) 1. ad β 1. y Y s a radom varable.

17 Usg the fact that (x X )=0 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade We get: E (b 1 )= 1 1 (x S XX x ) Ey = 1 1 (x S xx x ) (β o +β 1 x )= 1 1 ( x S xx x )= 1 1 ( x S xx x ) β 1 (x x )= 1 1 ( x S xx x ) β 1 = S xx β =1 S 1 =β 1. xx Theorem : Varace of the estmator of the slope s Proof Y ý ()( x x)= 1 S Var xx ( 1 1 Var (b 1 )=Var( S xy S xx ) 1 S 1 xx Y ( x x) ) = 1 S Var xx (x x² Var(Y )= 1 S xx Var ( b 1 ) = = 1 σ S xx (x x ²σ ²= σ ² S xx.

18 Theorem 3: The least square estmator b 1 ad ý are ucorrelated. The uder the Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade ormalty assumpto of depedet. Proof y As we metoed before that calculate the covarace betwee for x x ( ( y ý ), ý)= b 1 ad ý =1,,,, b 1 ad ý. Cov (b 1, ý )=Cov ( S xy S xx, ý) = 1 S xx Cov (S xy, ý )= 1 S xx Cov x ( x) y, ý x x ( y,, y )= 1 S xx Cov b 1 ad ý are ormally dstrbuted ad are ormally dstrbuted ad depedet, let us

19 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade (x x)cov ( y, y j ) j=1 1 Cov S xx Note sce that, Cov ( y, y j ) =E Ee 0 Thus, we coclude that x ( x )σ =0 Cov(b 1, ý)= 1 S xx ad y Ey y j Ey ] e =E ( s are depedet, we ca wrte E, E j e e j ) = { σ f = j o f j Recall that zero correlato s equvalet to the depedece betwee two ormal varables. Thus, we coclude that b 1 ad ý are depedet..3- Propertes of estmator of the tercept Theorem 4. The least squares estmator Proof Here also we take b 0 s a ubased estmator of x,,,., as costats, whle Y s a radom varable. β o 1 β o + β 1 x β 1 x=β o E b o =E (Y b 1 x )=( 1 E Y ) x E b 1 = 1.

20 - Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Theorem 5: Varace of the estmator of the slope s: Var ( b o )=( 1 + x S xx ) σ Proof Var (b o )=Var ( ý b 1 x )=Var ( ý )+( x ) Var(b 1 )= σ + x σ S xx =( 1 + x S xx ) σ..4 - Testg the valdty of the model reduces to If the slope of the regresso le β 1 s equal to zero, the the lear regresso model Y =β 0 +e ad we ca ot predct values of varable Y based o the kow values of varable X. To test ull hypothess use the test statstcs H 0 : β 1 =0 agast the alteratve H 1 : β 1 0, b1 1 T ˆ S X we ca whch has uder ull hypothess, Studet t-dstrbuto wth - degrees of freedom. We used the fact that estmator Test s of the form: reject b 1 : N 1, S X H 0 at level α f b 1 : N (β 1, σ s ), where c t T c, where S X,1 1 x x. s obtaed from table of t Studet dstrbuto. Otherwse, we do ot reject ull hypothess ad we coclude that lear regresso model could be vald model for our data..5- Predcto tervals for the actual value of Y I ths secto we cosder the problem of fdg a predcto terval for the

21 x p Y p X=x p actual value of Y at, a gve value of X. We ote Y varable Y. We have that p =β 1 x p +β 0 +e p ad ^Y p =b 1 x p +b 0. as ths predcto value of Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade S Frst we wll derve expectato ad varace of Y ˆ p Y p E (Y p ^Y p )=E ( β 1.x P +β 0 +e P b 1. x p +b 0 )=E (b 1 β 1 ) x p + E (b 0 β 0 )+ E e P =0 \ Var (Y p ^Y p =Var (β 1.x p +β 0 +e p b 1 x p b 0 )=Var (e p b 1 x p b 0 )= b b (0) Cov ( e p, b 0 )+ Cov (b 1 x p,b 0 ) cov(e p, b 1.x p ) (1)+Var Var (e p )+x p Var X Var 0 Cov b, e, p Cov b0 e As we have that x x 1, we get 1 p ad Cov( b 1, b0 ) ( ) ˆ x x 1 x p x Y Y x x 1 p p p p S X S X S X S X So, we get that ˆ gves Yˆ 1 ( x p x ) 0, 1 S p Yp : N X x S X, where. Stadardzg ad replacg, by

22 T ˆ Yˆ p 1 1 Y x p p x S X : t Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade A 100(1 α) % predcto terval for where c t IY p,1 Y p, the value of Y at X = x p, s gve by x x x x ˆ 1 p ˆ 1 p Y ˆ 1, ˆ p c Y 1 p c S S..6 - Coeffcet of determato X The square of the correlato betwee varables Y ad X s called the coeffcet of R determato ad s deoted wth. It represets the proporto of the total sample varablty the Y s explaed by the regresso model. R Hgher value of s preferable because t meas that regresso model s descrbg well relatoshp that exsts betwee observed varables. 3- Nolear Models X, Obvously, ot all x y -relatoshps ca be descrbed by straght les. Curvlear relatoshps of all sorts ca be foud every feld of applcato. Most of these olear models ca be ftted usg least squares method, provded the data have bee tally learzed by a sutable trasformato Polyomal regresso

23 Polyomal regresso s used whe depedece betwee varables Y ad X s of the type Y a a X a X... a 0 1 m X m e Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade We wll cosder the case whe m=. Y a bx cx Whe a set of pots exhbts a parabolc tred the we ca use quadratc fucto e y=a+bx+c x method of least squares. We get system of ormal equatos a a We ca fd ukow coeffcets a, b ad c usg the x +b x +b a+b x +c x 3 +c x +c x 3 = x y x 4 = x y x = By solvg ths system of equatos, we fd estmates of a, b ad c. Example 3 : Fd a formula for the le of the form ft the followg data y=a+b x+c x y Y a bx cx x e to y Soluto: Substtutg the values of x, x, x 3, =1 x y, x y, y ad above ormal equatos we get

24 10a 4.5b.85c a.85b.05c Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade.85a.05b c 8.89 Solvg these equatos, we obta aˆ 3.16, b ˆ ad cˆ 0.91 The requred equato of quadratc regresso s yˆ x 0.91x Scatter plot wth ftted regresso fucto s gve below. Fgure 6. Scatter plot ad ftted quadratc fucto for gve data 3. - Expoetal Regresso Suppose the relatoshp betwee two varables s best descrbed by a expoetal fucto of the form y=ae bx

25 I case t s possble to obta approxmate solutos usg umercal methods or the dea of learzato. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Trasformg the prevous equato by takg logarthms o both sdes we get whch mples that squares method to ad l y =1 a= x ( =1 ( b= x )( l y ad x x l y ) x ) l y b x l ad l y l y=l a+b x have a lear relatoshp. Whe we apply the least that yeld to Example 4 : Fd a least square curve of the form y=ae bx (a>0 ) Soluto : to the data gve below x y Cosder y=ae bx, applyg logarthm (wth base y=l a+bx (1) l e o both sdes, we get

26 Takg y=y, l the equato (1) ca be wrtte as Y a ~ bx () Y =à+bx ( ), where Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade a~ l a a= l a `. Equato () s lear equato of Y o x. The ormal equatos are a~ b a~ x b 11 x Y 11 1 x x Y After calculatos, we get 11a ~ +b=6.59 a~ 48.4b After solvg these equatos, we obta a ˆ~ b ˆ 0.96, a= a= à=0,0001 l The requred curve s So we get yˆ 1.64 e 0.96x aˆ e aˆ~ Scatter plot wth ftted regresso le s gve below.

27 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Fgure 7. Scatter plot ad ftted expoetal fucto for gve data Other curvlear models We wll cosder two other olear regresso models. b Y ax l Y l a b l x 1. Let. Takg logarthms of both sdes of equato we get. l Y U l a l x v Usg the otes, ad we get lear regresso model U bv. We fd estmates of α ad b by solvg the followg system of equatos b 1 1 v b 1 v U v 1 1 U v ˆ aˆ e We fd estmate of a as 1 Y U 1 ax b Y. Let. We use ad we get lear regresso model U ax b. We fd estmates of a ad b by solvg the followg system of equatos:

28 a 1 x b 1 U Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade x b Dagostcs for smple lear regresso a I Chapter we studed the smple lear regresso model. Throughout Chapter, we assumed that the smple lear regresso model was a vald model for the data, that s, the codtoal mea of Y gve X s a lear fucto of X ad the codtoal varace of Y gve X s costat. I other words, E (Y / X )= β 0 +β 1 X var (Y / X )=σ I Secto 4.1, we start by examg the mportat ssue of decdg whether the model uder cosderato s deed vald. I Secto 4., we wll see that whe we use a regresso model we mplctly make a seres of assumptos. We the cosder a seres of tools kow as regresso dagostcs to check each assumpto Vald ad Ivald Regresso Models: Ascombe s Four Data Sets Throughout ths secto we shall cosder four data sets costructed by Ascombe. Ths example llustrates the pot that lookg oly at the umercal regresso output may lead to very msleadg coclusos about the data, ad lead to acceptg the wrog model. The data are gve the table below (Table 4.1 ) ad are plotted Fgure 8. Values of radom varable Y dffer each of four data sets, whle x-values of varable X are same frst three sets. Table 4.1 Ascombe s four data sets Case x 1 x x 3 x 4 y 1 y y 3 y x 1 U x

29 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Fgure 8. Plots of Ascombe s four data sets Whe a regresso model s ftted to data sets 1,, 3 ad 4, each case the ftted regresso model s ^y = x, The regresso output for data sets 1 to 4 s gve below. The regresso output for the four data sets s detcal (to two decmal places) every respect. I all cases, results of t-test for valdty of a model (slope of regresso le s dfferet from zero) are sgfcat (p-value s smaller tha 0.05). Lookg at Fgure 8 t s obvous that a straght-le regresso model s approprate oly for Data Set 1. O the other had, the data Data Set seem to have o lear rather tha a straght-le relatoshp. The thrd data set has a extreme outler that should be vestgated. For the fourth data set, the slope of the regresso le s solely determed by a sgle pot, amely, the pot wth the largest x value.

30 Regresso output from R Frst model Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Coeffcets: Estmate Std. Error t value Pr(> t ) (Itercept) * x ** Sgf. codes: 0 *** ** 0.01 * Resdual stadard error: 1.37 o 9 degrees of freedom Multple R squared: , Adjusted R squared: F statstc: o 1 ad 9 DF, p value: Secod model Coeffcets: Estmate Std. Error t value Pr(> t ) (Itercept) x ** Sgf. codes: 0 *** ** 0.01 * Resdual stadard error: 1.37 o 9 degrees of freedom Multple R squared: 0.666, Adjusted R squared: 0.69 F statstc: o 1 ad 9 DF, p value:

31 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Thrd model Coeffcets: Estmate Std. Error t value Pr(> t ) (Itercept) * x ** Sgf. codes: 0 *** ** 0.01 * Resdual stadard error: 1.36 o 9 degrees of freedom Multple R squared: , Adjusted R squared: 0.69 F statstc: o 1 ad 9 DF, p value: Forth model Coeffcets: Estmate Std. Error t value Pr(> t ) (Itercept) * x ** Sgf. codes: 0 *** ** 0.01 * Resdual stadard error: 1.36 o 9 degrees of freedom Multple R squared: , Adjusted R squared: 0.697

32 F statstc: 18 o 1 ad 9 DF, p value: Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Ths example demostrates that the umercal regresso output should always go together wth aalyss to esure that a approprate model has bee ftted to the data. I ths case t s eough to look at the scatter plots Fgure 8 to determe whether a approprate model has bee ft. However, some stuatos we shall eed some addtoal tools order to check the valdty of the ftted model 4.- Plots of resduals Oe tool we wll use to valdate a regresso model s oe or more plots of resduals (or stadardzed resduals, whch wll be defed later ths chapter). These plots wll eable us to assess vsually whether a approprate model has bee ft to the data. Fgure 9 provdes plots of the resduals agast X for each of Ascombe s four data sets. X There s o patter ca be recogzed the plot of the resduals for data Set 1 agast 1. Fgure 9. Resdual plots for Ascombe s data sets

33 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade We shall see ext that ths dcates that a approprate model has bee ft to the data. We shall see that a plot of resduals agast X that produces a radom patter dcates a approprate model has bee ft to the data. Addtoally, we shall see that a plot of resduals agast X that produces a o radom patter dcates a correct model has bee ft to the data. Usg Plots of Resduals to Determe Whether the Proposed Regresso Model Is a Vald Model Oe way of checkg whether a vald smple lear regresso model has bee ft s to plot resduals versus x ad look for patters. If o patter s foud the ths dcates that the model provdes a adequate summary of the data,.e., s a vald model. If a patter s foud the the shape of the patter provdes formato o the fucto of x that s mssg from the model. For example, suppose that the true model s a straght le Y =β 0 +β 1 x +e Ad we ft a straght le ^y =b 0 +b 1 x b 0 b 1 are close to the ukow populato parameters. The, assumg that the least squares estmates ^e =Y ^y = ( β 0 b 0 )+(β 1 b 1 ) x + e e β 0 β 1, we fd that that s, the resduals should resemble radom errors. If the resduals vary wth x the ths dcates that a correct model has bee ft. For example, suppose that the true model s a quadratc Y =β 0 +β 1 x +β x +e ad that we ft a straght le ^y =b 0 +b 1 x The, somewhat smplstcally assumg that the least squares estmates the ukow populato parameters Example of a Quadratc Model β 0 ad β 1, we fd that ^e =Y ^y =( β 0 b 0 )+ (β 1 b 1 ) x + β x +e β x +e ^β 0 ^β 1 are close to Suppose that Y s a quadratc fucto of X wthout ay radom error. The, the resduals from the straght-le ft of Y ad X wll have a quadratc patter. Hece, we ca coclude that there s eed for a quadratc term to be added to the orgal straght-le regresso model. Ascombe s data set s a example of such a stuato. Fgure 10 cotas scatter plots of the data ad the resduals from a straght-le model for data set. As expected, a clear quadratc patter s evdet the resduals Fgure 10.

34 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Fgure 10 Scatter plots ad the resduals from a straght-le model for data set Regresso Dagostcs: Tools for Checkg the Valdty of a Model We ext look at tools (called regresso dagostcs) whch are used to check the valdty of all aspects of regresso models. Whe fttg a regresso model we wll dscover that t s mportat to: 1. Determe whether the proposed regresso model s a vald model (.e., determe whether t provdes a adequate ft to the data). The ma tools we wll use to valdate regresso assumptos are plots of stadardzed resduals. The plots eable us to assess vsually whether the assumptos are beg volated.. Determe whch (f ay) of the data pots have x -values that have a uusually large effect o the estmated regresso model (such pots are called leverage pots ). 3. Determe whch (f ay) of the data pots are outlers, that s, pots whch do ot follow the patter set by the rest of the data, whe oe takes to accout the gve model. 4. If leverage pots exst, determe whether each s a bad leverage pot. If a bad leverage pot exsts we shall assess ts fluece o the ftted model. 5. Exame whether the assumpto of costat varace of the errors s reasoable. We beg by lookg at the secod tem of the above lst, leverage pots, as these wll be eeded the explaato of stadardzed resduals.

35 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Leverage Pots Data pots whch have cosderable fluece o the ftted model are called leverage pots. To make thgs as smple as possble, we shall beg by descrbg leverage pots as ether good or bad. Example of a good ad a bad leverage pot Twety pots are radomly geerated from a kow straght-le regresso model. We get a plot lke that show Fgure 11. Oe of the 0 pots has a x -value whch makes t dstat from the other pots o the x -axs. We shall see that ths pot, whch s marked o the Y plot, s a good leverage pot. True populato regresso le (amely, β = 0 + β 1 x ) ad the least squares regresso le (amely, ^y = b 0 + b 1 x 1 Fgure 11. Good leverage pot ) are marked o the plot. Next we move oe of the pots away from the true populato regresso le. I partcular, we focus o the pot wth the largest x -value. Movg ths pot vertcally dow (so that ts x -value stays the same) produces the results show Fgure 1. Notce how the least squares regresso le has chaged dramatcally respose to chagg the Y -value of just a sgle pot. The least squares regresso le has bee levered dow by sgle pot. Hece we

36 call ths pot a leverage pot. It s a bad leverage pot sce ts the patter set by the other 19 pots. Y -value does ot follow Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade I summary, a leverage pot s a pot whose x -value s dstat from the other x -values. A pot s a bad leverage pot f ts Y -value does ot follow the patter set by the other data pots. I other words, a bad leverage pot s a leverage pot whch s also a outler. Fgure 1. Bad leverage pot Returg to Fgure 11, the pot marked o the plot s sad to be a good leverage pot sce ts Y -value closely follows the upward tred patter set by the other 19 pots. I other words, a good leverage pot s a leverage pot whch s NOT also a outler. Next we vestgate what happes whe we chage the Y -value of a pot Fgure 11 whch has a cetral x -value. We move oe of these pots away from the true populato regresso le. I partcular, we focus o the pot wth the 11th largest x -value. Movg ths pot vertcally up (so that ts x value stays the same) produces the results show Fgure 13. Notce how the least squares regresso le has chaged relatvely lttle respose to chagg the Y -value of cetrally located x. Ths pot s sad to be a outler that s ot a leverage pot.

37 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade s based o: The dstace Fgure 13 A plot of Numercal rule that wll detfy x s away from the bulk of the x s. Y agast x showg a outler that s ot a leverage pot x as a leverage pot (.e., a pot of hgh leverage) The extet to whch the ftted regresso le s flueced by the gve pot. of The secod bullet pot above deals wth the extet to whch Y at x ) depeds o y (the actual value of Y at b0 y b1 x where ad b 1 j1 c j y j, ^y =b 0 +b 1 x c j = x j x S x. So that ^y (the predcted value x=x ). We have that ^y = ý b 1 x+b 1 x = ý+b 1 (x x )

38 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade where 1 j=1 x x j S x Notce that y j + j=1 x x j x x 1 + (x x )= + h j = j=1 j=1 We ca express the predcted value, =[ h 1 j + ( x x)(x x) ] j S x ^y as ^y =h y + h j y j () j

39 where h = 1 + ( x x) j=1 ( x j x ) Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade The term h s commoly called the leverage of the h shows how y affects close to zero sce h j =1, j=1 I ths stuato, the predcted value ^y. For example, f h 1 ^y =1 y +other terms y. th data pot, otce that the the other h j terms are ^y wll be close to the actual value matter what values of the rest of the data take. Notce also h that depeds oly o the y o x s. Thus a pot of hgh leverage (or a leverage pot) ca be foud by lookg at just the values of the x s ad ot at the values of the y s. We have 1 h = 1 ( 1 + ( x x ) ) (x j x ) j= Rule for detfyg leverage pots j=1 ( x x ) 1 = + 1 = ( x j x) A popular rule, whch we shall accept, s to classfy leverage pot) a smple lear regresso model f x as a pot of hgh leverage ( a h > average (h )= = 4 Strateges for dealg wth bad leverage pots

40 1. Remove vald data pots We ask a questo about the valdty of the data pots correspodg to bad leverage pots, Are these data pots uusual or dfferet some way from the rest of the data? If so, we remove these pots ad reft the model wthout them. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade. Ft a dfferet regresso model We ask a questo about the valdty of the regresso model that has bee ftted, has a correct model bee ftted to the data? If so, we should cosder a dfferet model. Good leverage pots Whle good leverage pots do ot have a adverse effect o the estmated regresso R coeffcets, they do decrease ther estmated stadard errors as well as crease the value of. Hece, t s mportat to check extreme leverage pots for valdty, eve whe they are so-called good Stadardzed Resduals I ths secto we wll cosder stadardzed resduals ad ther mportace lear regresso dagostcs. Frst we wll show that the th least squares resdual has varace gve by var ( ^e )=σ [1 h ] Proof Recall from formula (*) that Thus, ^y =h y + h j y j j= ^e = y ^y = y h y j = h j y j =(1 h ) y h j y j j= var ( ^e )=var[ ( 1 h ) y j h j y j] = (1 h ) σ + h j σ j

41 σ [ 1 h +h + j h j ] Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Next, otce that h j j=1 So that, resdual, = j=1 [ 1 ]=. (x x ) (x j x ) + 1 S x + ( x x ) ( x j x) j=1 var ( ^y )=var( j=1 Thus, f ^e cosders that f h h j y j) = h j j 1, so that the th, has small varace (sce h 1 the vary much). We have also show that cosder the fact that whe var ( y j )=σ ^y y so that Var ^y =σ h h 1 S x + j=1 j h j =σ h (x x ) ( x j x ) = 1 (S x ) + ( x x ) (x (S x ) j x ) = 1 + ( pot s a leverage pot, the the correspodg 1 h 0 ). Ths seems reasoable whe oe ^e wll always be small (ad so t does ot. Ths aga seems reasoable whe we the ^y y. I ths case, var (^y )=σ h σ =var ( y ) The problem of the resduals havg dfferet varaces ca be overcome by stadardzg each resdual by dvdg t by a estmate of ts stadard devato. Thus, the th stadardzed resdual r, s gve by r = ^e = ^e ^σ r s 1 h where s= 1 ^e j j =1 s the estmate of σ obtaed from the model.

42 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade A crucal assumpto ay regresso aalyss s that the errors have costat varace. I lterature t s recommeded that a effectve plot to dagose o costat error varace s a plot of Resduals ftted values. 0.5 agast x or a plot of Stadardzed Resduals 0.5 agast x. (or agast the Whe pots of hgh leverage exst, t s formatve to look at plots of stadardzed resduals. The advatage of stadardzed resduals s that they mmedately tell us how may estmated stadard devatos ay pot s away from the ftted regresso model. For example, suppose that the 6th pot has a stadardzed resdual of 4.3, the ths meas that the 6th pot s a estmated 4.3 stadard devatos away from the ftted regresso le. If the errors are ormally dstrbuted, the observg a pot 4.3 stadard devatos away from the ftted regresso le s hghly uusual. Such a pot would commoly be referred to as a outler ad as such t should be vestgated. We shall follow the commo practce of labelg pots as outlers small- to moderate-sze data sets f the stadardzed resdual for the pot falls outsde the terval from to. I very large data sets, we shall chage ths rule to 4 to 4. Idetfcato ad examato of ay outlers s a key part of regresso aalyss. Recall that a bad leverage pot s a leverage pot whch s also a outler. Thus, a bad leverage pot s a leverage pot whose stadardzed resdual falls outsde the terval from to for small to moderate sze data sets. O the other had, a good leverage pot s a leverage pot whose stadardzed resdual falls sde the terval from to Assessg the Ifluece of Certa Cases Oe or more cases ca strogly cotrol or fluece the least squares ft of a regresso model. I ths secto we look at summary statstcs that measure the fluece of a sgle case o the least squares ft of a regresso model. Cook proposed measure of the fluece of dvdual cases whch s called Cook`s dstace ad t s gve by: D = r h 1 h

43 where r s the th stadardzed resdual ad h s the th leverage value, Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade lterature t s recommeded to use for smple lear 4 as a rough cut-off for oteworthy values of regresso. (Meag f Cook`s dstace of a pot s bgger tha fluece o ftted regresso model). Recommedatos for Hadlg Outlers ad Leverage Pots D 4, t has sgfcat We coclude ths secto wth some geeral advce about how to deal wth outlers ad leverage pots. Pots should ot be routely deleted from a aalyss just because they do ot ft the model. Outlers ad bad leverage pots are sgals, flaggg potetal problems wth the model. Outlers ofte pot out a mportat feature of the problem ot cosdered before. They may pot to a alteratve model whch the pots are ot a outler. I ths case t s the worth cosderg fttg a alteratve model Normalty of the Errors The assumpto of ormal errors s eeded small samples for the valdty of t- dstrbuto based hypothess tests ad cofdece tervals ad for all sample szes for predcto tervals. Ths assumpto s geerally checked by lookg at the dstrbuto of the resduals or stadardzed resduals. A commo way to assess ormalty of the errors s to look at what s commoly referred to as a ormal probablty plot or a ormal Q Q plot of the stadardzed resduals. A ormal probablty plot of the stadardzed resduals s obtaed by plottg the ordered stadardzed resduals o the vertcal axs agast the expected order statstcs from a stadard ormal dstrbuto o the horzotal axes. If the resultg plot produces pots close to a straght le the the data are sad to be cosstet wth that from a ormal dstrbuto. O the other had, departures from learty provde evdece of o-ormalty.

44 Example 6: Recall the example from Chapter o the tmg of producto rus for whch we ft a straght-le regresso model to ru tme from ru sze. The data are gve Table (alog wth values of leverages, resduals ad stadardzed resduals) ad are plotted Fgure 14. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Table Regresso dagostcs for the model Fgure 14 Case Ru tme Ru sze Leverage Resduals Std. resduals Fgure 14. A plot of the producto data wth the least squares le cluded

45 We beg by cosderg the smple regresso model where Y = ru sze ad x = ru tme. Regresso output from R s gve below. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Resduals: M 1Q Meda 3Q Max Coeffcets: Estmate Std. Error t value Pr(> t ) (Itercept) *** ru.tme e-06 *** --- Sgf. codes: 0 *** ** 0.01 * Resdual stadard error: o 18 degrees of freedom Multple R-squared: 0.730, Adjusted R-squared: F-statstc: 48.7 o 1 ad 18 DF, p-value: 1.615e-06

46 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Lookg at the gve output R, we could coclude that smple regresso output s vald for gve data coeffcet of slope of regresso le s sgfcatly dfferet from zero (based o results of t test) ad 73.0% of varace of varable ru sze s explaed by the regresso model (value of coeffcet of determato). But, before we make such a cocluso we should look at the dagostc plots. Also, we should check leverage values as we sad before, pot s classfed as hgh 4 0. leverage pot f ts leverage value s bgger that. Oly the 17 th pot has leverage value bgger tha ths cut-off pot. O scatter plot, we ca see that y-value of ths pot s ot dstat from other pots, meag that ths s good leverage pot (also ts stadardzed resdual falls sde the terval from to ). Fgure 15 provdes dagostc plots produced by R: o the top left s the plot for radomess of resduals, top rght s ormal Q-Q plot, dow left plot for varace of resduals ad o dow rght s plot for Cook s dstace. Varace of the error term appears to be costat. These plots suggest that lear model s approprate for our producto data: resduals seem dstrbuted radomly, havg approxmately ormal dstrbuto wth costat varace. Noe of the pots have Cook s dstace greater tha cut-off pot 4/18= 0..

47 Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Fgure 15. Dagostc plots Cocluso

48 Ths master thess deals wth a mportat area of statstcs - amely correlato ad regresso. These topcs could be also very useful teachg statstc at the uversty. We have chose ad explaed some aspects of ths topc. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Frst we dscussed about correlato as measure of assocato betwee varables. We troduced Pearso`s correlato coeffcet as measure of lear relatoshp ad Spearma`s correlato coeffcet as measure of mootoc relatoshp. After, we emphaszed the mportace of costructg the scatter plot of observed varables to reveal the ature ad stregth of correlato. We dscussed dagostc plots as methods for checkg assumptos of smple lear regresso. We preseted example based o Ascombe s four data sets whch llustrates the pot that lookg oly at the umercal regresso output may lead to very msleadg coclusos about the data ad to accept the wrog model. We troduced regresso models to descrbe causal relatoshp betwee two varables. Further, we dealt especally wth smple lear regresso. For estmatg ukow parameters of lear regresso we used method of least squares. We proved propertes of ubasedess ad cosstecy of these parameters, as well as some other propertes cocerg dstrbuto ad depedece of statstcs we have used. We talked about testg the valdty of the model usg Studet s t- test ad about the sgfcace of coeffcet of determato. Regresso models are prmarly used for predcto, so we cosdered predcto tervals for values of depedet varables. Some of the examples gve here are oly for llustratos of deftos ad procedures, some of them, wth larger sets of data were solved usg statstcal package R. A part of the refereces gve here, I have used teachg materals from my Arabc courses at the Uversty of Lbya (Al Mergeb, Zlte ). For future research, t would be terestg to cosder other types of regresso for example, whe depedet varable s bary (logstc regresso) or whe we have more tha oe depedet varable (multvarate regresso). Also, we could explore how s regresso used to solve problems bology, medce ad other areas. Refereces

49 Cohe,Y. ad Cohe, J.Y. (008), Statstcs ad Data wth R: A Appled Approach Through Examples. Wley, New York. Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Jevremovć,V ad Malšć, J. (00). Statstčke metode u meteorologj žejerstvu. Savez hdrometeorološk zavod, Beograd. Larso, R. J. ad Marx, M.L. (005). A troducto to Mathematcal statstcs ad ts Applcatos. Pretce Hall, New Jersey. Ldley,D.V. (1987). Regresso ad correlato aalyss. The New Palgrave: A Dctoary of Ecoomcs. Mlto, J. S, Corbet, J.J. ad McTeer, M.X. (1997). A troducto to Statstcs. McGraw- Hll Hgher Educato, Burr Rdge. Shaker, R. G. (006). Numercal Aalyss, New Age Iteratoal Ltd., Publshers, New Delh. Sheather, S.J. (009). A Moder Approach to Regresso wth R. Sprger, New York.

Simple Linear Regression

Simple Linear Regression Statstcal Methods I (EST 75) Page 139 Smple Lear Regresso Smple regresso applcatos are used to ft a model descrbg a lear relatoshp betwee two varables. The aspects of least squares regresso ad correlato

More information

Summary of the lecture in Biostatistics

Summary of the lecture in Biostatistics Summary of the lecture Bostatstcs Probablty Desty Fucto For a cotuos radom varable, a probablty desty fucto s a fucto such that: 0 dx a b) b a dx A probablty desty fucto provdes a smple descrpto of the

More information

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model Lecture 7. Cofdece Itervals ad Hypothess Tests the Smple CLR Model I lecture 6 we troduced the Classcal Lear Regresso (CLR) model that s the radom expermet of whch the data Y,,, K, are the outcomes. The

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn:

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn: Chapter 3 3- Busess Statstcs: A Frst Course Ffth Edto Chapter 2 Correlato ad Smple Lear Regresso Busess Statstcs: A Frst Course, 5e 29 Pretce-Hall, Ic. Chap 2- Learg Objectves I ths chapter, you lear:

More information

Econometric Methods. Review of Estimation

Econometric Methods. Review of Estimation Ecoometrc Methods Revew of Estmato Estmatg the populato mea Radom samplg Pot ad terval estmators Lear estmators Ubased estmators Lear Ubased Estmators (LUEs) Effcecy (mmum varace) ad Best Lear Ubased Estmators

More information

Chapter 13 Student Lecture Notes 13-1

Chapter 13 Student Lecture Notes 13-1 Chapter 3 Studet Lecture Notes 3- Basc Busess Statstcs (9 th Edto) Chapter 3 Smple Lear Regresso 4 Pretce-Hall, Ic. Chap 3- Chapter Topcs Types of Regresso Models Determg the Smple Lear Regresso Equato

More information

Statistics MINITAB - Lab 5

Statistics MINITAB - Lab 5 Statstcs 10010 MINITAB - Lab 5 PART I: The Correlato Coeffcet Qute ofte statstcs we are preseted wth data that suggests that a lear relatoshp exsts betwee two varables. For example the plot below s of

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Regresso What s a Model? 1. Ofte Descrbe Relatoshp betwee Varables 2. Types - Determstc Models (o radomess) - Probablstc Models (wth radomess) EPI 809/Sprg 2008 9 Determstc Models 1. Hypothesze

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uversty Mchael Bar Sprg 5 Mdterm am, secto Soluto Thursday, February 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes eam.. No calculators of ay kd are allowed..

More information

ESS Line Fitting

ESS Line Fitting ESS 5 014 17. Le Fttg A very commo problem data aalyss s lookg for relatoshpetwee dfferet parameters ad fttg les or surfaces to data. The smplest example s fttg a straght le ad we wll dscuss that here

More information

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model 1. Estmatg Model parameters Assumptos: ox ad y are related accordg to the smple lear regresso model (The lear regresso model s the model that says that x ad y are related a lear fasho, but the observed

More information

Probability and. Lecture 13: and Correlation

Probability and. Lecture 13: and Correlation 933 Probablty ad Statstcs for Software ad Kowledge Egeers Lecture 3: Smple Lear Regresso ad Correlato Mocha Soptkamo, Ph.D. Outle The Smple Lear Regresso Model (.) Fttg the Regresso Le (.) The Aalyss of

More information

b. There appears to be a positive relationship between X and Y; that is, as X increases, so does Y.

b. There appears to be a positive relationship between X and Y; that is, as X increases, so does Y. .46. a. The frst varable (X) s the frst umber the par ad s plotted o the horzotal axs, whle the secod varable (Y) s the secod umber the par ad s plotted o the vertcal axs. The scatterplot s show the fgure

More information

Multiple Linear Regression Analysis

Multiple Linear Regression Analysis LINEA EGESSION ANALYSIS MODULE III Lecture - 4 Multple Lear egresso Aalyss Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Cofdece terval estmato The cofdece tervals multple

More information

Functions of Random Variables

Functions of Random Variables Fuctos of Radom Varables Chapter Fve Fuctos of Radom Varables 5. Itroducto A geeral egeerg aalyss model s show Fg. 5.. The model output (respose) cotas the performaces of a system or product, such as weght,

More information

STA302/1001-Fall 2008 Midterm Test October 21, 2008

STA302/1001-Fall 2008 Midterm Test October 21, 2008 STA3/-Fall 8 Mdterm Test October, 8 Last Name: Frst Name: Studet Number: Erolled (Crcle oe) STA3 STA INSTRUCTIONS Tme allowed: hour 45 mutes Ads allowed: A o-programmable calculator A table of values from

More information

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uverst Mchael Bar Sprg 5 Mdterm xam, secto Soluto Thursda, Februar 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes exam.. No calculators of a kd are allowed..

More information

Simple Linear Regression

Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uversty Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

Chapter 14 Logistic Regression Models

Chapter 14 Logistic Regression Models Chapter 4 Logstc Regresso Models I the lear regresso model X β + ε, there are two types of varables explaatory varables X, X,, X k ad study varable y These varables ca be measured o a cotuous scale as

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Postpoed exam: ECON430 Statstcs Date of exam: Jauary 0, 0 Tme for exam: 09:00 a.m. :00 oo The problem set covers 5 pages Resources allowed: All wrtte ad prted

More information

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation 4//6 Appled Statstcs ad Probablty for Egeers Sth Edto Douglas C. Motgomery George C. Ruger Chapter Smple Lear Regresso ad Correlato CHAPTER OUTLINE Smple Lear Regresso ad Correlato - Emprcal Models -8

More information

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch

More information

Correlation and Regression Analysis

Correlation and Regression Analysis Chapter V Correlato ad Regresso Aalss R. 5.. So far we have cosdered ol uvarate dstrbutos. Ma a tme, however, we come across problems whch volve two or more varables. Ths wll be the subject matter of the

More information

Lecture Notes Types of economic variables

Lecture Notes Types of economic variables Lecture Notes 3 1. Types of ecoomc varables () Cotuous varable takes o a cotuum the sample space, such as all pots o a le or all real umbers Example: GDP, Polluto cocetrato, etc. () Dscrete varables fte

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

Chapter 8. Inferences about More Than Two Population Central Values

Chapter 8. Inferences about More Than Two Population Central Values Chapter 8. Ifereces about More Tha Two Populato Cetral Values Case tudy: Effect of Tmg of the Treatmet of Port-We tas wth Lasers ) To vestgate whether treatmet at a youg age would yeld better results tha

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the

More information

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

Mean is only appropriate for interval or ratio scales, not ordinal or nominal. Mea Same as ordary average Sum all the data values ad dvde by the sample sze. x = ( x + x +... + x Usg summato otato, we wrte ths as x = x = x = = ) x Mea s oly approprate for terval or rato scales, ot

More information

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger Example: Multple lear regresso 5000,00 4000,00 Tro Aders Moger 0.0.007 brthweght 3000,00 000,00 000,00 0,00 50,00 00,00 50,00 00,00 50,00 weght pouds Repetto: Smple lear regresso We defe a model Y = β0

More information

Module 7: Probability and Statistics

Module 7: Probability and Statistics Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

Multivariate Transformation of Variables and Maximum Likelihood Estimation

Multivariate Transformation of Variables and Maximum Likelihood Estimation Marquette Uversty Multvarate Trasformato of Varables ad Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Assocate Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Copyrght 03 by Marquette Uversty

More information

Chapter Two. An Introduction to Regression ( )

Chapter Two. An Introduction to Regression ( ) ubject: A Itroducto to Regresso Frst tage Chapter Two A Itroducto to Regresso (018-019) 1 pg. ubject: A Itroducto to Regresso Frst tage A Itroducto to Regresso Regresso aalss s a statstcal tool for the

More information

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model ECON 48 / WH Hog The Smple Regresso Model. Defto of the Smple Regresso Model Smple Regresso Model Expla varable y terms of varable x y = β + β x+ u y : depedet varable, explaed varable, respose varable,

More information

Chapter 2 Supplemental Text Material

Chapter 2 Supplemental Text Material -. Models for the Data ad the t-test Chapter upplemetal Text Materal The model preseted the text, equato (-3) s more properl called a meas model. ce the mea s a locato parameter, ths tpe of model s also

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then Secto 5 Vectors of Radom Varables Whe workg wth several radom varables,,..., to arrage them vector form x, t s ofte coveet We ca the make use of matrx algebra to help us orgaze ad mapulate large umbers

More information

STA 105-M BASIC STATISTICS (This is a multiple choice paper.)

STA 105-M BASIC STATISTICS (This is a multiple choice paper.) DCDM BUSINESS SCHOOL September Mock Eamatos STA 0-M BASIC STATISTICS (Ths s a multple choce paper.) Tme: hours 0 mutes INSTRUCTIONS TO CANDIDATES Do ot ope ths questo paper utl you have bee told to do

More information

Special Instructions / Useful Data

Special Instructions / Useful Data JAM 6 Set of all real umbers P A..d. B, p Posso Specal Istructos / Useful Data x,, :,,, x x Probablty of a evet A Idepedetly ad detcally dstrbuted Bomal dstrbuto wth parameters ad p Posso dstrbuto wth

More information

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Multiple Choice Test. Chapter Adequacy of Models for Regression

Multiple Choice Test. Chapter Adequacy of Models for Regression Multple Choce Test Chapter 06.0 Adequac of Models for Regresso. For a lear regresso model to be cosdered adequate, the percetage of scaled resduals that eed to be the rage [-,] s greater tha or equal to

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

Lecture 1 Review of Fundamental Statistical Concepts

Lecture 1 Review of Fundamental Statistical Concepts Lecture Revew of Fudametal Statstcal Cocepts Measures of Cetral Tedecy ad Dsperso A word about otato for ths class: Idvduals a populato are desgated, where the dex rages from to N, ad N s the total umber

More information

CLASS NOTES. for. PBAF 528: Quantitative Methods II SPRING Instructor: Jean Swanson. Daniel J. Evans School of Public Affairs

CLASS NOTES. for. PBAF 528: Quantitative Methods II SPRING Instructor: Jean Swanson. Daniel J. Evans School of Public Affairs CLASS NOTES for PBAF 58: Quattatve Methods II SPRING 005 Istructor: Jea Swaso Dael J. Evas School of Publc Affars Uversty of Washgto Ackowledgemet: The structor wshes to thak Rachel Klet, Assstat Professor,

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1 Hadout #8 Ttle: Foudatos of Ecoometrcs Course: Eco 367 Fall/05 Istructor: Dr. I-Mg Chu Lear Regresso Model So far we have focused mostly o the study of a sgle radom varable, ts correspodg theoretcal dstrbuto,

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

Linear Regression with One Regressor

Linear Regression with One Regressor Lear Regresso wth Oe Regressor AIM QA.7. Expla how regresso aalyss ecoometrcs measures the relatoshp betwee depedet ad depedet varables. A regresso aalyss has the goal of measurg how chages oe varable,

More information

Chapter 5 Properties of a Random Sample

Chapter 5 Properties of a Random Sample Lecture 6 o BST 63: Statstcal Theory I Ku Zhag, /0/008 Revew for the prevous lecture Cocepts: t-dstrbuto, F-dstrbuto Theorems: Dstrbutos of sample mea ad sample varace, relatoshp betwee sample mea ad sample

More information

Lecture 8: Linear Regression

Lecture 8: Linear Regression Lecture 8: Lear egresso May 4, GENOME 56, Sprg Goals Develop basc cocepts of lear regresso from a probablstc framework Estmatg parameters ad hypothess testg wth lear models Lear regresso Su I Lee, CSE

More information

Statistics: Unlocking the Power of Data Lock 5

Statistics: Unlocking the Power of Data Lock 5 STAT 0 Dr. Kar Lock Morga Exam 2 Grades: I- Class Multple Regresso SECTIONS 9.2, 0., 0.2 Multple explaatory varables (0.) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (0.2) Exam 2 Re- grades Re-

More information

Lecture 2: Linear Least Squares Regression

Lecture 2: Linear Least Squares Regression Lecture : Lear Least Squares Regresso Dave Armstrog UW Mlwaukee February 8, 016 Is the Relatoshp Lear? lbrary(car) data(davs) d 150) Davs$weght[d]

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1 STA 08 Appled Lear Models: Regresso Aalyss Sprg 0 Soluto for Homework #. Let Y the dollar cost per year, X the umber of vsts per year. The the mathematcal relato betwee X ad Y s: Y 300 + X. Ths s a fuctoal

More information

Bootstrap Method for Testing of Equality of Several Coefficients of Variation

Bootstrap Method for Testing of Equality of Several Coefficients of Variation Cloud Publcatos Iteratoal Joural of Advaced Mathematcs ad Statstcs Volume, pp. -6, Artcle ID Sc- Research Artcle Ope Access Bootstrap Method for Testg of Equalty of Several Coeffcets of Varato Dr. Navee

More information

Chapter 2 Simple Linear Regression

Chapter 2 Simple Linear Regression Chapter Smple Lear Regresso. Itroducto ad Least Squares Estmates Regresso aalyss s a method for vestgatg the fuctoal relatoshp amog varables. I ths chapter we cosder problems volvg modelg the relatoshp

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uverst Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution: Chapter 4 Exercses Samplg Theory Exercse (Smple radom samplg: Let there be two correlated radom varables X ad A sample of sze s draw from a populato by smple radom samplg wthout replacemet The observed

More information

ENGI 4421 Propagation of Error Page 8-01

ENGI 4421 Propagation of Error Page 8-01 ENGI 441 Propagato of Error Page 8-01 Propagato of Error [Navd Chapter 3; ot Devore] Ay realstc measuremet procedure cotas error. Ay calculatos based o that measuremet wll therefore also cota a error.

More information

residual. (Note that usually in descriptions of regression analysis, upper-case

residual. (Note that usually in descriptions of regression analysis, upper-case Regresso Aalyss Regresso aalyss fts or derves a model that descres the varato of a respose (or depedet ) varale as a fucto of oe or more predctor (or depedet ) varales. The geeral regresso model s oe of

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER II STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for

More information

Qualifying Exam Statistical Theory Problem Solutions August 2005

Qualifying Exam Statistical Theory Problem Solutions August 2005 Qualfyg Exam Statstcal Theory Problem Solutos August 5. Let X, X,..., X be d uform U(,),

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

Chapter 3 Sampling For Proportions and Percentages

Chapter 3 Sampling For Proportions and Percentages Chapter 3 Samplg For Proportos ad Percetages I may stuatos, the characterstc uder study o whch the observatos are collected are qualtatve ature For example, the resposes of customers may marketg surveys

More information

MEASURES OF DISPERSION

MEASURES OF DISPERSION MEASURES OF DISPERSION Measure of Cetral Tedecy: Measures of Cetral Tedecy ad Dsperso ) Mathematcal Average: a) Arthmetc mea (A.M.) b) Geometrc mea (G.M.) c) Harmoc mea (H.M.) ) Averages of Posto: a) Meda

More information

Can we take the Mysticism Out of the Pearson Coefficient of Linear Correlation?

Can we take the Mysticism Out of the Pearson Coefficient of Linear Correlation? Ca we tae the Mstcsm Out of the Pearso Coeffcet of Lear Correlato? Itroducto As the ttle of ths tutoral dcates, our purpose s to egeder a clear uderstadg of the Pearso coeffcet of lear correlato studets

More information

Statistics. Correlational. Dr. Ayman Eldeib. Simple Linear Regression and Correlation. SBE 304: Linear Regression & Correlation 1/3/2018

Statistics. Correlational. Dr. Ayman Eldeib. Simple Linear Regression and Correlation. SBE 304: Linear Regression & Correlation 1/3/2018 /3/08 Sstems & Bomedcal Egeerg Departmet SBE 304: Bo-Statstcs Smple Lear Regresso ad Correlato Dr. Ama Eldeb Fall 07 Descrptve Orgasg, summarsg & descrbg data Statstcs Correlatoal Relatoshps Iferetal Geeralsg

More information

Third handout: On the Gini Index

Third handout: On the Gini Index Thrd hadout: O the dex Corrado, a tala statstca, proposed (, 9, 96) to measure absolute equalt va the mea dfferece whch s defed as ( / ) where refers to the total umber of dvduals socet. Assume that. The

More information

A New Family of Transformations for Lifetime Data

A New Family of Transformations for Lifetime Data Proceedgs of the World Cogress o Egeerg 4 Vol I, WCE 4, July - 4, 4, Lodo, U.K. A New Famly of Trasformatos for Lfetme Data Lakhaa Watthaacheewakul Abstract A famly of trasformatos s the oe of several

More information

C. Statistics. X = n geometric the n th root of the product of numerical data ln X GM = or ln GM = X 2. X n X 1

C. Statistics. X = n geometric the n th root of the product of numerical data ln X GM = or ln GM = X 2. X n X 1 C. Statstcs a. Descrbe the stages the desg of a clcal tral, takg to accout the: research questos ad hypothess, lterature revew, statstcal advce, choce of study protocol, ethcal ssues, data collecto ad

More information

Point Estimation: definition of estimators

Point Estimation: definition of estimators Pot Estmato: defto of estmators Pot estmator: ay fucto W (X,..., X ) of a data sample. The exercse of pot estmato s to use partcular fuctos of the data order to estmate certa ukow populato parameters.

More information

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use. INTRODUCTORY NOTE ON LINEAR REGREION We have data of the form (x y ) (x y ) (x y ) These wll most ofte be preseted to us as two colum of a spreadsheet As the topc develops we wll see both upper case ad

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I

Recall MLR 5 Homskedasticity error u has the same variance given any values of the explanatory variables Var(u x1,...,xk) = 2 or E(UU ) = 2 I Chapter 8 Heterosedastcty Recall MLR 5 Homsedastcty error u has the same varace gve ay values of the eplaatory varables Varu,..., = or EUU = I Suppose other GM assumptos hold but have heterosedastcty.

More information

PROPERTIES OF GOOD ESTIMATORS

PROPERTIES OF GOOD ESTIMATORS ESTIMATION INTRODUCTION Estmato s the statstcal process of fdg a appromate value for a populato parameter. A populato parameter s a characterstc of the dstrbuto of a populato such as the populato mea,

More information

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67.

1. The weight of six Golden Retrievers is 66, 61, 70, 67, 92 and 66 pounds. The weight of six Labrador Retrievers is 54, 60, 72, 78, 84 and 67. Ecoomcs 3 Itroducto to Ecoometrcs Sprg 004 Professor Dobk Name Studet ID Frst Mdterm Exam You must aswer all the questos. The exam s closed book ad closed otes. You may use your calculators but please

More information

Simple Linear Regression - Scalar Form

Simple Linear Regression - Scalar Form Smple Lear Regresso - Scalar Form Q.. Model Y X,..., p..a. Derve the ormal equatos that mmze Q. p..b. Solve for the ordary least squares estmators, p..c. Derve E, V, E, V, COV, p..d. Derve the mea ad varace

More information

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions

Estimation of Stress- Strength Reliability model using finite mixture of exponential distributions Iteratoal Joural of Computatoal Egeerg Research Vol, 0 Issue, Estmato of Stress- Stregth Relablty model usg fte mxture of expoetal dstrbutos K.Sadhya, T.S.Umamaheswar Departmet of Mathematcs, Lal Bhadur

More information

Continuous Distributions

Continuous Distributions 7//3 Cotuous Dstrbutos Radom Varables of the Cotuous Type Desty Curve Percet Desty fucto, f (x) A smooth curve that ft the dstrbuto 3 4 5 6 7 8 9 Test scores Desty Curve Percet Probablty Desty Fucto, f

More information

Applied Statistics and Probability for Engineers, 5 th edition February 23, b) y ˆ = (85) =

Applied Statistics and Probability for Engineers, 5 th edition February 23, b) y ˆ = (85) = Appled Statstcs ad Probablty for Egeers, 5 th edto February 3, y.8.7.6.5.4.3.. -5 5 5 x b) y ˆ.3999 +.46(85).6836 c) y ˆ.3999 +.46(9).744 d) ˆ.46-3 a) Regresso Aalyss: Ratg Pots versus Meters per Att The

More information

"It is the mark of a truly intelligent person to be moved by statistics." George Bernard Shaw

It is the mark of a truly intelligent person to be moved by statistics. George Bernard Shaw Chapter 0 Chapter 0 Lear Regresso ad Correlato "It s the mark of a truly tellget perso to be moved by statstcs." George Berard Shaw Source: https://www.google.com.ph/search?q=house+ad+car+pctures&bw=366&bh=667&tbm

More information

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5

THE ROYAL STATISTICAL SOCIETY 2016 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 THE ROYAL STATISTICAL SOCIETY 06 EAMINATIONS SOLUTIONS HIGHER CERTIFICATE MODULE 5 The Socety s provdg these solutos to assst cadtes preparg for the examatos 07. The solutos are teded as learg ads ad should

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

CHAPTER 2. = y ˆ β x (.1022) So we can write

CHAPTER 2. = y ˆ β x (.1022) So we can write CHAPTER SOLUTIONS TO PROBLEMS. () Let y = GPA, x = ACT, ad = 8. The x = 5.875, y = 3.5, (x x )(y y ) = 5.85, ad (x x ) = 56.875. From equato (.9), we obta the slope as ˆβ = = 5.85/56.875., rouded to four

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971))

Part 4b Asymptotic Results for MRR2 using PRESS. Recall that the PRESS statistic is a special type of cross validation procedure (see Allen (1971)) art 4b Asymptotc Results for MRR usg RESS Recall that the RESS statstc s a specal type of cross valdato procedure (see Alle (97)) partcular to the regresso problem ad volves fdg Y $,, the estmate at the

More information

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

STATISTICAL INFERENCE

STATISTICAL INFERENCE (STATISTICS) STATISTICAL INFERENCE COMPLEMENTARY COURSE B.Sc. MATHEMATICS III SEMESTER ( Admsso) UNIVERSITY OF CALICUT SCHOOL OF DISTANCE EDUCATION CALICUT UNIVERSITY P.O., MALAPPURAM, KERALA, INDIA -

More information

Econ 388 R. Butler 2016 rev Lecture 5 Multivariate 2 I. Partitioned Regression and Partial Regression Table 1: Projections everywhere

Econ 388 R. Butler 2016 rev Lecture 5 Multivariate 2 I. Partitioned Regression and Partial Regression Table 1: Projections everywhere Eco 388 R. Butler 06 rev Lecture 5 Multvarate I. Parttoed Regresso ad Partal Regresso Table : Projectos everywhere P = ( ) ad M = I ( ) ad s a vector of oes assocated wth the costat term Sample Model Regresso

More information

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations HP 30S Statstcs Averages ad Stadard Devatos Average ad Stadard Devato Practce Fdg Averages ad Stadard Devatos HP 30S Statstcs Averages ad Stadard Devatos Average ad stadard devato The HP 30S provdes several

More information

COV. Violation of constant variance of ε i s but they are still independent. The error term (ε) is said to be heteroscedastic.

COV. Violation of constant variance of ε i s but they are still independent. The error term (ε) is said to be heteroscedastic. c Pogsa Porchawseskul, Faculty of Ecoomcs, Chulalogkor Uversty olato of costat varace of s but they are stll depedet. C,, he error term s sad to be heteroscedastc. c Pogsa Porchawseskul, Faculty of Ecoomcs,

More information

MS exam problems Fall 2012

MS exam problems Fall 2012 MS exam problems Fall 01 (From: Rya Mart) 1. (Stat 401) Cosder the followg game wth a box that cotas te balls two red, three blue, ad fve gree. A player selects two balls from the box at radom, wthout

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Marquette Uverst Maxmum Lkelhood Estmato Dael B. Rowe, Ph.D. Professor Departmet of Mathematcs, Statstcs, ad Computer Scece Coprght 08 b Marquette Uverst Maxmum Lkelhood Estmato We have bee sag that ~

More information

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d

9 U-STATISTICS. Eh =(m!) 1 Eh(X (1),..., X (m ) ) i.i.d 9 U-STATISTICS Suppose,,..., are P P..d. wth CDF F. Our goal s to estmate the expectato t (P)=Eh(,,..., m ). Note that ths expectato requres more tha oe cotrast to E, E, or Eh( ). Oe example s E or P((,

More information