BIOSTATISTICS TOPIC 8: ANALYSIS OF CORRELATIONS I. SIMPLE LINEAR REGRESSION

Size: px
Start display at page:

Download "BIOSTATISTICS TOPIC 8: ANALYSIS OF CORRELATIONS I. SIMPLE LINEAR REGRESSION"

Transcription

1 BIOSTATISTICS TOPIC 8: ANALYSIS OF CORRELATIONS I. SIMPLE LINEAR REGRESSION Gve a ma three weapos - correlato, regresso ad a pe - ad he wll use all three. Ao, 978 So far we have bee cocered wth aalyses of dffereces. Ad, dog so, we have cosdered measurg subjects o a sgle outcome varable (or two groups of subjects o oe varable). Such measuremets have yelded uvarate frequecy dstrbuto ad the aalyss s ofte referred to as uvarate aalyss. Now, we are cosderg subjects ad each subject has two measures avalable; other words, we have two varables per subject, say x ad y. Our terest ths kd of data s obvously to measure relatoshp betwee the two varables. We ca plot the value of y agast the value of x a scatter dagram ad assess whether the value of y vares systematcally wth the varato values of x. But we stll wats to have a sgle summary measure of the stregth of relatoshp betwee x ad y. I hs book "Natural Ihertace", Fracs Galto wrote: "each pecularty a ma s shared by hs ksma, but o the average, a less degree. For example, whte tall fathers would ted to have tall sos, the sos would be o the average shorter tha ther fathers, ad sos of short fathers, though havg heghts below the average for the etre populato, would ted to be taller tha ther fathers." He, the, cocluded a pheomeo called "law of uversal regresso" whch was the org of the topc we are learg rght ow. Today, the characterstcs of returg from extreme values toward the average of the full populato s well recogsed ad s termed "regresso toward the mea". We wll cosder methods for assessg the assocato betwee cotuous varables usg two methods kow as correlato aalyss ad lear regresso aalyss, whch are happeed to be some of the most popular statstcal techques medcal research.

2 I. CORRELATION ANALYSIS.. THE COVARIANCE AND COEFFICIENT OF CORRELATION I a prevous topc, we stated that f Y ad Y are depedet varables, the the varace of the sum or dfferece betwee X ad Y s equal to the varace of X plus the varace of Y, that s: var(x + Y) = var(x) + var(y) what happe f X ad Y are ot depedet? Before dscussg ths problem, we troduce the cocepts of covarace ad correlato. I elemetary trgoometry we lear that for a rght tragle, f we let the hypoteuse sde be c ad the other two sdes be a ad b, the Pythagoras' theorem states that: c = a + b ad ay tragle: c = a + b ab.cos C (Cose rule). Aalogously, f we have two radom varables X ad Y, where X may be the heght of father ad Y may be the heght of daughter, ther varace ca be estmated by: s x = = ( x x) [] respectvely. ad s = ( y y) y = Furthermore, f X ad Y are depedet, we have:

3 s + = s + s X Y X Y [] Let us ow dscuss X ad Y the cotext of geetcs. Let X be BMD of father ad Y be the BMD of daughter. It s clear that we ca fd aother expresso for the relatoshp x x by betwee X ad Y by multplyg each father's BMD from ts mea ( ) correspodg devato of hs daughter ( y y), stead of squarg the father's or daughter's devato, before summato. We refer ths quatty to as covarace betwee X ad Y ad s deoted by Cov(X, Y); that s: cov ( X, Y ) = ( x x)( y y) [3] = By defto ad aalogous to the Cose law ay tragle, we have: f X ad Y are ot depedet, the: : = σ + + Cov( X Y ) σ [4] X + Y X σy, A umber of pots eed to be oted here: (a) Varaces as defed [] are always postve sce they are derved from sums of squares, whereas, covaraces as defed [3] are derved from sum of cross-products of devatos ad so may be ether postve or egatve. (b) A postve value dcates that the devatos from the mea oe dstrbuto, say father's BMDs, are prepoderatly accompaed by devatos the other, say daughter's BMDs, the same drecto, postve or egatve. (c) A egatve covarace, o the other had, dcates that devatos the two dstrbutos are prepoderatly opposte drectos. (d) Whe the devato oe of the dstrbuto s equally lkely to be accompaed by devato of lke or opposte sg the other, the covarace, apart from errors of radom samplg, wll be zero. The mportace of covarace s ow obvous. If varato of BMD s uder geetc cotrol we would expect hgher BMD fathers geerally have hgh BMD daughters ad low 3

4 BMD fathers geerally have low BMD daughters. I other words, we should expect them to have postve covarace. Lack of geetc cotrol would produce a covarace of zero. It was by ths meas that Galto frst showed stature ma to be uder geetc cotrol. He foud that the covarace of paret ad offsprg, ad also that of pars of sblgs, was postve. The sze of the covarace relatve to some stadard gves a measure of the stregth of the assocato betwee the relatves. The stadard take s that afforded by the varaces of the two separate dstrbutos, our case, of father's BMD ad daughter's BMD. We may compare the covarace to these varaces separately ad we do ths by calculatg the regresso coeffcets whch have the forms: or Cov var Cov var ( X, Y ) ( X ) ( X, Y ) ( Y ) (regresso of daughters o father) (regresso of fathers o daughters) we ca also compare the covarace wth the two varaces at oce: ( X, Y ) ( Y ) var( Y ) Cov var Ths s called the coeffcet of correlato ad s deoted by r..e. r = ( X, Y ) ( X ).var( Y ) Cov var ( X, Y ) Cov = s x s y [5] r wll have a maxmum value of (a complete determato of daughter's BMD by father's BMD) ad mmum value of 0 (o relatoshp betwee father's ad daughter's BMDs). way: Wth some algebrac mapulato, we ca show that [5] ca be wrtte aother 4

5 r = = = ( x x)( y y) ( x x) ( y y) = = x y x = = = ( ) sxsy y [6] where s x ad s y are stadard devatos for X ad Y varable, respectvely... TEST OF HYPOTHESIS Oe obvous questo s that whether the observed coeffcet of correlato (r) s sgfcatly dfferet from zero. Uder the ull hypothess that there s o assocato the populato (r = 0), t ca be show that the statstc: t = r r has a t dstrbuto wth - df. O the other had, for a moderate or large sample sze, we ca set up a 95% cofdece terval of r by usg a theoretcal dstrbuto of r. It ca be show that the samplg dstrbuto of r s ot ormally dstrbuted. We ca, however, trasform t to a Normal dstrbuted quatty by usg the so-called Fsher's trasformato whch: + r z = l r [7] The stadard error of z s approxmately equal to: SE ( z) = [8] 3 5

6 Thus, approxmate 95% cofdece terval s: z to z Of course, we ca back-trasform the data to obta 95% cofdece terval for r (ths s left for exercse). Example : Cosder a clcal tral volvg patets presetg wth hyperlpoproteaema, basele values of the age of patets (years), total serum cholesterol (mg/ml) ad serum calcum level (mg/00ml) were recorded. Data for 8 patets are gve below: Patet Age Cholesterol (X) (Y) Mea S.D Let age be X ad cholesterol be Y, to calculate the correlato coeffcet, we eed to calculate the covarace Cov(X, Y) whch s: 6

7 Cov(X, Y) = ( x x)( y y) = = xy xy = = 0.68 The the coeffcet of correlato s: r = ( X Y ) Cov, s x s y = = To test for the sgfcace of r, we eed to covert t to the z score as gve [7]: z = + r l r = l = 0.56 wth the stadard error of z s gve [8]: SE ( z) = = = 0.58 The the t rato s 0.56 / 0.58 =.65 whch exceeds the expected value of. (wth 7 df ad 5% sgfcace level), we coclude that there s a assocato betwee age ad cholesterol ths sample of subjects. //.3. TEST FOR DIFFERENCE BETWEEN TWO COEFFICIENTS OF CORRELATION Suppose that we have two sample coeffcets of correlato r ad r whch were estmated from two ukow populato coeffcets ρ ad ρ, respectvely. Suppose further that r ad r were derved from two depedet samples of ad subjects, respectvely. To test the hypothess that ρ = ρ versus the alteratve hypothess that ρ ρ, we frstly covert these sample coeffcets to a z-score: 7

8 By theory, the statstc z z + r = l r ad + r = z l r z s dstrbuted about the mea Mea(z z ) = ρ ρ ( ) ( ) where ρ s the commo correlato coeffcet, wth varace Var(z z ) = If the samples are ot small or f ad are ot very dfferet, the statstc t = z z ca be used as a test statstc of the hypothess. 8

9 II. SIMPLE LINEAR REGRESSION ANALYSIS We are ow extedg the dea of correlato to a rather mechacal cocept called regresso aalyss. Before dog so, let us brefly recall ths dea the hstorcal cotext. As metoed earler, I 885, Fracs Galto troduced the cocept of "regresso" a study that demostrated that offsprg do ot ted toward the sze of parets, but rather toward the average as compared to the parets. The method of regresso has, however, a loger hstory. I fact, a legedary Frech mathematca by the ame of Adre Mare Legedre publshed the frst work o regresso (although he dd ot use the word) 805. Stll, the credt for dscovery of the method of least squares geerally gve to Carl Fredrch Gauss (aother legedary mathematca), who used the procedure the early part of the 9th cetury. Much used (ad perhaps overused) clche data aalyss "garbage - garbage out" ad "the results are oly as good as the data that produced them" apply the buldg of regresso models. If the data do ot reflect a tred volvg the varables, there wll be o success model developmet or drawg fereces regardg the system. Eve wth some types of relatoshp does exst, ths does ot mply that the data wll reveal t a clearly detectable fasho. May of the deas ad prcples used fttg lear models to data are best llustrated by usg smple lear regresso. These deas ca be exteded to more complex modellg techques oce the basc cocepts ecessary for model developmet, fttg ad assessmet have bee dscussed. Example (cotued): The plot of cholesterol (y-axs) versus age (x-axs) yelds the followg relatoshp: 9

10 From ths graph, we ca see that cholesterol level seems to vary systematcally wth age (whch was cofrmed earler by the correlato aalyss); moreover, the data pots seem to scatter aroud the le coects betwee two pots (0,.) ad (65, 4.5). Now, we leared earler ( Topc ) that for ay two gve pots a two-dmesoal space, we could costruct a straght le through two pots. The same prcple s appled here, although the techque of estmato s slghtly more complcated... ESTIMATES OF LINEAR REGRESSION MODEL Let the observed pars of values x ad y be ( x, y ), ( x ), y,..., ( x, ) y essece of a regresso aalyss s cocered wth relatoshps betwee a respose or depedet varable (y) ad explaatory or depedet varable (x). The smplest relatoshp s the straght le model: y = β0 + β x + ε [8]. The I ths model, β 0 ad β are ukow parameters ad are to be estmated from the observed data, ε s a radom error or departure term represetg the level of cosstecy preset repeated observatos uder smlar expermetal codtos. To proceed wth the parameter estmato, we have to make some assumptos () The value of x s fxed (ot radom); 0

11 ad o the radom error ε, we assume that ε's are: () ormally dstrbuted; () has expected value 0.e. E(ε) = 0 () costat varace σ for all levels of X; (v) ad successvely ucorrelated (statstcally depedet). Because β 0 ad β are parameters (hece, costats) ad that the value of x s fxed, we ca obta the expected value of [8] as : E(y ) = β 0 + β x [9] ad var( y ) = var(β 0 + β x + ε ) = var(ε ) = σ. [0] LEAST SQUARE ESTIMATORS To estmate β 0 ad β from a seres of data pots ( x, y ), ( x ), y,..., ( ) we use the method of least squares. Ths method estmates two costats b 0 ad b (correspodg to β 0 ad β ) so that they mmse the quatty: x,, y Q = [ y ( b + b x )] = 0 x It turs out that to mmse ths quatty, we eed to solve a system of smultaeous equatos: y = b + b x 0 xy = b x + b x 0 Ad the estmates tur out to be:

12 b = ( x x)( y y) ( x x) = cov var ( x, y) ( x) [] ad b0 = y bx [] Example (cotued): I our example, the estmates are: ( x, y) 0.68 ( x) Cov b = = = cov ad b0 = y bx = (38.83) =.089. Hece the regresso equato s: y = x That s, for ay dvdual, hs/her cholesterol s completely determed by the equato: Cholesterol = (Age) + e where e s the specfc error whch s ot accouted for by the equato (cludg measuremet error) assocated wth the subject. For stace, for subject (46 years old), hs/her expected cholesterol s: x 46 = ; whe compared wth hs/her actual value of 3.5, the resdual s e = = Smlarly, the expected cholesterol value for subject s x 6 =.45 ad s hgher tha hs/her actual level by The predcted value calculated usg the above equato, together wth the resduals (e) are tabulated the followg table.

13 I.D Observed Predcted Resdual (O) (P) (e = O - E) TEST OF HYPOTHESIS CONCERNING REGRESSION PARAMETERS. To some large extet, the terest wll le the values of slope. Iterpretato of ths parameter s meagless wthout a kowledge of ts dstrbuto. Therefore, havg calculate the estmates b ad b 0, we eed to determe the stadard error of these parameters so that we ca make fereces regardg ther sgfcace the model. Before dog ths, let us have a bref look at the sgfcace of the term e. We leared earler topc that f y s the sample mea of a varable Y, the the varace of Y s gve by ( y y). Now, the regresso case, y s actually equal = to β 0 + β x = $y. Hece, t s reasoable that the sample varace of the resduals e 3

14 should provde a estmator of σ [0]. It s from ths reasog that the ubased estmate of σ s defed as: s = = ( y yˆ ) = ( e ) [3] It ca be show that the expected values of b ad b 0 are β ad β 0 (true parameters), respectvely. Furthermore, from [3], t ca be show that the varaces of b ad b 0 are: s var ( b ) = [4] ( x x) = ad var( b ) = var( y) + ( x) ( b ) whch s: ( b ) 0 var x var 0 = s + [5] ( x x) = Oce ca go a step further by estmatg the covarace of b ad b 0 by: Cov ( b b ) ( x) s ( ) = [6], 0 b That s, b s ormally dstrbuted wth mea β ad varace gve [4], ad b 0 s ormally dstrbuted wth mea β 0 ad varace gve [5]. It follows that the test for sgfcace of b s the rato b bs t = = s s s x x whch s dstrbuted accordg to the t dstrbuto wth - df. 4

15 ad t = s b 0 ( / ) + ( x / s ) x s a test for b 0, whch s dstrbuted accordg to the t dstrbuto wth - df. Example (cotued): I our example, the estmate resdual varace s s calculated as follows: s = [(-0.475) + ( ) (0.079) ] / (8-) = We ca calculate the corrected sum of square of AGE, ( x x) from the estmate varace as: = ( x x) = s x ( - ) = (7) = =, by workg out Hece, the estmated varace of b s: var(b ) = / = SE(b ) = var( b ) = A test of hypothess of β = 0 ca be costructed as: t = b / SE(b ) = / = 0.70 whch s hghly sgfcat (p < 0.000). For the tercept we ca estmate ts varace as: 5

16 var(b 0 ) = x s + ( x x) = ( ) = = Ad the test of hypothess of β 0 = 0 ca be costructed as: t = b 0 / SE(b 0 ) =.089 / = 4.9 whch s also hghly sgfcat (p < 0.00)..3. ANALYSIS OF VARIANCE A aalyss of varace parttos the overall varato betwee the observatos Y to varato whch has bee accouted for by the regresso o X ad resdual or uexplaed varato. Thus, we ca say: Total varato = Varato explaed + Resdual about the mea by regresso model varato I ANOVA otato, we ca wrte equvaletly: SSTO = SSR + SSE or, = ( y y) = ( yˆ y) + ( y yˆ ) = = 6

17 Now, SSTO s assocated wth - df. For SSR, there are two parameters (b 0 ad b ) the model, but the costrat ( y y) = ˆ = 0 takes away df, hece t has fally df. For SSE, there are resduals (e ); however, df are lost because of two costrats o the e 's assocated wth estmatg the parameters β 0 ad β by the two ormal equatos see secto.). We ca assemble these data a ANOVA table as follows: Source df SS MS Regresso SSR = ( y y) = Resdual error - SSE = ( ) = Total - SSTO = ( y y) ˆ MSR = SSR/ y yˆ MSE = SSE / (-) = R-SQUARE From ths table t seems to be sesble to obta a "global" statstc to dcate how well the model fts the data. If we dvde the regresso sum of square (varato due to regresso model, SSR) by the total varato of Y (SSTO), we would have what statstcas called the coeffcet of determato, whch s deoted by R : : R = = = ( yˆ y) SSR SSTO ( y y) = = SSE SSTO [7] R. I fact, t ca be show that the coeffcet of correlato r defed [5] s equal to 7

18 Obvously, R s restrcted to 0 < R <. A R = 0 dcates that X ad Y are depedet (urelated), whereas a R = dcates that Y s completely determed by X. However, there are a lot of ptfalls ths statstc. A value of R = 0.75 s lkely to be vewed wth some satsfacto by expermeters. It s ofte more approprate to recogse that there s stll aother 5% of the total varato uexplaed by the model. We must ask why ths could be, ad whether a more complex model ad/or cluso of addtoal depedet varables could expla much of ths apparetly resdual varato. A large R value does ot ecessarly mea a good model. Ideed, R ca artfcally hgh whe ether the slope of the equato s large or the spread of the depedet varable s large. Also a large R ca be obtaed whe straght les are ftted to data that dsplay o-lear relatoshps. Addtoal methods for assessg the ft of a model are therefore eeded ad wll be descrbed later. F STATISTIC A assessmet of the sgfcace of the regresso (or a test of the hypothess that β = 0) s made from the rato of the regresso mea square (MSR) to the resdual mea square MSE (s ) whch s a F-rato wth ad - degrees of freedom. Ths calculato s usually exhbted a aalyss of varace table produced by most computer programs. F = MSR [8] MSE It s mportat that a hghly sgfcat F rato should ot seduce the expermeter to a belef that the straght le fts the data superbly. The F test s smply a assessmet of the exted to whch the ftted le has a slope whch s dfferet from zero. If the slope of the le s ear zero, the scatter of the data pots about the le would eed to be small order to obta a sgfcat F rato. However, a stuato wth a slope very dfferet from zero ca gve a hghly sgfcat F rato wth a cosderable scatter of pots about the le. 8

19 b bs The F test as defed [8] s actually equvalet to the t test t = = s s s The F test s therefore ca be used for testg β = 0 versus β 0 ad s ot for testg oe-sded alteratves. x x. Example (cotued): I our example, the sum of squares due to regresso le s: SSR = ( yˆ y) = = ( ) + ( ) ( ) = whch s assocated wth df, hece ts mea square s The sum of squares due to resduals s: SSE = ( ˆ ) = y y = (-0.475) + (0.3450) (0.079) =.4656 ad s assocated wth 8- = 6 df, hece ts mea square s.4656 / 6 = The F statstc s the: F = / = Hece, the ANOVA table ca be set up as follows: Source df SS MS F-test 9

20 Regresso Resdual errors Total Accordgly, the coeffcet of determato s: R = 0.49 /.96 = Ths meas that 87.75% of total varato cholesterol betwee subjects s "explaed" by the regresso equato. //.3. ANALYSIS OF RESIDUALS AND THE DIAGNOSIS OF REGRESSION MODEL A resdual s defed as the dfferece betwee the observed ad predcted y value, gve by e = y y$, the value whch s ot accouted for by the regresso equato. Hece, a examato of ths term should reveal how approprate the equato s. However, these resduals do ot have costat varace. I fact, var(e ) = (-h )s, where h s the th dagoal elemet of the matrx H whch s such that y = Hy. H s called the "hat matrx", sce t defes the trasformato that puts the "hat" o y! I vew of ths, t s preferable to work wth the stadardsed resduals. I smple lear regresso case, the stadardsed resdual r s defed as: r e = [9] MSE These stadardsed resduals have mea 0 ad varace. We ca use r to verfy assumptos of the regresso model whch we made secto.. These are: (a) are the regresso fucto s ot lear; (b) the dstrbutos of Y (cholesterol) do ot have costat varace at all level of X (age) or equvaletly the resduals do ot have costat varaces; (c) the dstrbutos of Y are ot ormal or equvaletly the resduals are ot ormal; (d) the resduals are ot depedet. 0

21 Useful graphcal methods for examg the stadard assumptos of costat varace, ormalty of the error terms ad approprateess of the ftted model clude: - A plot of resduals agast ftted values to detfy outlers, detect systematc departure from the model or detect o-costat varace; - A plot of resduals the order whch the observatos were take to detect o-depedece. - A ormal probablty plot of the resduals to detect from ormalty. - A plot of resduals agast X ca dcate whether a trasformato of the orgal X varable s ecessary, whle a plot of resduals agast X varables omtted from the model could reveal whether the y varable depeds o the omtted factors. OUTLIERS Outlers regresso are observatos that are ot well ftted by the assumed model. Such observatos wll have large resduals. A crude rule of thumb s that a observato wth a stadardsed resdual greater tha.5 absolute value s a outler ad the source of that data pot should be vestgated, f possble. More ofte tha ot, the oly evdece that somethg has goe wrog the data geeratg process s provded by the outlers themselves! A sesble way of proceedg wth the aalyss s to determe whether those values have substatal effect o the fereces to be draw from the regresso aalyss, that s, whether they are fluetal. INFLUENTIAL OBSERVATIONS Geerally speakg, t s more mportat to focus o fluetal outlers. But t s ot oly outlers that ca be fluetal. If observato s separated from the others terms of the values of the X-varables, ths observato s lkely to fluece the ftted regresso mode. Observatos separated from other ths way wll have a large value of h. We call h the leverage, A rough gude s that observatos wth h > 3p/ are fluetal, where p s the umber of beta coeffcets the model ( our example p = ).

22 There s a problem (drawback) to usg the leverage to detfy fluetal values - t does ot cota ay formato about the value of the Y varable, oly the value of the X varables. To detect a fluetal observato, a atural statstc to use s a scaled verso of ( yˆ ( j) y ) where ŷ ( j) s the ftted value for the jth observato whe the th observato s omtted from the ft. Ths leads to the so-called Cook's statstc. Fortuately, to obta the value of ths statstc, we do ot eed to carry out a regresso ft, omttg each pot term, for the statstc gve by: D = r p h ( h ) Observatos wth relatvely large values of D are defed as fluetal. Example (cotued): Calculatos of studetsed resduals ad Cook's D statstc for each observato are gve the followg table: ID Observed Predcted Std Err Std. Res Cook's D * ** * *** * **** * * ** * ** **

23 Fgure : Plot of stadardsed resduals agast predcted value of y. 3

24 .4. SOME FINAL COMMENTS (A) INTERPRETATION OF CORRELATION The followg s a extract from D Altma's commets: "Correlato coeffcets le wth the raged - to +, wth the mdpot of zero dcatg o lear assocato betwee the two varables. A very small correlato does ot ecessarly dcate that two varables are ot assocated, however. To be sure of ths, we should study a plot of the data, because t s possble that the two varables dsplay a pecular (.e. o-lear) relatoshp. For example, we should ot observe much, f ay, correlato betwee the average mdday temperature ad caledar moth because there s a cyclc patter. More commo s the stuato of a curved relatoshp betwee two varables, such as betwee brthweght ad legth of gestato. I ths case, Pearso's r wll uderestmate the assocato as t s a measure of lear assocato. The rak correlato coeffcet s better here as t assesses a more geeral way whether the varables ted to rse together (or move opposte drecto). It s surprsg how umpressve a correlato of 0.5 or eve 0.7 s whe a correlato of ths magtude s sgfcat at p<0.05 level wth a sample sze of 9 or 5 subjects. Whether these are mportat s aother matter. Feste commeted o the lack of clcal relevace of a statstcally sgfcat correlato of less tha 0. foud a sample of The problem of clcal relevace s oe that must be judged o ts merts each case, ad depeds o the cotext. For example, the same small correlato may be mportat a epdemologcal study but umportat clcally. Oe way of lookg at the correlato that helps to modfy the over-ethusasm s to calculate the R-square value, whch s the percetage of the varablty of the data that s "explaed" by the assocato betwee the two varables. So, a correlato of 0.7 mples that just 49% of the varablty may be put dow to the observed assocato. Iterpretato of assocato s ofte problematc because causato ca ot be drectly ferred. If we observe a assocato betwee two varables X ad Y, there are several possble explaatos. Excludg the possblty that t s a chace fdg, t may be because: 4

25 X causes (flueces) Y Y flueces X Both X ad Y are flueced by oe or more other varables. Correlato s ofte used as a exploratory method for vestgatg terrelatoshps amog may varables, for whch purpose t s most obvous to use hypothess tests. Although prcple, ths approach s ofte over-doe. The problem s that eve wth a modest umber of varables, the umber of coeffcets s large: 0 varables yeld (0 x 9)/ = 45 r values, ad 0 varables yelds 90 r values. O of the 0 of these wll be sgfcat at the level of 5% purely by chace, ad these ca ot be dstgushed from geue assocato. Furthermore, the magtude of correlato that s sgfcat at 5% s depedet o sample sze. I a large sample,, eve f there are several sgfcat r values of aroud 0. to 0.3, say, these are ulkely to be very useful. Whle ths way of lookg at large umbers of varables ca be helpful whe oe really has o pror hypothess, sgfcat assocato really eeds to be cofrmed aother set of data before credece ca be gve to them. Aother commo problem of terpretato occurs whe we kow that each of two varables s assocated wth a thrd varables. For example, f X s postvely correlated wth Y ad Y s postvely correlated wth Z, t s temptg to say that X ad Z must be postvely correlated. Although ths may deed be true, such a ferece s ujustfed - we ca ot say aythg about the correlato betwee X ad Z. The same s true whe oe has observed o assocato. For example, Mazess et al (984) the correlato betwee age ad heght was 0.05 ad betwee weght ad %fat was Ths does ot mply that the correlato betwee age ad %fat was also ear zero. I fact, ths correlato was Correlato ca ot be ferred from drect assocatos." (B) INTERPRETATION OF REGRESSION The varablty amog a set of observatos may be partly attrbuted to kow factors ad partly to ukow factors; the latter s ofte termed "radom varato". I lear regresso, we see how much of the varablty the respose varable ca be attrbuted to dfferet values of the predctor varable, ad the scatter ether sde of the ftted le shows uexplaed varablty. Because of ths varablty, the ftted le s oly a estmated of the relato betwee these varables the populato. As wth other 5

26 estmates (such as a sample mea) there wll be ucertaty assocated wth the estmated slope ad tercept. The cofdece tervals for the whole le ad predcto tervals for dvdual subjects show other aspect of varablty. The latter are especally useful as regresso s ofte used to make predctos about dvduals. It should be remembered that the regresso le should ot be used to make predctos for X values outsde the rage of values the observed data. Such extrapolato s ujustfed as we have o evdece about the relatoshp beyod the observed data. A statstcal model s oly a approxmato. Oe rarely beleves, for example, that the true relatoshp s exactly lear, but the lear regresso equato s take as a reasoable approxmato for the observed data. Outsde the rage of the observed data oe ca ot safely use the same equato. Thus, we should ot use the regresso equato to predct value beyod what we have observed. 6

27 III. EXERCISES. Cosder the smple regresso equato [8]. Cosder the least square resduals, gve by y y$ where =,, 3,...,. Show that (a) y$ y =. (b) ( y ˆ ) y = 0 = =. The followg data represet dastolc blood pressures take durg rest. The x values deote the legth of tme mutes sce rest bega, ad the y values deote dastolc blood pressures. x: y: (a) Costruct a scatter plot. (b) Fd the coeffcet of correlato ad the lear regresso equato of y o x. (c) Calculate 95% cofdece terval for the slope ad tercept. (d) Calculate 95% cofdece terval for the predcted value of y whe x = The followg table shows restg metabolc rate (RMR) (kcal/4 hr) ad body weght (kg) of 44 wome (Owe et al 986). Wt: RMR: Wt: RMR: Wt: RMR: Wt: RMR: (a) Perform lear regresso aalyss of RMR o body weght. (b) Exame the dstrbuto of resduals. Is the aalyss vald? 7

28 (c) Obta a 95% cofdece terval for the slope of the le. (d) Is t possble to use a dvdual's weght to predct ther RMR to wth 50 kcal/4 hr? 4. Dgox s a drug that s largely elmated uchaged the ure. I a study by Halk et al. 975, whch the authors commeted: (a) Its real clearace was correlated wth create clearace; (b) t clearace was depedet of ure flow. The authors provded data showg measuremets of these three varables from 35 patets beg treated wth dgox for cogestve heart falure. Patet Clearace Create Dgox (ml/m/.73 m ) Ur flow (ml/m)

29 (a) Do these data support statemets (a) ad (b) above? 5. Cosder the followg 4 data sets. Note that X = X = X4. Ft a lear regresso equato wth Y as a depedet varable ad X as a depedet varable. What s the most strkg feature from these data sets. Carry out a resdual plot for each data set ad commet o the result. X Y X Y X3 Y3 X4 Y

Simple Linear Regression

Simple Linear Regression Statstcal Methods I (EST 75) Page 139 Smple Lear Regresso Smple regresso applcatos are used to ft a model descrbg a lear relatoshp betwee two varables. The aspects of least squares regresso ad correlato

More information

ENGI 3423 Simple Linear Regression Page 12-01

ENGI 3423 Simple Linear Regression Page 12-01 ENGI 343 mple Lear Regresso Page - mple Lear Regresso ometmes a expermet s set up where the expermeter has cotrol over the values of oe or more varables X ad measures the resultg values of aother varable

More information

Lecture Notes Types of economic variables

Lecture Notes Types of economic variables Lecture Notes 3 1. Types of ecoomc varables () Cotuous varable takes o a cotuum the sample space, such as all pots o a le or all real umbers Example: GDP, Polluto cocetrato, etc. () Dscrete varables fte

More information

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades

Multiple Regression. More than 2 variables! Grade on Final. Multiple Regression 11/21/2012. Exam 2 Grades. Exam 2 Re-grades STAT 101 Dr. Kar Lock Morga 11/20/12 Exam 2 Grades Multple Regresso SECTIONS 9.2, 10.1, 10.2 Multple explaatory varables (10.1) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (10.2) Trasformatos

More information

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model

Lecture 7. Confidence Intervals and Hypothesis Tests in the Simple CLR Model Lecture 7. Cofdece Itervals ad Hypothess Tests the Smple CLR Model I lecture 6 we troduced the Classcal Lear Regresso (CLR) model that s the radom expermet of whch the data Y,,, K, are the outcomes. The

More information

Summary of the lecture in Biostatistics

Summary of the lecture in Biostatistics Summary of the lecture Bostatstcs Probablty Desty Fucto For a cotuos radom varable, a probablty desty fucto s a fucto such that: 0 dx a b) b a dx A probablty desty fucto provdes a smple descrpto of the

More information

Simple Linear Regression

Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uversty Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ " 1

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. x, where. = y - ˆ  1 STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS Recall Assumpto E(Y x) η 0 + η x (lear codtoal mea fucto) Data (x, y ), (x 2, y 2 ),, (x, y ) Least squares estmator ˆ E (Y x) ˆ " 0 + ˆ " x, where ˆ

More information

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model

12.2 Estimating Model parameters Assumptions: ox and y are related according to the simple linear regression model 1. Estmatg Model parameters Assumptos: ox ad y are related accordg to the smple lear regresso model (The lear regresso model s the model that says that x ad y are related a lear fasho, but the observed

More information

ESS Line Fitting

ESS Line Fitting ESS 5 014 17. Le Fttg A very commo problem data aalyss s lookg for relatoshpetwee dfferet parameters ad fttg les or surfaces to data. The smplest example s fttg a straght le ad we wll dscuss that here

More information

Mean is only appropriate for interval or ratio scales, not ordinal or nominal.

Mean is only appropriate for interval or ratio scales, not ordinal or nominal. Mea Same as ordary average Sum all the data values ad dvde by the sample sze. x = ( x + x +... + x Usg summato otato, we wrte ths as x = x = x = = ) x Mea s oly approprate for terval or rato scales, ot

More information

Statistics MINITAB - Lab 5

Statistics MINITAB - Lab 5 Statstcs 10010 MINITAB - Lab 5 PART I: The Correlato Coeffcet Qute ofte statstcs we are preseted wth data that suggests that a lear relatoshp exsts betwee two varables. For example the plot below s of

More information

Chapter 13 Student Lecture Notes 13-1

Chapter 13 Student Lecture Notes 13-1 Chapter 3 Studet Lecture Notes 3- Basc Busess Statstcs (9 th Edto) Chapter 3 Smple Lear Regresso 4 Pretce-Hall, Ic. Chap 3- Chapter Topcs Types of Regresso Models Determg the Smple Lear Regresso Equato

More information

STA302/1001-Fall 2008 Midterm Test October 21, 2008

STA302/1001-Fall 2008 Midterm Test October 21, 2008 STA3/-Fall 8 Mdterm Test October, 8 Last Name: Frst Name: Studet Number: Erolled (Crcle oe) STA3 STA INSTRUCTIONS Tme allowed: hour 45 mutes Ads allowed: A o-programmable calculator A table of values from

More information

Probability and. Lecture 13: and Correlation

Probability and. Lecture 13: and Correlation 933 Probablty ad Statstcs for Software ad Kowledge Egeers Lecture 3: Smple Lear Regresso ad Correlato Mocha Soptkamo, Ph.D. Outle The Smple Lear Regresso Model (.) Fttg the Regresso Le (.) The Aalyss of

More information

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions.

Ordinary Least Squares Regression. Simple Regression. Algebra and Assumptions. Ordary Least Squares egresso. Smple egresso. Algebra ad Assumptos. I ths part of the course we are gog to study a techque for aalysg the lear relatoshp betwee two varables Y ad X. We have pars of observatos

More information

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger

Example: Multiple linear regression. Least squares regression. Repetition: Simple linear regression. Tron Anders Moger Example: Multple lear regresso 5000,00 4000,00 Tro Aders Moger 0.0.007 brthweght 3000,00 000,00 000,00 0,00 50,00 00,00 50,00 00,00 50,00 weght pouds Repetto: Smple lear regresso We defe a model Y = β0

More information

Econometric Methods. Review of Estimation

Econometric Methods. Review of Estimation Ecoometrc Methods Revew of Estmato Estmatg the populato mea Radom samplg Pot ad terval estmators Lear estmators Ubased estmators Lear Ubased Estmators (LUEs) Effcecy (mmum varace) ad Best Lear Ubased Estmators

More information

Multiple Linear Regression Analysis

Multiple Linear Regression Analysis LINEA EGESSION ANALYSIS MODULE III Lecture - 4 Multple Lear egresso Aalyss Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur Cofdece terval estmato The cofdece tervals multple

More information

Chapter 8. Inferences about More Than Two Population Central Values

Chapter 8. Inferences about More Than Two Population Central Values Chapter 8. Ifereces about More Tha Two Populato Cetral Values Case tudy: Effect of Tmg of the Treatmet of Port-We tas wth Lasers ) To vestgate whether treatmet at a youg age would yeld better results tha

More information

Objectives of Multiple Regression

Objectives of Multiple Regression Obectves of Multple Regresso Establsh the lear equato that best predcts values of a depedet varable Y usg more tha oe eplaator varable from a large set of potetal predctors {,,... k }. Fd that subset of

More information

Chapter 14 Logistic Regression Models

Chapter 14 Logistic Regression Models Chapter 4 Logstc Regresso Models I the lear regresso model X β + ε, there are two types of varables explaatory varables X, X,, X k ad study varable y These varables ca be measured o a cotuous scale as

More information

Statistics: Unlocking the Power of Data Lock 5

Statistics: Unlocking the Power of Data Lock 5 STAT 0 Dr. Kar Lock Morga Exam 2 Grades: I- Class Multple Regresso SECTIONS 9.2, 0., 0.2 Multple explaatory varables (0.) Parttog varablty R 2, ANOVA (9.2) Codtos resdual plot (0.2) Exam 2 Re- grades Re-

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Postpoed exam: ECON430 Statstcs Date of exam: Jauary 0, 0 Tme for exam: 09:00 a.m. :00 oo The problem set covers 5 pages Resources allowed: All wrtte ad prted

More information

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS

UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS UNIVERSITY OF OSLO DEPARTMENT OF ECONOMICS Exam: ECON430 Statstcs Date of exam: Frday, December 8, 07 Grades are gve: Jauary 4, 08 Tme for exam: 0900 am 00 oo The problem set covers 5 pages Resources allowed:

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlato ad Smple Lear Regresso Berl Che Departmet of Computer Scece & Iformato Egeerg Natoal Tawa Normal Uverst Referece:. W. Navd. Statstcs for Egeerg ad Scetsts. Chapter 7 (7.-7.3) & Teachg Materal

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 00 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for the

More information

Regresso What s a Model? 1. Ofte Descrbe Relatoshp betwee Varables 2. Types - Determstc Models (o radomess) - Probablstc Models (wth radomess) EPI 809/Sprg 2008 9 Determstc Models 1. Hypothesze

More information

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections

ENGI 4421 Joint Probability Distributions Page Joint Probability Distributions [Navidi sections 2.5 and 2.6; Devore sections ENGI 441 Jot Probablty Dstrbutos Page 7-01 Jot Probablty Dstrbutos [Navd sectos.5 ad.6; Devore sectos 5.1-5.] The jot probablty mass fucto of two dscrete radom quattes, s, P ad p x y x y The margal probablty

More information

Lecture 8: Linear Regression

Lecture 8: Linear Regression Lecture 8: Lear egresso May 4, GENOME 56, Sprg Goals Develop basc cocepts of lear regresso from a probablstc framework Estmatg parameters ad hypothess testg wth lear models Lear regresso Su I Lee, CSE

More information

Lecture 3. Sampling, sampling distributions, and parameter estimation

Lecture 3. Sampling, sampling distributions, and parameter estimation Lecture 3 Samplg, samplg dstrbutos, ad parameter estmato Samplg Defto Populato s defed as the collecto of all the possble observatos of terest. The collecto of observatos we take from the populato s called

More information

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution:

{ }{ ( )} (, ) = ( ) ( ) ( ) Chapter 14 Exercises in Sampling Theory. Exercise 1 (Simple random sampling): Solution: Chapter 4 Exercses Samplg Theory Exercse (Smple radom samplg: Let there be two correlated radom varables X ad A sample of sze s draw from a populato by smple radom samplg wthout replacemet The observed

More information

residual. (Note that usually in descriptions of regression analysis, upper-case

residual. (Note that usually in descriptions of regression analysis, upper-case Regresso Aalyss Regresso aalyss fts or derves a model that descres the varato of a respose (or depedet ) varale as a fucto of oe or more predctor (or depedet ) varales. The geeral regresso model s oe of

More information

Functions of Random Variables

Functions of Random Variables Fuctos of Radom Varables Chapter Fve Fuctos of Radom Varables 5. Itroducto A geeral egeerg aalyss model s show Fg. 5.. The model output (respose) cotas the performaces of a system or product, such as weght,

More information

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model

ECON 482 / WH Hong The Simple Regression Model 1. Definition of the Simple Regression Model ECON 48 / WH Hog The Smple Regresso Model. Defto of the Smple Regresso Model Smple Regresso Model Expla varable y terms of varable x y = β + β x+ u y : depedet varable, explaed varable, respose varable,

More information

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity

ECONOMETRIC THEORY. MODULE VIII Lecture - 26 Heteroskedasticity ECONOMETRIC THEORY MODULE VIII Lecture - 6 Heteroskedastcty Dr. Shalabh Departmet of Mathematcs ad Statstcs Ida Isttute of Techology Kapur . Breusch Paga test Ths test ca be appled whe the replcated data

More information

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn:

Chapter Business Statistics: A First Course Fifth Edition. Learning Objectives. Correlation vs. Regression. In this chapter, you learn: Chapter 3 3- Busess Statstcs: A Frst Course Ffth Edto Chapter 2 Correlato ad Smple Lear Regresso Busess Statstcs: A Frst Course, 5e 29 Pretce-Hall, Ic. Chap 2- Learg Objectves I ths chapter, you lear:

More information

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation

Simple Linear Regression and Correlation. Applied Statistics and Probability for Engineers. Chapter 11 Simple Linear Regression and Correlation 4//6 Appled Statstcs ad Probablty for Egeers Sth Edto Douglas C. Motgomery George C. Ruger Chapter Smple Lear Regresso ad Correlato CHAPTER OUTLINE Smple Lear Regresso ad Correlato - Emprcal Models -8

More information

Lecture 1 Review of Fundamental Statistical Concepts

Lecture 1 Review of Fundamental Statistical Concepts Lecture Revew of Fudametal Statstcal Cocepts Measures of Cetral Tedecy ad Dsperso A word about otato for ths class: Idvduals a populato are desgated, where the dex rages from to N, ad N s the total umber

More information

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best

best estimate (mean) for X uncertainty or error in the measurement (systematic, random or statistical) best Error Aalyss Preamble Wheever a measuremet s made, the result followg from that measuremet s always subject to ucertaty The ucertaty ca be reduced by makg several measuremets of the same quatty or by mprovg

More information

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 1 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uversty Mchael Bar Sprg 5 Mdterm am, secto Soluto Thursday, February 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes eam.. No calculators of ay kd are allowed..

More information

CLASS NOTES. for. PBAF 528: Quantitative Methods II SPRING Instructor: Jean Swanson. Daniel J. Evans School of Public Affairs

CLASS NOTES. for. PBAF 528: Quantitative Methods II SPRING Instructor: Jean Swanson. Daniel J. Evans School of Public Affairs CLASS NOTES for PBAF 58: Quattatve Methods II SPRING 005 Istructor: Jea Swaso Dael J. Evas School of Publc Affars Uversty of Washgto Ackowledgemet: The structor wshes to thak Rachel Klet, Assstat Professor,

More information

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use.

The equation is sometimes presented in form Y = a + b x. This is reasonable, but it s not the notation we use. INTRODUCTORY NOTE ON LINEAR REGREION We have data of the form (x y ) (x y ) (x y ) These wll most ofte be preseted to us as two colum of a spreadsheet As the topc develops we wll see both upper case ad

More information

ENGI 4421 Propagation of Error Page 8-01

ENGI 4421 Propagation of Error Page 8-01 ENGI 441 Propagato of Error Page 8-01 Propagato of Error [Navd Chapter 3; ot Devore] Ay realstc measuremet procedure cotas error. Ay calculatos based o that measuremet wll therefore also cota a error.

More information

4. Standard Regression Model and Spatial Dependence Tests

4. Standard Regression Model and Spatial Dependence Tests 4. Stadard Regresso Model ad Spatal Depedece Tests Stadard regresso aalss fals the presece of spatal effects. I case of spatal depedeces ad/or spatal heterogeet a stadard regresso model wll be msspecfed.

More information

Linear Regression with One Regressor

Linear Regression with One Regressor Lear Regresso wth Oe Regressor AIM QA.7. Expla how regresso aalyss ecoometrcs measures the relatoshp betwee depedet ad depedet varables. A regresso aalyss has the goal of measurg how chages oe varable,

More information

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then

X X X E[ ] E X E X. is the ()m n where the ( i,)th. j element is the mean of the ( i,)th., then Secto 5 Vectors of Radom Varables Whe workg wth several radom varables,,..., to arrage them vector form x, t s ofte coveet We ca the make use of matrx algebra to help us orgaze ad mapulate large umbers

More information

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes

Midterm Exam 1, section 2 (Solution) Thursday, February hour, 15 minutes coometrcs, CON Sa Fracsco State Uverst Mchael Bar Sprg 5 Mdterm xam, secto Soluto Thursda, Februar 6 hour, 5 mutes Name: Istructos. Ths s closed book, closed otes exam.. No calculators of a kd are allowed..

More information

b. There appears to be a positive relationship between X and Y; that is, as X increases, so does Y.

b. There appears to be a positive relationship between X and Y; that is, as X increases, so does Y. .46. a. The frst varable (X) s the frst umber the par ad s plotted o the horzotal axs, whle the secod varable (Y) s the secod umber the par ad s plotted o the vertcal axs. The scatterplot s show the fgure

More information

Chapter Statistics Background of Regression Analysis

Chapter Statistics Background of Regression Analysis Chapter 06.0 Statstcs Backgroud of Regresso Aalyss After readg ths chapter, you should be able to:. revew the statstcs backgroud eeded for learg regresso, ad. kow a bref hstory of regresso. Revew of Statstcal

More information

Lecture 3 Probability review (cont d)

Lecture 3 Probability review (cont d) STATS 00: Itroducto to Statstcal Iferece Autum 06 Lecture 3 Probablty revew (cot d) 3. Jot dstrbutos If radom varables X,..., X k are depedet, the ther dstrbuto may be specfed by specfyg the dvdual dstrbuto

More information

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance

Chapter 13, Part A Analysis of Variance and Experimental Design. Introduction to Analysis of Variance. Introduction to Analysis of Variance Chapter, Part A Aalyss of Varace ad Epermetal Desg Itroducto to Aalyss of Varace Aalyss of Varace: Testg for the Equalty of Populato Meas Multple Comparso Procedures Itroducto to Aalyss of Varace Aalyss

More information

Chapter 5 Properties of a Random Sample

Chapter 5 Properties of a Random Sample Lecture 6 o BST 63: Statstcal Theory I Ku Zhag, /0/008 Revew for the prevous lecture Cocepts: t-dstrbuto, F-dstrbuto Theorems: Dstrbutos of sample mea ad sample varace, relatoshp betwee sample mea ad sample

More information

TESTS BASED ON MAXIMUM LIKELIHOOD

TESTS BASED ON MAXIMUM LIKELIHOOD ESE 5 Toy E. Smth. The Basc Example. TESTS BASED ON MAXIMUM LIKELIHOOD To llustrate the propertes of maxmum lkelhood estmates ad tests, we cosder the smplest possble case of estmatg the mea of the ormal

More information

Third handout: On the Gini Index

Third handout: On the Gini Index Thrd hadout: O the dex Corrado, a tala statstca, proposed (, 9, 96) to measure absolute equalt va the mea dfferece whch s defed as ( / ) where refers to the total umber of dvduals socet. Assume that. The

More information

Simple Linear Regression and Correlation.

Simple Linear Regression and Correlation. Smple Lear Regresso ad Correlato. Correspods to Chapter 0 Tamhae ad Dulop Sldes prepared b Elzabeth Newto (MIT) wth some sldes b Jacquele Telford (Johs Hopks Uverst) Smple lear regresso aalss estmates

More information

Chapter 3 Sampling For Proportions and Percentages

Chapter 3 Sampling For Proportions and Percentages Chapter 3 Samplg For Proportos ad Percetages I may stuatos, the characterstc uder study o whch the observatos are collected are qualtatve ature For example, the resposes of customers may marketg surveys

More information

Chapter Two. An Introduction to Regression ( )

Chapter Two. An Introduction to Regression ( ) ubject: A Itroducto to Regresso Frst tage Chapter Two A Itroducto to Regresso (018-019) 1 pg. ubject: A Itroducto to Regresso Frst tage A Itroducto to Regresso Regresso aalss s a statstcal tool for the

More information

Statistics. Correlational. Dr. Ayman Eldeib. Simple Linear Regression and Correlation. SBE 304: Linear Regression & Correlation 1/3/2018

Statistics. Correlational. Dr. Ayman Eldeib. Simple Linear Regression and Correlation. SBE 304: Linear Regression & Correlation 1/3/2018 /3/08 Sstems & Bomedcal Egeerg Departmet SBE 304: Bo-Statstcs Smple Lear Regresso ad Correlato Dr. Ama Eldeb Fall 07 Descrptve Orgasg, summarsg & descrbg data Statstcs Correlatoal Relatoshps Iferetal Geeralsg

More information

Module 7: Probability and Statistics

Module 7: Probability and Statistics Lecture 4: Goodess of ft tests. Itroducto Module 7: Probablty ad Statstcs I the prevous two lectures, the cocepts, steps ad applcatos of Hypotheses testg were dscussed. Hypotheses testg may be used to

More information

Multiple Choice Test. Chapter Adequacy of Models for Regression

Multiple Choice Test. Chapter Adequacy of Models for Regression Multple Choce Test Chapter 06.0 Adequac of Models for Regresso. For a lear regresso model to be cosdered adequate, the percetage of scaled resduals that eed to be the rage [-,] s greater tha or equal to

More information

Wu-Hausman Test: But if X and ε are independent, βˆ. ECON 324 Page 1

Wu-Hausman Test: But if X and ε are independent, βˆ. ECON 324 Page 1 Wu-Hausma Test: Detectg Falure of E( ε X ) Caot drectly test ths assumpto because lack ubased estmator of ε ad the OLS resduals wll be orthogoal to X, by costructo as ca be see from the momet codto X'

More information

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model

( ) = ( ) ( ) Chapter 13 Asymptotic Theory and Stochastic Regressors. Stochastic regressors model Chapter 3 Asmptotc Theor ad Stochastc Regressors The ature of eplaator varable s assumed to be o-stochastc or fed repeated samples a regresso aalss Such a assumpto s approprate for those epermets whch

More information

22 Nonparametric Methods.

22 Nonparametric Methods. 22 oparametrc Methods. I parametrc models oe assumes apror that the dstrbutos have a specfc form wth oe or more ukow parameters ad oe tres to fd the best or atleast reasoably effcet procedures that aswer

More information

C. Statistics. X = n geometric the n th root of the product of numerical data ln X GM = or ln GM = X 2. X n X 1

C. Statistics. X = n geometric the n th root of the product of numerical data ln X GM = or ln GM = X 2. X n X 1 C. Statstcs a. Descrbe the stages the desg of a clcal tral, takg to accout the: research questos ad hypothess, lterature revew, statstcal advce, choce of study protocol, ethcal ssues, data collecto ad

More information

Chapter 11 The Analysis of Variance

Chapter 11 The Analysis of Variance Chapter The Aalyss of Varace. Oe Factor Aalyss of Varace. Radomzed Bloc Desgs (ot for ths course) NIPRL . Oe Factor Aalyss of Varace.. Oe Factor Layouts (/4) Suppose that a expermeter s terested populatos

More information

Correlation and Regression Analysis

Correlation and Regression Analysis Chapter V Correlato ad Regresso Aalss R. 5.. So far we have cosdered ol uvarate dstrbutos. Ma a tme, however, we come across problems whch volve two or more varables. Ths wll be the subject matter of the

More information

Chapter 4 Multiple Random Variables

Chapter 4 Multiple Random Variables Revew for the prevous lecture: Theorems ad Examples: How to obta the pmf (pdf) of U = g (, Y) ad V = g (, Y) Chapter 4 Multple Radom Varables Chapter 44 Herarchcal Models ad Mxture Dstrbutos Examples:

More information

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen.

2.28 The Wall Street Journal is probably referring to the average number of cubes used per glass measured for some population that they have chosen. .5 x 54.5 a. x 7. 786 7 b. The raked observatos are: 7.4, 7.5, 7.7, 7.8, 7.9, 8.0, 8.. Sce the sample sze 7 s odd, the meda s the (+)/ 4 th raked observato, or meda 7.8 c. The cosumer would more lkely

More information

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA

THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA THE ROYAL STATISTICAL SOCIETY EXAMINATIONS SOLUTIONS GRADUATE DIPLOMA PAPER II STATISTICAL THEORY & METHODS The Socety provdes these solutos to assst caddates preparg for the examatos future years ad for

More information

Analysis of Variance with Weibull Data

Analysis of Variance with Weibull Data Aalyss of Varace wth Webull Data Lahaa Watthaacheewaul Abstract I statstcal data aalyss by aalyss of varace, the usual basc assumptos are that the model s addtve ad the errors are radomly, depedetly, ad

More information

Simple Linear Regression - Scalar Form

Simple Linear Regression - Scalar Form Smple Lear Regresso - Scalar Form Q.. Model Y X,..., p..a. Derve the ormal equatos that mmze Q. p..b. Solve for the ordary least squares estmators, p..c. Derve E, V, E, V, COV, p..d. Derve the mea ad varace

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #1 STA 08 Appled Lear Models: Regresso Aalyss Sprg 0 Soluto for Homework #. Let Y the dollar cost per year, X the umber of vsts per year. The the mathematcal relato betwee X ad Y s: Y 300 + X. Ths s a fuctoal

More information

Class 13,14 June 17, 19, 2015

Class 13,14 June 17, 19, 2015 Class 3,4 Jue 7, 9, 05 Pla for Class3,4:. Samplg dstrbuto of sample mea. The Cetral Lmt Theorem (CLT). Cofdece terval for ukow mea.. Samplg Dstrbuto for Sample mea. Methods used are based o CLT ( Cetral

More information

Chapter 2 Simple Linear Regression

Chapter 2 Simple Linear Regression Chapter Smple Lear Regresso. Itroducto ad Least Squares Estmates Regresso aalyss s a method for vestgatg the fuctoal relatoshp amog varables. I ths chapter we cosder problems volvg modelg the relatoshp

More information

Logistic regression (continued)

Logistic regression (continued) STAT562 page 138 Logstc regresso (cotued) Suppose we ow cosder more complex models to descrbe the relatoshp betwee a categorcal respose varable (Y) that takes o two (2) possble outcomes ad a set of p explaatory

More information

STA 105-M BASIC STATISTICS (This is a multiple choice paper.)

STA 105-M BASIC STATISTICS (This is a multiple choice paper.) DCDM BUSINESS SCHOOL September Mock Eamatos STA 0-M BASIC STATISTICS (Ths s a multple choce paper.) Tme: hours 0 mutes INSTRUCTIONS TO CANDIDATES Do ot ope ths questo paper utl you have bee told to do

More information

Bootstrap Method for Testing of Equality of Several Coefficients of Variation

Bootstrap Method for Testing of Equality of Several Coefficients of Variation Cloud Publcatos Iteratoal Joural of Advaced Mathematcs ad Statstcs Volume, pp. -6, Artcle ID Sc- Research Artcle Ope Access Bootstrap Method for Testg of Equalty of Several Coeffcets of Varato Dr. Navee

More information

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1

Handout #8. X\Y f(x) 0 1/16 1/ / /16 3/ / /16 3/16 0 3/ /16 1/16 1/8 g(y) 1/16 1/4 3/8 1/4 1/16 1 Hadout #8 Ttle: Foudatos of Ecoometrcs Course: Eco 367 Fall/05 Istructor: Dr. I-Mg Chu Lear Regresso Model So far we have focused mostly o the study of a sgle radom varable, ts correspodg theoretcal dstrbuto,

More information

Homework Solution (#5)

Homework Solution (#5) Homework Soluto (# Chapter : #6,, 8(b, 3, 4, 44, 49, 3, 9 ad 7 Chapter. Smple Lear Regresso ad Correlato.6 (6 th edto 7, old edto Page 9 Rafall volume ( vs Ruoff volume ( : 9 8 7 6 4 3 : a. Yes, the scatter-plot

More information

X ε ) = 0, or equivalently, lim

X ε ) = 0, or equivalently, lim Revew for the prevous lecture Cocepts: order statstcs Theorems: Dstrbutos of order statstcs Examples: How to get the dstrbuto of order statstcs Chapter 5 Propertes of a Radom Sample Secto 55 Covergece

More information

University of Belgrade. Faculty of Mathematics. Master thesis Regression and Correlation

University of Belgrade. Faculty of Mathematics. Master thesis Regression and Correlation Uversty of Belgrade Vrtual Lbrary of Faculty of Mathematcs - Uversty of Belgrade Faculty of Mathematcs Master thess Regresso ad Correlato The caddate Supervsor Karma Ibrahm Soufya Vesa Jevremovć Jue 014

More information

ε. Therefore, the estimate

ε. Therefore, the estimate Suggested Aswers, Problem Set 3 ECON 333 Da Hugerma. Ths s ot a very good dea. We kow from the secod FOC problem b) that ( ) SSE / = y x x = ( ) Whch ca be reduced to read y x x = ε x = ( ) The OLS model

More information

STK3100 and STK4100 Autumn 2018

STK3100 and STK4100 Autumn 2018 SK3 ad SK4 Autum 8 Geeralzed lear models Part III Covers the followg materal from chaters 4 ad 5: Cofdece tervals by vertg tests Cosder a model wth a sgle arameter β We may obta a ( α% cofdece terval for

More information

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations

hp calculators HP 30S Statistics Averages and Standard Deviations Average and Standard Deviation Practice Finding Averages and Standard Deviations HP 30S Statstcs Averages ad Stadard Devatos Average ad Stadard Devato Practce Fdg Averages ad Stadard Devatos HP 30S Statstcs Averages ad Stadard Devatos Average ad stadard devato The HP 30S provdes several

More information

CHAPTER VI Statistical Analysis of Experimental Data

CHAPTER VI Statistical Analysis of Experimental Data Chapter VI Statstcal Aalyss of Expermetal Data CHAPTER VI Statstcal Aalyss of Expermetal Data Measuremets do ot lead to a uque value. Ths s a result of the multtude of errors (maly radom errors) that ca

More information

MEASURES OF DISPERSION

MEASURES OF DISPERSION MEASURES OF DISPERSION Measure of Cetral Tedecy: Measures of Cetral Tedecy ad Dsperso ) Mathematcal Average: a) Arthmetc mea (A.M.) b) Geometrc mea (G.M.) c) Harmoc mea (H.M.) ) Averages of Posto: a) Meda

More information

Based on Neter, Wasserman and Whitemore: Applied Statistics, Chapter 18, pp

Based on Neter, Wasserman and Whitemore: Applied Statistics, Chapter 18, pp SERIES V: REGRESSION ANALYSIS Based o Neter, Wasserma ad Whtemore: Appled Statstcs, Chapter 8, pp. 53-64. I. Itroducto assumptos of the model ad ts developmet The regresso aalyss aother mode of the aalyss

More information

Econ 388 R. Butler 2016 rev Lecture 5 Multivariate 2 I. Partitioned Regression and Partial Regression Table 1: Projections everywhere

Econ 388 R. Butler 2016 rev Lecture 5 Multivariate 2 I. Partitioned Regression and Partial Regression Table 1: Projections everywhere Eco 388 R. Butler 06 rev Lecture 5 Multvarate I. Parttoed Regresso ad Partal Regresso Table : Projectos everywhere P = ( ) ad M = I ( ) ad s a vector of oes assocated wth the costat term Sample Model Regresso

More information

is the score of the 1 st student, x

is the score of the 1 st student, x 8 Chapter Collectg, Dsplayg, ad Aalyzg your Data. Descrptve Statstcs Sectos explaed how to choose a sample, how to collect ad orgaze data from the sample, ad how to dsplay your data. I ths secto, you wll

More information

Simulation Output Analysis

Simulation Output Analysis Smulato Output Aalyss Summary Examples Parameter Estmato Sample Mea ad Varace Pot ad Iterval Estmato ermatg ad o-ermatg Smulato Mea Square Errors Example: Sgle Server Queueg System x(t) S 4 S 4 S 3 S 5

More information

Continuous Distributions

Continuous Distributions 7//3 Cotuous Dstrbutos Radom Varables of the Cotuous Type Desty Curve Percet Desty fucto, f (x) A smooth curve that ft the dstrbuto 3 4 5 6 7 8 9 Test scores Desty Curve Percet Probablty Desty Fucto, f

More information

Introduction to F-testing in linear regression models

Introduction to F-testing in linear regression models ECON 43 Harald Goldste, revsed Nov. 4 Itroducto to F-testg lear regso s (Lecture ote to lecture Frday 4..4) Itroducto A F-test usually s a test where several parameters are volved at oce the ull hypothess

More information

Statistics Descriptive and Inferential Statistics. Instructor: Daisuke Nagakura

Statistics Descriptive and Inferential Statistics. Instructor: Daisuke Nagakura Statstcs Descrptve ad Iferetal Statstcs Istructor: Dasuke Nagakura (agakura@z7.keo.jp) 1 Today s topc Today, I talk about two categores of statstcal aalyses, descrptve statstcs ad feretal statstcs, ad

More information

Reaction Time VS. Drug Percentage Subject Amount of Drug Times % Reaction Time in Seconds 1 Mary John Carl Sara William 5 4

Reaction Time VS. Drug Percentage Subject Amount of Drug Times % Reaction Time in Seconds 1 Mary John Carl Sara William 5 4 CHAPTER Smple Lear Regreo EXAMPLE A expermet volvg fve ubject coducted to determe the relatohp betwee the percetage of a certa drug the bloodtream ad the legth of tme t take the ubject to react to a tmulu.

More information

Lecture 2: Linear Least Squares Regression

Lecture 2: Linear Least Squares Regression Lecture : Lear Least Squares Regresso Dave Armstrog UW Mlwaukee February 8, 016 Is the Relatoshp Lear? lbrary(car) data(davs) d 150) Davs$weght[d]

More information

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek

Investigation of Partially Conditional RP Model with Response Error. Ed Stanek Partally Codtoal Radom Permutato Model 7- vestgato of Partally Codtoal RP Model wth Respose Error TRODUCTO Ed Staek We explore the predctor that wll result a smple radom sample wth respose error whe a

More information

STK3100 and STK4100 Autumn 2017

STK3100 and STK4100 Autumn 2017 SK3 ad SK4 Autum 7 Geeralzed lear models Part III Covers the followg materal from chaters 4 ad 5: Sectos 4..5, 4.3.5, 4.3.6, 4.4., 4.4., ad 4.4.3 Sectos 5.., 5.., ad 5.5. Ørulf Borga Deartmet of Mathematcs

More information

"It is the mark of a truly intelligent person to be moved by statistics." George Bernard Shaw

It is the mark of a truly intelligent person to be moved by statistics. George Bernard Shaw Chapter 0 Chapter 0 Lear Regresso ad Correlato "It s the mark of a truly tellget perso to be moved by statstcs." George Berard Shaw Source: https://www.google.com.ph/search?q=house+ad+car+pctures&bw=366&bh=667&tbm

More information

CHAPTER 2. = y ˆ β x (.1022) So we can write

CHAPTER 2. = y ˆ β x (.1022) So we can write CHAPTER SOLUTIONS TO PROBLEMS. () Let y = GPA, x = ACT, ad = 8. The x = 5.875, y = 3.5, (x x )(y y ) = 5.85, ad (x x ) = 56.875. From equato (.9), we obta the slope as ˆβ = = 5.85/56.875., rouded to four

More information