Chemometrics. Unit 2: Regression Analysis

The problem of predicting the average value of one variable in terms of the known values of other variables is the problem of regression. In carrying out a regression analysis it is necessary to fit data to some assumed equation using the method of least squares. However, there is more to a regression analysis than just doing a least squares fit, and this is explained in what follows.

In analytical chemistry it is very common for the assumed equation to be of the form y = a + bx, i.e. a linear equation with two variables, x and y. For example, y could be the absorbance of a solute in solution and x the concentration of the solute (Beer's Law). In this example the intercept is, in principle, zero, so that the equation is y = bx.

Before the advent of computers and cheap calculators it was customary to fit the data by plotting a (straight line) graph of y versus x. This was the "calibration graph" or "working curve", and it was used to interpolate the concentration of an "unknown" solution directly. In other words, the gradient b (and the intercept a, if applicable) was not even calculated; rather, the determination of the unknown concentration was carried out graphically.

It is now quite common to determine the gradient and intercept by using a computer to carry out the method of least squares. This is a mathematically precise way of drawing a line of "best fit" through a series of plotted points which, in general, will not all fall exactly on the line. It is possible to carry out the least squares fit, and hence obtain values for a and b, without performing any statistical analysis. All that results from such a procedure is a value for a and b without any indication of how "good" these values are. Furthermore, unless the graph is drawn as well, there will be no indication of how well the data fit a straight-line relationship. After all, the program in the computer or calculator will probably carry out the least squares fit to an assumed linear equation even if the plotted points form a circle!
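The circle remark is easy to check. The sketch below, in plain Python with no libraries (the function name `least_squares_line` and the circle data are our own illustration), applies the least squares formulas given later as equations (4) and (5) to twelve points on a circle. The arithmetic completes without complaint and returns a horizontal line through the centre (b close to 0, a close to 5); nothing in the values of a and b warns that the data are not linear.

```python
import math


def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Twelve points equally spaced round a circle of radius 1 centred on (5, 5):
# there is no straight-line relationship here at all.
angles = [2 * math.pi * k / 12 for k in range(12)]
xs = [5 + math.cos(t) for t in angles]
ys = [5 + math.sin(t) for t in angles]

a, b = least_squares_line(xs, ys)
print(f"a = {a:.3f}, b = {b:.3f}")  # the fit still returns a line
```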
Now, drawing the graph does give a visual indication of how well the points fit a straight line, and some sort of measure of this factor is necessary. It is, of course, possible to carry out some simple statistical calculations which give a quantitative measure of the "goodness of fit", and when these calculations are added to the least squares calculation the sum total is a regression analysis. However, before going on to the statistical details there is an important and, unfortunately, often neglected matter to be discussed. To lead up to this matter it is first necessary to give some statistical background to the topic of regression analysis. What follows must not, however, be regarded as a rigorous treatment.

Consider the equation y = a + bx. It is assumed that the x values are fixed in advance and that any observation errors in them are negligibly small. The variable y does possess a random error, but it is assumed there are no systematic errors. Thus the following can be written:

$$y = \alpha + \beta x + e \qquad (1)$$

where it is assumed that the true relation between x and y can be represented by this equation. Here e is the difference between y and the expected value Y, the expected value being that given by (α + βx):

$$e = y - Y \qquad (2)$$

For each value of x the value of Y is taken to represent the average (arithmetic mean) of a theoretically infinite number of y values which are normally distributed. Thus there is a normal distribution of y's for each value of x, the standard deviation of each distribution being assumed to be the same.

Finally, it is also assumed that Σe = 0, i.e. the mean of all the e's is zero. (If the least squares line is accurately calculated, then the algebraic sum of all the deviations e from the line should be zero.) Once the least squares line is calculated, it is referred to as the line of regression of y on x. The other case, a regression of x on y, would involve assuming a series of essentially error-free y values and the associated values of x, such that x = α + βy + e. In practice it is simply a matter of choosing which variable is to have the series of fixed values and nominating this as, say, x (the independent variable), then always writing y = α + βx + e, where y is referred to as the dependent variable. Note also that another name for x is the "explanatory variable" rather than the "independent variable". In carrying out the least squares fit we simply write y = a + bx, where the finally calculated values of a and b are taken as estimates of the true values α and β (estimates, since a finite sample was taken).

Method of Least Squares

The least squares method involves the minimisation of SSE (the sum of the squared errors), where, with Y_i = a + bx_i the calculated value,

$$\mathrm{SSE} = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - Y_i)^2 \qquad (3)$$

and

$$b = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}} \qquad (4)$$

$$a = \bar{y} - b\bar{x}, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i \qquad (5)$$

a and b are unbiased estimators of the true parameters α and β.

(Minitab) Let the x data be in column C1 and the y data in column C2:

mtb> regress C2 1 C1

The number 1 is necessary as it indicates the number of predictors, and it is always 1 for linear regression (see the multiple regression section for more details).

Analysis of Variance for Regression

If σ² is the variance of regression (equivalent to the variability of the data about the regression line), then s²_yx is an unbiased estimator of σ²:

$$s_{yx}^2 = \frac{\mathrm{SSE}}{n-2} = \frac{\sum_{i=1}^{n} (y_i - Y_i)^2}{n-2} \qquad (6)$$
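Equations (3) to (6) translate directly into code. The following is a minimal sketch in plain Python; the function name `least_squares` and the absorbance data are hypothetical, chosen only to illustrate the arithmetic. It also returns the residuals e_i = y_i − Y_i so that their sum can be checked against the statement that it should be zero.

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by least squares.

    Returns a and b (equations 4 and 5), the standard deviation about
    regression s_yx (equation 6) and the residuals e_i = y_i - Y_i."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 points: s_yx has n - 2 degrees of freedom")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx                                        # equation (4)
    a = y_bar - b * x_bar                                  # equation (5)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # e_i = y_i - Y_i
    sse = sum(e * e for e in residuals)                    # equation (3)
    s_yx = (sse / (n - 2)) ** 0.5                          # equation (6)
    return a, b, s_yx, residuals


# Hypothetical calibration data: concentration (x) and absorbance (y).
conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
absb = [0.02, 0.21, 0.39, 0.62, 0.80, 1.01]

a, b, s_yx, e = least_squares(conc, absb)
print(f"a = {a:.4f}, b = {b:.4f}, s_yx = {s_yx:.4f}")
print(f"sum of residuals = {sum(e):.1e}")  # zero apart from rounding
```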

Equation (6) can also be written as

$$s_{yx}^2 = \frac{S_{yy} - bS_{xy}}{n-2}, \qquad S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad (7)$$

where S_xy was defined in equation (4).

Once the least squares fit has been carried out, a number of questions may be asked about the results:

(1) How good are the values of a and b obtained by the least squares fit? The values were determined from a given sample (of x's and y's), and if the calculation were repeated with a different sample then different values of a and b would, in general, be obtained. Hence it is desirable to have some measure of the error in a and b.

(2) Suppose that, once a and b are known, a value of y is calculated for a given value of x. In other words, a value of y is being predicted for some future observation. The question that naturally arises is: can two confidence limits be constructed such that it can be asserted with a probability of, say, 0.95 that these limits will contain the next observed value of y?

(3) There is another question which is subtly different from (2) above. The least squares line is, in effect, a series of calculated y values, one for each chosen value of x. One might require confidence limits for this line, i.e. two limits such that one could quote a given probability that the true line lies within them. In practice this means asking how good an estimate a calculated value of y is of the true average value of y for the value of x in question. This question is related to question (1) in that it asks something of the calculated line (which is defined by the a and b of question (1)). Thus the confidence limits, which show the range within which the estimated average y values are believed to lie, indicate the precision with which the average y's have been estimated, i.e. the variability of Y (the value of y calculated from the regression equation). They do NOT indicate the spread of individual observed values of y. On the other hand, if the interest is in predicting the next observation of y (which, in general, will not lie on the calculated line), then confidence limits must be constructed for that specific event.
This is what question (2) is about, and those limits relate to the variability not of Y but of the error (y − Y). This point is discussed further in the case studies.

(4) What is the standard deviation of the sample data? This question is a little vague as stated. More specifically, one can calculate the deviations of the plotted points, y, from the least squares line, i.e. (y − Y), and then the standard deviation of these differences (remember that the algebraic mean of the differences should be zero if the least squares fit has been carried out accurately). This standard deviation gives a measure of how scattered the points are around the least squares line, so this point is concerned with how "good" the line of best fit is. In fact it leads to another matter, correlation, which will be dealt with later under that heading.

Now that the above background has been covered, it is possible to return to the neglected point mentioned earlier. Consider question (2) above and suppose the least squares fit was carried out for a graph of absorbance versus concentration. Once the line has been constructed, the chemist is not in the slightest bit interested in predicting the next absorbance value should he or she prepare another standard calibration solution. In fact the situation is quite the reverse: the absorbance of an "unknown" solution will be measured, and what is required is a prediction of the concentration. Using the symbols y and x as before, it is x that must be predicted from y, not y from x. Therefore the treatment in standard elementary texts on statistics is not complete for the "calibration problem", since authors commonly, and naturally, assume that a regression of y on x will be used to make future predictions about the dependent variable y given a value of x, not the reverse. So a fifth question should be added to the above list, viz.:

(5) Once the least squares fit has been done, then, given a measurement y, can one form an estimate of how good the corresponding calculated value of x is?

How good are the values of a and b? (Question 1)

Suppose a and b were determined several times using several sets of sample data. A range of, say, b values would be expected, and this distribution of b values would have some standard deviation σ_b. It is possible to form an estimate of this standard deviation from the calculations for a given least squares fit (to a given sample set). If the distribution of b values is considered to be normal, then the usual argument that about 95% of observed b values lie within ±2σ_b of the mean value of b may be used. However, because the estimate of σ_b will usually come from a relatively small sample, it is better to use a t distribution. Once this is done, confidence limits, say at the 95% level, may readily be calculated for b.

Standard Deviation of b

Let s_b be the estimate of the true standard deviation, σ_b, of the gradient of the straight line, b:

$$s_b = \frac{s_{yx}}{\sqrt{\sum x^2 - \left(\sum x\right)^2 / n}} \qquad (8)$$

Confidence Limits for b

Suppose the confidence limits are to be the 95% limits, i.e. (1 − α) = 0.95, or α = 0.05 as the level of significance. Then the value of t (for the given number of degrees of freedom) to be used is t for α/2, since the confidence interval spans the central 95% of the distribution, leaving 0.025 in each tail. In general, then, if the confidence level is to be (1 − α), the appropriate value of t is t for α/2 (and the required number of degrees of freedom). The expression for the confidence limits is

$$\beta = b \pm \frac{t_{\alpha/2}\, s_{yx}}{\sqrt{\sum (x - \bar{x})^2}} = b \pm \frac{t_{\alpha/2}\, s_{yx}}{\sqrt{S_{xx}}} = b \pm t_{\alpha/2}\, s_b \qquad (9)$$

where S_xx = Σ(x − x̄)² and the number of degrees of freedom appropriate for t_{α/2} is (n − 2).

Standard Deviation of a

The standard deviation of a is given by

$$s_a = s_{yx}\sqrt{\frac{1}{n} + \frac{\bar{x}^2}{\sum x^2 - \left(\sum x\right)^2 / n}} \qquad (10)$$

and

$$\alpha = a \pm t_{\alpha/2}\, s_{yx}\sqrt{\frac{\sum x^2}{n S_{xx}}} = a \pm t_{\alpha/2}\, s_a \qquad (11)$$
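Equations (8) to (11) can be sketched in the same way. In this sketch (plain Python; the function name and data are hypothetical) the value of t is not computed but passed in, as it would be read from t tables: 2.776 is t for α/2 = 0.025 with n − 2 = 4 degrees of freedom, appropriate for the six hypothetical points used here.

```python
def slope_intercept_limits(xs, ys, t_crit):
    """Return (b, s_b, half-width for b) and (a, s_a, half-width for a),
    following equations (8) to (11). t_crit is t for alpha/2 with n - 2
    degrees of freedom, taken from tables."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)   # = sum(x^2) - (sum x)^2 / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    s_yx = (sse / (n - 2)) ** 0.5
    s_b = s_yx / s_xx ** 0.5                                   # equation (8)
    s_a = s_yx * (sum(x * x for x in xs) / (n * s_xx)) ** 0.5  # equation (10)
    return (b, s_b, t_crit * s_b), (a, s_a, t_crit * s_a)


conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
absb = [0.02, 0.21, 0.39, 0.62, 0.80, 1.01]
T_0025_4 = 2.776   # t for alpha/2 = 0.025 with n - 2 = 4 degrees of freedom

(b, s_b, hb), (a, s_a, ha) = slope_intercept_limits(conc, absb, T_0025_4)
print(f"beta  = {b:.4f} +/- {hb:.4f}")
print(f"alpha = {a:.4f} +/- {ha:.4f}")
```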

t Tests for Significance of α and β

The null hypothesis H0: β = 0 (i.e. that there is no significant relationship between y and x) can be tested using a t test. The t statistic is calculated as

$$t = \frac{b}{s_{yx}/\sqrt{S_{xx}}} = \frac{b}{s_b} \qquad (n - 2 \text{ degrees of freedom})$$

Note that this t value is determined experimentally and should be compared with a value found from t tables, i.e. the critical value at α = 0.05 and n − 2 degrees of freedom. Similarly, whether the regression line has a significantly non-zero intercept, i.e. H0: α = 0, can be tested using

$$t = \frac{a}{s_{yx}\sqrt{\sum x^2 / (n S_{xx})}} = \frac{a}{s_a} \qquad (12)$$

Standard Deviation of the Sample Data (Question 4)

The standard deviation of the sample data, as estimated from the scatter of points around the least squares line, is s_yx, which has been given previously (equation 6).

CORRELATION

The above is a small beginning to the question of how well the least squares line actually fits the data. In fact it is quite inadequate, because a more detailed consideration raises the question of how much of the total variation of the y's can be attributed to the relationship with x, and how much can be attributed to all other factors, including chance. After all, one reason the line may not fit well is that the relationship is not linear. On the other hand, to the extent that the relationship between y and x is linear, one of the reasons the different y's have different values is simply that y varies linearly with x. The rest of the variation (which leads to the plotted points being scattered around the least squares line) may be due to experimental error. It is possible to derive a quantity which measures the proportion of the total variation of the y's that can be attributed to the linear relationship with x. Note that the quantity given below measures the degree of linear relationship: the data may fit a parabola almost perfectly, yet the linear correlation can be very poor.

Coefficient of Determination and Coefficient of Correlation

The quantity referred to immediately above is the coefficient of determination. The square root of this quantity is the coefficient of correlation.
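The symbols used are R² and R respectively, with R² never negative. When the square root is extracted to give R, it is customary to choose the sign of R to coincide with the sign of the gradient b. The equation for R is

$$R = \frac{n\sum xy - \sum x \sum y}{\left(\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]\right)^{0.5}} \qquad (13)$$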

Equivalently,

$$R = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = b\sqrt{\frac{S_{xx}}{S_{yy}}}, \qquad R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}}$$

where SSR is the sum of squares explained by the regression line and SST is the total sum of squares. Thus

$$\underbrace{\sum (y - \bar{y})^2}_{\mathrm{SST}} = \underbrace{\sum (Y - \bar{y})^2}_{\mathrm{SSR}} + \underbrace{\sum (y - Y)^2}_{\mathrm{SSE}} \qquad (14)$$

The physical significance of the calculation is best given in terms of R², not R. Thus if R² = 0.96, then 96% of the variation of y can be attributed to a linear relationship between y and x. Obviously, then, R² cannot be greater than 1, and hence R cannot exceed 1 in magnitude. Alternatively, if R² = 0 then none of the variation of y can be attributed to a linear relationship with x: there is no (linear) correlation between the x and y values. If R² does equal 1, then all the plotted points lie on the least squares line. In this sense R or R² is a measure of how well the points lie on the line.

A word of warning, however. If the experimenter assumes there is a linear relationship (on theoretical grounds, say), then R² may be taken as a measure of the scatter of the points due to experimental error in the y values. On the other hand, if a linear relationship is not assumed, then R² becomes an indication of how likely it is that a linear relationship exists. R² is not itself equal to some statistical probability, but it can be used to consider the probability that an apparently high value of R² arose from a purely chance variation of y with x. For example, consider the extreme case of plotting a straight line through two points only. Since it is always possible to place a straight line perfectly through two points, R² will come out as unity. However, it scarcely takes any mathematical statistics to see that in this case R² = 1 certainly doesn't prove there is a linear relationship between y and x! The matter of testing the value of R or R² obtained, to see at what confidence level one could say that the value had not arisen by chance, will be considered later. However, the point of the "two-point straight line" can be generalised: the smaller the number of data, the less faith can be placed in an apparently "good" value of R.
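A sketch of equations (13) and (14), again in plain Python with hypothetical data and our own function names, computes R from the computational formula, checks R² against SSR/SST and SST against SSR + SSE, and shows the two-point case: any two distinct points give |R| = 1.

```python
def correlation(xs, ys):
    """Correlation coefficient R from the computational formula (13)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / ((n * sxx - sx ** 2) * (n * syy - sy ** 2)) ** 0.5


def sums_of_squares(xs, ys):
    """Return (SST, SSR, SSE) for the least squares line, equation (14)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    fitted = [a + b * x for x in xs]                       # the Y values
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sum((f - y_bar) ** 2 for f in fitted)
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return sst, ssr, sse


conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
absb = [0.02, 0.21, 0.39, 0.62, 0.80, 1.01]
r = correlation(conc, absb)
sst, ssr, sse = sums_of_squares(conc, absb)
print(f"R = {r:.5f}, R^2 = {r * r:.5f}, SSR/SST = {ssr / sst:.5f}")

# Any two distinct points give |R| = 1, which proves nothing about linearity.
print(correlation([1.0, 2.0], [3.7, 1.2]))
```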
Put another way, the successive elimination of pairs of x and y values is always likely to increase the value of R. The moral is that the value of R obtained must always be considered in terms of the number of pairs of x and y values taken.

Another point is that it requires some experience to judge what is a "good" value of R when a linear relationship is assumed and R is being used as an indication of the scatter of points around the least squares line. It is very common in plotting working curves of absorbance versus concentration to take only five or six standard solutions. Traditionally a piece of foolscap-sized graph paper would be used and a visual judgement made of the degree of scatter of the plotted points. Experience has shown that graphs judged rather poor by this subjective method nevertheless still give an R value of at least 0.98. In such cases the first two decimal places of R are of no use in making any judgement, and the third and fourth decimal places must be considered. Undoubtedly the small number of data pairs contributes to this situation.

It should also be remembered that R is not a linear measure of the degree of correlation. Thus R = 0.8 is not twice as "good" as R = 0.4. Indeed, for R = 0.8 and 0.4, R² = 0.64 and 0.16 respectively; i.e. in the sense of the percentage of the variation of the y's which can be attributed to the relationship with x, R = 0.8 is four times as strong a correlation as R = 0.4.

An F test can also be performed to test the significance of the regression:

$$F = \frac{\mathrm{MS(regression)}}{\mathrm{MS(error)}}, \qquad \mathrm{MS(regression)} = \frac{\mathrm{SSR}}{1}, \quad \mathrm{MS(error)} = \frac{\mathrm{SSE}}{n - 2}$$

the divisors being the degrees of freedom for a straight-line fit.
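The t tests above and the F test can be computed together. In this sketch (hypothetical data and function name; critical values would still come from tables) the F ratio is formed from MS(regression) = SSR/1 and MS(error) = SSE/(n − 2), so it should equal the square of the t statistic for H0: β = 0. The adjusted R² that Minitab reports, 1 − [SSE/(n − p)]/[SST/(n − 1)], is included for comparison.

```python
def regression_tests(xs, ys, p=2):
    """Return (t for H0: beta = 0, t for H0: alpha = 0, F, R^2, adjusted R^2)
    for a straight-line fit; p is the number of fitted coefficients."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    ssr = sst - sse                                    # from SST = SSR + SSE
    s_yx = (sse / (n - 2)) ** 0.5
    t_slope = b / (s_yx / s_xx ** 0.5)
    t_intercept = a / (s_yx * (sum(x * x for x in xs) / (n * s_xx)) ** 0.5)  # (12)
    f_ratio = (ssr / 1) / (sse / (n - 2))              # MS(regression)/MS(error)
    r2 = ssr / sst
    r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))
    return t_slope, t_intercept, f_ratio, r2, r2_adj


conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical data
absb = [0.02, 0.21, 0.39, 0.62, 0.80, 1.01]
T = 2.776                                      # t(0.025, 4 d.f.) from tables

t_b, t_a, f, r2, r2_adj = regression_tests(conc, absb)
print(f"t(slope) = {t_b:.1f}, significant: {abs(t_b) > T}")
print(f"t(intercept) = {t_a:.2f}, significant: {abs(t_a) > T}")
print(f"F = {f:.1f} (t squared = {t_b ** 2:.1f}), R^2 = {r2:.4f}, adj = {r2_adj:.4f}")
```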

Since F = t², where t is the statistic defined above for the null hypothesis H0: β = 0, the two tests are equivalent.

Note: Minitab also reports an adjusted R², which makes allowance for the number of coefficients fitted:

$$R^2_{\mathrm{adj}} = 1 - \frac{\mathrm{SSE}/(n - p)}{\mathrm{SST}/(n - 1)}$$

where p is the number of coefficients (p = 2 for linear regression) and n is the number of data pairs.

Confidence Limits (Intervals) for the Line of Best Fit and Prediction Limits (see Questions (3) and (2))

The confidence limits relate to the problem of estimating the true average value of y from a given value of x. This is because the line of best fit is, statistically, an estimate of the average values of y corresponding to the x values. The confidence limits then give a measure, at a desired probability level, of how good the estimate is. We are writing here of y = a + bx with y the dependent variable. In fact there is a distribution (assumed normal in this discussion) of y's for each x, and so if we attempt to predict a given value of y corresponding to a chosen x value, the uncertainty of that prediction from the best fit line is greater than the uncertainty of the true average value of y. In other words, since the best fit line is an estimate of true average y values, there is a further uncertainty as to what any one particular y value will be for a chosen value of x. The equations for the confidence limits of the line (i.e. of y average, also called y calculated) and the prediction limits for a particular value of y are as follows.

Confidence Limits about y Average

The limits are given by

$$\pm\, t_{n-2,\,\alpha/2}\; s_{yx}\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \left(\sum x\right)^2 / n}\right]^{0.5} \qquad (15)$$

these being ± about the y average line, i.e. the line of best fit. x_0 in the equation is the chosen value of x for which y average is to be estimated. If these limits are plotted for a range of x values, they will be found to be narrowest towards the middle of the best fit line (at x_0 = x̄).

Prediction Limits for y

The expressions for these limits are again very similar to the confidence limit equations. All that needs to be changed is that the term 1/n under the square root is replaced by (n + 1)/n.
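Equation (15), and the prediction limits obtained by replacing 1/n with (n + 1)/n, can be sketched as below. The approximate formula for s_x0 given in the next subsection (equation 17) is included as well, since it uses the same quantities. Function names and data are hypothetical; t is again supplied from tables (2.776 for 4 degrees of freedom).

```python
def fit(xs, ys):
    """Least squares a, b, s_yx, x_bar, y_bar and S_xx for y = a + b*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    s_yx = (sum((y - a - b * x) ** 2 for x, y in zip(xs, ys)) / (n - 2)) ** 0.5
    return a, b, s_yx, x_bar, y_bar, s_xx


def limits_for_y(xs, ys, x0, t_crit):
    """Return (Y0, CI half-width, PI half-width) at x = x0: equation (15)
    for the line and, with 1/n replaced by (n + 1)/n, for a single new y."""
    a, b, s_yx, x_bar, _, s_xx = fit(xs, ys)
    n = len(xs)
    term = 1 / n + (x0 - x_bar) ** 2 / s_xx
    return a + b * x0, t_crit * s_yx * term ** 0.5, t_crit * s_yx * (1 + term) ** 0.5


def x_from_y(xs, ys, y0, m=1):
    """Return (x0, s_x0) for a measured response y0, following equation (17).
    m = 1 gives the prediction form; for the mean of m readings the leading
    1 under the square root becomes 1/m."""
    a, b, s_yx, _, y_bar, s_xx = fit(xs, ys)
    n = len(xs)
    s_x0 = (s_yx / abs(b)) * (1 / m + 1 / n + (y0 - y_bar) ** 2 / (b ** 2 * s_xx)) ** 0.5
    return (y0 - a) / b, s_x0


conc = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]          # hypothetical standards
absb = [0.02, 0.21, 0.39, 0.62, 0.80, 1.01]
T = 2.776                                      # t(0.025, 4 d.f.) from tables

for x0 in (0.0, 2.5, 5.0):
    y0, ci, pi = limits_for_y(conc, absb, x0, T)
    print(f"x0 = {x0}: Y = {y0:.3f}, CI +/- {ci:.3f}, PI +/- {pi:.3f}")

x0, s1 = x_from_y(conc, absb, 0.50)            # single reading
_, s5 = x_from_y(conc, absb, 0.50, m=5)        # mean of five readings
print(f"x0 = {x0:.3f} +/- {T * s1:.3f} (m = 1), +/- {T * s5:.3f} (m = 5)")
```

The limits are narrowest at x̄ = 2.5 and widen towards the ends of the range, and averaging five readings narrows the interval for x_0.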
Replacing 1/n by (n + 1)/n is equivalent to adding 1 to the entire term under the square root. The limits then are

$$\pm\, t_{n-2,\,\alpha/2}\; s_{yx}\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum x^2 - \left(\sum x\right)^2 / n}\right]^{0.5} \qquad (16)$$

Of course, the addition of the 1 makes the term larger, and hence the prediction limits are wider than the confidence limits.

Confidence Limits for Determining x from y

Once the regression equation is known it is very simple to calculate an x value corresponding to a measured y. For example, a common procedure in analytical chemistry is to construct a calibration line from measurements on a set of standards and to use this line to predict the concentration of an unknown, after measuring the property used to construct the line (e.g. absorbance). The estimation of confidence intervals for such a determination is quite complex, as it involves both the slope and the intercept, each of which has an error associated with it. However, an approximate formula can be used:

$$s_{x_0} = \frac{s_{yx}}{b}\left\{1 + \frac{1}{n} + \frac{(y_0 - \bar{y})^2}{b^2 \sum (x - \bar{x})^2}\right\}^{0.5} \qquad (17)$$

In this equation y_0 is the experimental value of y from which the concentration value x_0 is to be determined, and s_x0 is the estimated standard deviation of x_0. This equation actually gives the prediction limits for x_0, which are ±t·s_x0, where t is the critical t value at n − 2 degrees of freedom. If several readings were averaged to get y_0, the confidence interval for x_0 can be obtained by replacing 1 by 1/m in the formula for s_x0, where m is the number of determinations of y.

Prediction Limits or Confidence Limits?

It may well be asked when one should use confidence limits and when prediction limits. In the case of predicting y from x the decision is simply based on what one wants: a prediction of an average y value (strictly, an "estimate of the true average y corresponding to the chosen value of x") OR a prediction of a particular y value, i.e. any one value from the normally distributed set of y's about the average y. Naturally the uncertainty in the latter is greater than in the former case. It should be remembered at this stage that in the construction of the best fit line the y values are assumed to be average y values.

In the reverse case of estimating x from y, the answer to the question posed in the subheading depends on how y is interpreted. Presumably, if y is taken as the mean of many observations, then the corresponding value of x may be taken as an average x value; in that case one has confidence limits in mind. However, one particular observation of y can only lead to prediction limits for x, with the correspondingly increased uncertainty in that value.

Case Study: Calibration of a Nephelometer

A nephelometer is to be calibrated so that its scale readings can be converted to ppm solids.
The calibration solutions are prepared by appropriately diluting a standard stock suspension of very fine silica particles with distilled water. The stock suspension has a known ppm solids content, as determined by a gravimetric method. The following data were obtained from the standard suspensions:

X, ppm solids | Y, scale reading

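[Minitab worksheet columns: C1 X, C2 Y, C3 Calc Y, C4 residual. Regression Analysis output: the regression equation Y = … + … X; coefficient table (Predictor, Coef, StDev, T, P for Constant and X); S = …, R-Sq = 99.0%, R-Sq(adj) = 98.8%; Analysis of Variance table (Source, DF, SS, MS, F, P for Regression, Residual Error, Total); table of observations (Obs, X, Y, Fit, StDev Fit, Residual, St Resid).]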

[Figures: Calibration of a Nephelometer, fitted line plot of Y against X (R-Sq = 99.0%) showing the regression line, 95% CI and 95% PI; residual plots for the same fit: Normal Plot of Residuals, I Chart of Residuals, Histogram of Residuals and Residuals vs. Fits.]

From the Minitab output we have:

Slope (= b) = …
Intercept (= a) = …
s.d. of regression (= s_yx) = …
R²(adj) = 98.8%

giving the regression equation y = a + bx.

Confidence interval for the slope: β = b ± t_{0.025,5} s_b. From tables, t_{0.025,5} = 2.571, so β = b ± 2.571 s_b.

Confidence interval for the intercept: α = a ± t_{0.025,5} s_a = a ± 2.571 s_a (quoted to a sensible number of significant figures).

Tests of significance:

(a) Is there a significant relationship between y and x? (i.e. H0: β = 0) The t ratio for this test is large and its p value is 0.000 (to three decimal places), so we reject H0 and accept that there is a significant relationship. Alternatively, the F ratio from the ANOVA table (= t²) gives p = 0.000.

(b) Could the line be considered to go through the origin? (H0: α = 0) The t ratio for this test gives p > 0.05 (p being the probability of obtaining a t value at least this large if H0 were true), so we cannot reject H0, i.e. the intercept is not significantly non-zero and we can accept that the line passes through the origin.

Confidence and Prediction Limits for y

Take as an example predicting a value of y for x = …: the predicted value of y from the regression equation is …, and the confidence limits for y are … to …. This represents the confidence interval for the average value of y at this x value. The prediction limits for a single value of y give the (wider) prediction interval. The CIs and PIs over the whole range of x are shown in the fitted line plot. Note that the uncertainties increase towards the ends of the range, as expected.

Confidence Intervals for x Predicted from y

For chemists it is far more common to wish to use the line to predict x at a measured y (e.g. to predict the concentration of analyte in an unknown from the nephelometer reading). For example, from a scale reading y_0, x_0 = (y_0 − a)/b, and s_x0 is calculated from equation (17) using n = 7, the average value of all the y's (ȳ) and S_xx = Σ(x − x̄)²; the prediction limits are then x_0 ± 2.571 s_x0. If the y value was an average of m readings, then 1 is replaced by 1/m in the above calculation; e.g. if 5 readings were averaged for y_0, s_x0 is recalculated with 1/5 in place of 1, giving narrower limits about x_0.

Analysis of Residuals

Further information on the model can be obtained by examining the residuals e, where e = y − Y. If the model we have chosen for the data (so far we have only considered the linear model) is appropriate, then the residuals are expected to be (i) independent, (ii) zero on average, (iii) of constant variance and (iv) normally distributed. The validity of these assumptions can be tested by plots of (i) e against x or Y (but not against y, since e is correlated with y), or (ii) a normal probability plot of the e's. These plots serve two purposes: (a) detection of problems with the model, and (b) detection of outliers. A linear model might be rejected because (a) there is no correlation, i.e. β = 0, or (b) the linear model is inappropriate (called lack-of-fit) due to curvature in the data.
Note that lack-of-fit is used for this case only, not for general lack of correlation. Normal plots of residuals will show up lack-of-fit or outliers as a loss of linearity in the plot (see the case studies). Examples of the different types of pattern that can occur in residual plots are shown in the following diagram.

Lack-of-Fit

To test for non-linearity in the data, replicates are needed for at least some of the data points. The error sum of squares (SSE) can then be split into two components: (a) error due to lack-of-fit and (b) pure error, where

$$\text{pure error SS} = \sum_{j} \sum_{u=1}^{n_j} \left(y_{ju} - \bar{y}_j\right)^2$$

for n_j replicates of y at x = x_j (ȳ_j being their mean), and

$$\text{SS(lack-of-fit)} = \text{SSE} - \text{pure error SS}, \qquad \text{MS(lack-of-fit)} = \frac{\text{SS(lack-of-fit)}}{\text{d of f}}$$
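A minimal sketch of the lack-of-fit split, assuming replicated data. The text stops at MS(lack-of-fit); the comparison shown here, F = MS(lack-of-fit)/MS(pure error) with (k − 2) and (n − k) degrees of freedom for k distinct x values, is the usual way the two mean squares are compared and is our addition. The duplicate data are hypothetical and deliberately curved; 4.53 is the tabulated 5% point of F with 4 and 6 degrees of freedom.

```python
from collections import defaultdict


def lack_of_fit(xs, ys):
    """Split SSE for a straight-line fit into pure error and lack-of-fit.
    Returns (SS_lof, df_lof, SS_pe, df_pe, F = MS(lof)/MS(pe))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

    groups = defaultdict(list)            # replicate y values at each x_j
    for x, y in zip(xs, ys):
        groups[x].append(y)
    ss_pe = sum(sum((y - sum(g) / len(g)) ** 2 for y in g) for g in groups.values())
    k = len(groups)                       # number of distinct x values
    df_pe, df_lof = n - k, k - 2
    if df_pe == 0 or df_lof <= 0:
        raise ValueError("need replicates and more than two distinct x values")
    ss_lof = sse - ss_pe
    f_ratio = (ss_lof / df_lof) / (ss_pe / df_pe)
    return ss_lof, df_lof, ss_pe, df_pe, f_ratio


# Hypothetical duplicate measurements with some curvature.
x = [0, 0, 2, 2, 4, 4, 6, 6, 8, 8, 10, 10]
y = [0.5, -0.3, 19.0, 19.8, 35.6, 36.4, 48.7, 49.3, 58.4, 59.0, 64.8, 65.6]

ss_lof, df_lof, ss_pe, df_pe, f = lack_of_fit(x, y)
F_CRIT = 4.53                             # F(0.05; 4, 6) from tables
print(f"SS(lack-of-fit) = {ss_lof:.2f} on {df_lof} d.f., "
      f"pure error SS = {ss_pe:.2f} on {df_pe} d.f.")
print(f"F = {f:.1f}; lack of fit significant at 5%: {f > F_CRIT}")
```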

Case Study 2: Fluorescence Analysis

Investigate the linear range of the following fluorescence experiment (conc in ppm, fluorescence in intensity units). Duplicate measurements were performed so that a check on linearity could be carried out.

[Minitab worksheet columns: C1 conc, C2 I. Regression Analysis output: the regression equation I = … + … conc; coefficient table (Predictor, Coef, StDev, T, P for Constant and conc); S = …, R-Sq = 97.5%; Analysis of Variance table with the Residual Error split into Lack of Fit and Pure Error.]

Lack of fit test
Possible curvature in variable conc (P-Value = …)
Possible lack of fit at outer X-values (P-Value = …)
Overall lack of fit test is significant at P = …

[Figures: Fluorescence Analysis, fitted line plot of I against conc (R-Sq = 97.5%) showing the regression line, 95% CI and 95% PI; residual plots: Histogram of Residuals, Normal Plot of Residuals, I Chart of Residuals and Residuals vs. Fits.]

The Minitab output and plot of the data are shown above. Since replicates have been performed, a test for lack-of-fit can be carried out. The Minitab test for lack-of-fit shows evidence of non-linearity, and inspection of the plot confirms this. It should be noted here that, despite an R² value of 0.975, the linear model is not appropriate. An alternative is to carry out quadratic regression (see next section) to fit a curve to the data; this improves R² to 0.988. However, is this the only interpretation? Inspection of the plot shows that the points are quite linear, except for the points at x = …. Thus another alternative would be to reject these points and fit a straight line to the other data. This would be acceptable provided the model was not used to predict responses for x > 8, i.e. operating only in the linear range. For either model, predicting responses for x > … would be extremely unreliable.

Multilinear Regression

So far we have only considered the case where y is predicted by only one variable, x. There are cases where more than one variable may be used to predict y (very common in experimental design studies). Thus we can have a model of the type

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m + e$$

In a model such as this it is a necessary condition that no x is an exact linear combination of the others; ideally the x's are uncorrelated (orthogonal), as in a designed experiment.

Matrix Approach to Least Squares Regression

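The equations for calculating the β's by least squares are very complex when there are more than two parameters, and can more easily be expressed in matrix notation. Consider first the linear case:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E}, \qquad \mathbf{Y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \mathbf{X} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}, \quad \mathbf{E} = \begin{pmatrix} e_1 \\ \vdots \\ e_n \end{pmatrix}$$

B is estimated by (XᵀX)⁻¹XᵀY, provided (XᵀX)⁻¹ exists (if any of the x's are perfectly correlated, i.e. linearly dependent, the matrix is singular and no inverse exists). Xᵀ is the transpose of X (rows → columns). This formula can be generalised to multilinear regression with m predictors:

$$\mathbf{B} = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{pmatrix}, \qquad \mathbf{X} = \begin{pmatrix} 1 & x_{11} & \cdots & x_{m1} \\ \vdots & \vdots & & \vdots \\ 1 & x_{1n} & \cdots & x_{mn} \end{pmatrix}$$

for y = β0 + β1x1 + … + βm xm + e.

Special case: the quadratic model, y = β0 + β1x + β2x², can be dealt with by setting x1 = x and x2 = x².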
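The matrix estimate can be sketched without any numerical library. Rather than forming the inverse explicitly, this sketch solves the normal equations (XᵀX)B = XᵀY by Gaussian elimination with partial pivoting, which gives the same B when the inverse exists and stops with an error when XᵀX is (numerically) singular; the 1e-12 pivot threshold is a rough choice of ours. The helper names and the exactly quadratic test data are hypothetical; in practice a linear algebra library would be used.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(p, q):
    return [[sum(u * v for u, v in zip(row, col)) for col in zip(*q)] for row in p]


def solve(a_mat, rhs):
    """Solve a_mat @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a_mat)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(a_mat)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[piv][c]) < 1e-12:
            raise ValueError("X^T X is singular: predictors are linearly dependent")
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):           # back substitution
        x[r] = (aug[r][n] - sum(aug[r][k] * x[k] for k in range(r + 1, n))) / aug[r][r]
    return x


def fit_least_squares(columns, y):
    """Estimate B = (X^T X)^-1 X^T Y by solving (X^T X) B = X^T Y.
    columns is a list of predictor columns x1 .. xm."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    Xt = transpose(X)
    xtx = matmul(Xt, X)
    xty = [row[0] for row in matmul(Xt, [[v] for v in y])]
    return solve(xtx, xty)


# Quadratic model via x1 = x and x2 = x^2 (hypothetical, exactly quadratic data).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0 + 2.0 * v - 0.1 * v * v for v in x]
b0, b1, b2 = fit_least_squares([x, [v * v for v in x]], y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}")
```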

Case Study 3: Determination of Cyanide in Waste Water

The following data are for a spectrophotometric method for determining cyanide in waste water.

[Minitab worksheet columns: C1 Conc, C2 Abs. Regression Analysis output: the regression equation abs = … + … conc; coefficient table (Predictor, Coef, StDev, T, P for Constant and conc); S = …, R-Sq = 100.0%, R-Sq(adj) = 100.0%; Analysis of Variance table (Regression, Residual Error, Total).]

Lack of fit test
Possible curvature in variable conc (P-Value = …)
Overall lack of fit test is significant at P = …

[Figures: Determination of Cyanide in Waste Water, fitted line plot of abs against conc (R-Sq = 100.0%); residual plots: Normal Plot of Residuals, I Chart of Residuals, Histogram of Residuals and Residuals vs. Fits.]

Examination of the Minitab output at first shows excellent agreement with a linear model, with R² = 100% (this is actually rounded off from 99.97%). However, the test for lack-of-fit is significant! Examination of the residuals plot and the normal plot also confirms that there is curvature in the data (compare the residual plots with the examples of plots shown earlier). The data can be fitted to a quadratic model, as shown in the following Minitab output:

mtb> let C3 = C1*C1
mtb> regress C2 2 C1 C3

[Minitab Regression Analysis output: the regression equation abs = … + … conc − … conc^2; coefficient table (Predictor, Coef, StDev, T, P for Constant, conc and conc^2); S = …, R-Sq = 100.0%, R-Sq(adj) = 100.0%; Analysis of Variance table (Regression, Residual Error, Total) and sequential sums of squares (Seq SS) for conc and conc^2.]

One problem that occurs when using a quadratic model arises if a value of x has to be predicted from a value of y: this means a quadratic equation has to be solved. One way around this is to switch the data so that x and y are reversed (i.e. do a regression of x on y). This will make prediction simpler, but will not give a valid prediction of the error since, in the least squares method, we are assuming that the error is associated with y and that there is no error in x.
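Solving the fitted quadratic directly is not difficult. Predicting x means solving b0 + b1x + b2x² = y0 and keeping the root that lies inside the calibration range. The sketch below uses coefficient names b0, b1, b2 for illustration (in practice they would come from the quadratic regression) and the rearranged quadratic formula q = −(b1 + sign(b1)√D)/2, with roots q/b2 and c/q, which avoids losing precision when the curvature term is small, as it is for the cyanide data. It gives x only; it does not provide the error estimate discussed above.

```python
import math


def x_from_quadratic(y0, b0, b1, b2, x_min, x_max):
    """Solve b0 + b1*x + b2*x**2 = y0 for x and return the root inside
    the calibration range [x_min, x_max]."""
    c = b0 - y0
    if b2 == 0:                       # straight line: no quadratic to solve
        roots = [-c / b1]
    else:
        disc = b1 * b1 - 4 * b2 * c
        if disc < 0:
            raise ValueError("y0 is not reached by the fitted curve")
        q = -0.5 * (b1 + math.copysign(math.sqrt(disc), b1))
        roots = [q / b2, c / q] if q != 0 else [0.0]
    inside = [r for r in roots if x_min <= r <= x_max]
    if not inside:
        raise ValueError("no root lies inside the calibration range")
    return inside[0]


# Hypothetical coefficients for a slightly curved calibration
# y = 1 + 2x - 0.1x^2, used over 0 <= x <= 5.
print(x_from_quadratic(6.1, 1.0, 2.0, -0.1, 0.0, 5.0))
```

Here the two roots are 3 and 17, and only x = 3 lies inside the calibration range.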


More information

Learning Objectives for Chapter 11

Learning Objectives for Chapter 11 Chapter : Lnear Regresson and Correlaton Methods Hldebrand, Ott and Gray Basc Statstcal Ideas for Managers Second Edton Learnng Objectves for Chapter Usng the scatterplot n regresson analyss Usng the method

More information

Chapter 14 Simple Linear Regression

Chapter 14 Simple Linear Regression Chapter 4 Smple Lnear Regresson Chapter 4 - Smple Lnear Regresson Manageral decsons often are based on the relatonshp between two or more varables. Regresson analss can be used to develop an equaton showng

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

Statistics Chapter 4

Statistics Chapter 4 Statstcs Chapter 4 "There are three knds of les: les, damned les, and statstcs." Benjamn Dsrael, 1895 (Brtsh statesman) Gaussan Dstrbuton, 4-1 If a measurement s repeated many tmes a statstcal treatment

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Lecture Notes for STATISTICAL METHODS FOR BUSINESS II BMGT 212. Chapters 14, 15 & 16. Professor Ahmadi, Ph.D. Department of Management

Lecture Notes for STATISTICAL METHODS FOR BUSINESS II BMGT 212. Chapters 14, 15 & 16. Professor Ahmadi, Ph.D. Department of Management Lecture Notes for STATISTICAL METHODS FOR BUSINESS II BMGT 1 Chapters 14, 15 & 16 Professor Ahmad, Ph.D. Department of Management Revsed August 005 Chapter 14 Formulas Smple Lnear Regresson Model: y =

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction

Interval Estimation in the Classical Normal Linear Regression Model. 1. Introduction ECONOMICS 35* -- NOTE 7 ECON 35* -- NOTE 7 Interval Estmaton n the Classcal Normal Lnear Regresson Model Ths note outlnes the basc elements of nterval estmaton n the Classcal Normal Lnear Regresson Model

More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

x yi In chapter 14, we want to perform inference (i.e. calculate confidence intervals and perform tests of significance) in this setting.

x yi In chapter 14, we want to perform inference (i.e. calculate confidence intervals and perform tests of significance) in this setting. The Practce of Statstcs, nd ed. Chapter 14 Inference for Regresson Introducton In chapter 3 we used a least-squares regresson lne (LSRL) to represent a lnear relatonshp etween two quanttatve explanator

More information

Statistics MINITAB - Lab 2

Statistics MINITAB - Lab 2 Statstcs 20080 MINITAB - Lab 2 1. Smple Lnear Regresson In smple lnear regresson we attempt to model a lnear relatonshp between two varables wth a straght lne and make statstcal nferences concernng that

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

Chapter 15 - Multiple Regression

Chapter 15 - Multiple Regression Chapter - Multple Regresson Chapter - Multple Regresson Multple Regresson Model The equaton that descrbes how the dependent varable y s related to the ndependent varables x, x,... x p and an error term

More information

STATISTICS QUESTIONS. Step by Step Solutions.

STATISTICS QUESTIONS. Step by Step Solutions. STATISTICS QUESTIONS Step by Step Solutons www.mathcracker.com 9//016 Problem 1: A researcher s nterested n the effects of famly sze on delnquency for a group of offenders and examnes famles wth one to

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9 Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,

More information

Lecture 4 Hypothesis Testing

Lecture 4 Hypothesis Testing Lecture 4 Hypothess Testng We may wsh to test pror hypotheses about the coeffcents we estmate. We can use the estmates to test whether the data rejects our hypothess. An example mght be that we wsh to

More information

SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION Smple Lnear Regresson and Correlaton Introducton Prevousl, our attenton has been focused on one varable whch we desgnated b x. Frequentl, t s desrable to learn somethng about the relatonshp between two

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION MTH352/MH3510 Regression Analysis

NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION MTH352/MH3510 Regression Analysis NANYANG TECHNOLOGICAL UNIVERSITY SEMESTER I EXAMINATION 014-015 MTH35/MH3510 Regresson Analyss December 014 TIME ALLOWED: HOURS INSTRUCTIONS TO CANDIDATES 1. Ths examnaton paper contans FOUR (4) questons

More information

ANOVA. The Observations y ij

ANOVA. The Observations y ij ANOVA Stands for ANalyss Of VArance But t s a test of dfferences n means The dea: The Observatons y j Treatment group = 1 = 2 = k y 11 y 21 y k,1 y 12 y 22 y k,2 y 1, n1 y 2, n2 y k, nk means: m 1 m 2

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Introduction to Regression

Introduction to Regression Introducton to Regresson Dr Tom Ilvento Department of Food and Resource Economcs Overvew The last part of the course wll focus on Regresson Analyss Ths s one of the more powerful statstcal technques Provdes

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 14 Multiple Regression Models

Statistics for Managers Using Microsoft Excel/SPSS Chapter 14 Multiple Regression Models Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 14 Multple Regresson Models 1999 Prentce-Hall, Inc. Chap. 14-1 Chapter Topcs The Multple Regresson Model Contrbuton of Indvdual Independent Varables

More information

Correlation and Regression

Correlation and Regression Correlaton and Regresson otes prepared by Pamela Peterson Drake Index Basc terms and concepts... Smple regresson...5 Multple Regresson...3 Regresson termnology...0 Regresson formulas... Basc terms and

More information

a. (All your answers should be in the letter!

a. (All your answers should be in the letter! Econ 301 Blkent Unversty Taskn Econometrcs Department of Economcs Md Term Exam I November 8, 015 Name For each hypothess testng n the exam complete the followng steps: Indcate the test statstc, ts crtcal

More information

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε

Y = β 0 + β 1 X 1 + β 2 X β k X k + ε Chapter 3 Secton 3.1 Model Assumptons: Multple Regresson Model Predcton Equaton Std. Devaton of Error Correlaton Matrx Smple Lnear Regresson: 1.) Lnearty.) Constant Varance 3.) Independent Errors 4.) Normalty

More information

NUMERICAL DIFFERENTIATION

NUMERICAL DIFFERENTIATION NUMERICAL DIFFERENTIATION 1 Introducton Dfferentaton s a method to compute the rate at whch a dependent output y changes wth respect to the change n the ndependent nput x. Ths rate of change s called the

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

Biostatistics 360 F&t Tests and Intervals in Regression 1

Biostatistics 360 F&t Tests and Intervals in Regression 1 Bostatstcs 360 F&t Tests and Intervals n Regresson ORIGIN Model: Y = X + Corrected Sums of Squares: X X bar where: s the y ntercept of the regresson lne (translaton) s the slope of the regresson lne (scalng

More information

PHYS 450 Spring semester Lecture 02: Dealing with Experimental Uncertainties. Ron Reifenberger Birck Nanotechnology Center Purdue University

PHYS 450 Spring semester Lecture 02: Dealing with Experimental Uncertainties. Ron Reifenberger Birck Nanotechnology Center Purdue University PHYS 45 Sprng semester 7 Lecture : Dealng wth Expermental Uncertantes Ron Refenberger Brck anotechnology Center Purdue Unversty Lecture Introductory Comments Expermental errors (really expermental uncertantes)

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise.

Chapter 2 - The Simple Linear Regression Model S =0. e i is a random error. S β2 β. This is a minimization problem. Solution is a calculus exercise. Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where y + = β + β e for =,..., y and are observable varables e s a random error How can an estmaton rule be constructed for the

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Laboratory 3: Method of Least Squares

Laboratory 3: Method of Least Squares Laboratory 3: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly they are correlated wth

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 31 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 6. Rdge regresson The OLSE s the best lnear unbased

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

The SAS program I used to obtain the analyses for my answers is given below.

The SAS program I used to obtain the analyses for my answers is given below. Homework 1 Answer sheet Page 1 The SAS program I used to obtan the analyses for my answers s gven below. dm'log;clear;output;clear'; *************************************************************; *** EXST7034

More information

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and σ 2 e = variance of (Y X).

where I = (n x n) diagonal identity matrix with diagonal elements = 1 and off-diagonal elements = 0; and σ 2 e = variance of (Y X). 11.4.1 Estmaton of Multple Regresson Coeffcents In multple lnear regresson, we essentally solve n equatons for the p unnown parameters. hus n must e equal to or greater than p and n practce n should e

More information

Laboratory 1c: Method of Least Squares

Laboratory 1c: Method of Least Squares Lab 1c, Least Squares Laboratory 1c: Method of Least Squares Introducton Consder the graph of expermental data n Fgure 1. In ths experment x s the ndependent varable and y the dependent varable. Clearly

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

Lecture 16 Statistical Analysis in Biomaterials Research (Part II)

Lecture 16 Statistical Analysis in Biomaterials Research (Part II) 3.051J/0.340J 1 Lecture 16 Statstcal Analyss n Bomaterals Research (Part II) C. F Dstrbuton Allows comparson of varablty of behavor between populatons usng test of hypothess: σ x = σ x amed for Brtsh statstcan

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

Scatter Plot x

Scatter Plot x Construct a scatter plot usng excel for the gven data. Determne whether there s a postve lnear correlaton, negatve lnear correlaton, or no lnear correlaton. Complete the table and fnd the correlaton coeffcent

More information

[The following data appear in Wooldridge Q2.3.] The table below contains the ACT score and college GPA for eight college students.

[The following data appear in Wooldridge Q2.3.] The table below contains the ACT score and college GPA for eight college students. PPOL 59-3 Problem Set Exercses n Smple Regresson Due n class /8/7 In ths problem set, you are asked to compute varous statstcs by hand to gve you a better sense of the mechancs of the Pearson correlaton

More information

Midterm Examination. Regression and Forecasting Models

Midterm Examination. Regression and Forecasting Models IOMS Department Regresson and Forecastng Models Professor Wllam Greene Phone: 22.998.0876 Offce: KMC 7-90 Home page: people.stern.nyu.edu/wgreene Emal: wgreene@stern.nyu.edu Course web page: people.stern.nyu.edu/wgreene/regresson/outlne.htm

More information

STAT 3340 Assignment 1 solutions. 1. Find the equation of the line which passes through the points (1,1) and (4,5).

STAT 3340 Assignment 1 solutions. 1. Find the equation of the line which passes through the points (1,1) and (4,5). (out of 15 ponts) STAT 3340 Assgnment 1 solutons (10) (10) 1. Fnd the equaton of the lne whch passes through the ponts (1,1) and (4,5). β 1 = (5 1)/(4 1) = 4/3 equaton for the lne s y y 0 = β 1 (x x 0

More information

e i is a random error

e i is a random error Chapter - The Smple Lnear Regresson Model The lnear regresson equaton s: where + β + β e for,..., and are observable varables e s a random error How can an estmaton rule be constructed for the unknown

More information

Linear Correlation. Many research issues are pursued with nonexperimental studies that seek to establish relationships among 2 or more variables

Linear Correlation. Many research issues are pursued with nonexperimental studies that seek to establish relationships among 2 or more variables Lnear Correlaton Many research ssues are pursued wth nonexpermental studes that seek to establsh relatonshps among or more varables E.g., correlates of ntellgence; relaton between SAT and GPA; relaton

More information

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes

DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR. Introductory Econometrics 1 hour 30 minutes 25/6 Canddates Only January Examnatons 26 Student Number: Desk Number:...... DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR Department Module Code Module Ttle Exam Duraton

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

β0 + β1xi. You are interested in estimating the unknown parameters β

β0 + β1xi. You are interested in estimating the unknown parameters β Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Analytcs The SLR Setup Sample Statstcs Ordnary Least Squares (OLS): FOCs and SOCs Back to OLS and Sample Statstcs Predctons (and Resduals) wth OLS

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 30 Multcollnearty Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur 2 Remedes for multcollnearty Varous technques have

More information

Topic- 11 The Analysis of Variance

Topic- 11 The Analysis of Variance Topc- 11 The Analyss of Varance Expermental Desgn The samplng plan or expermental desgn determnes the way that a sample s selected. In an observatonal study, the expermenter observes data that already

More information

Systematic Error Illustration of Bias. Sources of Systematic Errors. Effects of Systematic Errors 9/23/2009. Instrument Errors Method Errors Personal

Systematic Error Illustration of Bias. Sources of Systematic Errors. Effects of Systematic Errors 9/23/2009. Instrument Errors Method Errors Personal 9/3/009 Sstematc Error Illustraton of Bas Sources of Sstematc Errors Instrument Errors Method Errors Personal Prejudce Preconceved noton of true value umber bas Prefer 0/5 Small over large Even over odd

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Chapter 15 Student Lecture Notes 15-1

Chapter 15 Student Lecture Notes 15-1 Chapter 15 Student Lecture Notes 15-1 Basc Busness Statstcs (9 th Edton) Chapter 15 Multple Regresson Model Buldng 004 Prentce-Hall, Inc. Chap 15-1 Chapter Topcs The Quadratc Regresson Model Usng Transformatons

More information

Polynomial Regression Models

Polynomial Regression Models LINEAR REGRESSION ANALYSIS MODULE XII Lecture - 6 Polynomal Regresson Models Dr. Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Test of sgnfcance To test the sgnfcance

More information

III. Econometric Methodology Regression Analysis

III. Econometric Methodology Regression Analysis Page Econ07 Appled Econometrcs Topc : An Overvew of Regresson Analyss (Studenmund, Chapter ) I. The Nature and Scope of Econometrcs. Lot s of defntons of econometrcs. Nobel Prze Commttee Paul Samuelson,

More information

Chapter 4: Regression With One Regressor

Chapter 4: Regression With One Regressor Chapter 4: Regresson Wth One Regressor Copyrght 2011 Pearson Addson-Wesley. All rghts reserved. 1-1 Outlne 1. Fttng a lne to data 2. The ordnary least squares (OLS) lne/regresson 3. Measures of ft 4. Populaton

More information

Linear Approximation with Regularization and Moving Least Squares

Linear Approximation with Regularization and Moving Least Squares Lnear Approxmaton wth Regularzaton and Movng Least Squares Igor Grešovn May 007 Revson 4.6 (Revson : March 004). 5 4 3 0.5 3 3.5 4 Contents: Lnear Fttng...4. Weghted Least Squares n Functon Approxmaton...

More information

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov,

UCLA STAT 13 Introduction to Statistical Methods for the Life and Health Sciences. Chapter 11 Analysis of Variance - ANOVA. Instructor: Ivo Dinov, UCLA STAT 3 ntroducton to Statstcal Methods for the Lfe and Health Scences nstructor: vo Dnov, Asst. Prof. of Statstcs and Neurology Chapter Analyss of Varance - ANOVA Teachng Assstants: Fred Phoa, Anwer

More information

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10)

Econ107 Applied Econometrics Topic 9: Heteroskedasticity (Studenmund, Chapter 10) I. Defnton and Problems Econ7 Appled Econometrcs Topc 9: Heteroskedastcty (Studenmund, Chapter ) We now relax another classcal assumpton. Ths s a problem that arses often wth cross sectons of ndvduals,

More information

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu

BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS. M. Krishna Reddy, B. Naveen Kumar and Y. Ramu BOOTSTRAP METHOD FOR TESTING OF EQUALITY OF SEVERAL MEANS M. Krshna Reddy, B. Naveen Kumar and Y. Ramu Department of Statstcs, Osmana Unversty, Hyderabad -500 007, Inda. nanbyrozu@gmal.com, ramu0@gmal.com

More information

Chapter 5 Multilevel Models

Chapter 5 Multilevel Models Chapter 5 Multlevel Models 5.1 Cross-sectonal multlevel models 5.1.1 Two-level models 5.1.2 Multple level models 5.1.3 Multple level modelng n other felds 5.2 Longtudnal multlevel models 5.2.1 Two-level

More information

Regression Analysis. Regression Analysis

Regression Analysis. Regression Analysis Regresson Analyss Smple Regresson Multvarate Regresson Stepwse Regresson Replcaton and Predcton Error 1 Regresson Analyss In general, we "ft" a model by mnmzng a metrc that represents the error. n mn (y

More information

STAT 511 FINAL EXAM NAME Spring 2001

STAT 511 FINAL EXAM NAME Spring 2001 STAT 5 FINAL EXAM NAME Sprng Instructons: Ths s a closed book exam. No notes or books are allowed. ou may use a calculator but you are not allowed to store notes or formulas n the calculator. Please wrte

More information

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore

Predictive Analytics : QM901.1x Prof U Dinesh Kumar, IIMB. All Rights Reserved, Indian Institute of Management Bangalore Sesson Outlne Introducton to classfcaton problems and dscrete choce models. Introducton to Logstcs Regresson. Logstc functon and Logt functon. Maxmum Lkelhood Estmator (MLE) for estmaton of LR parameters.

More information

7.1. Single classification analysis of variance (ANOVA) Why not use multiple 2-sample 2. When to use ANOVA

7.1. Single classification analysis of variance (ANOVA) Why not use multiple 2-sample 2. When to use ANOVA Sngle classfcaton analyss of varance (ANOVA) When to use ANOVA ANOVA models and parttonng sums of squares ANOVA: hypothess testng ANOVA: assumptons A non-parametrc alternatve: Kruskal-Walls ANOVA Power

More information

experimenteel en correlationeel onderzoek

experimenteel en correlationeel onderzoek expermenteel en correlatoneel onderzoek lecture 6: one-way analyss of varance Leary. Introducton to Behavoral Research Methods. pages 246 271 (chapters 10 and 11): conceptual statstcs Moore, McCabe, and

More information

Chapter 14 Simple Linear Regression Page 1. Introduction to regression analysis 14-2

Chapter 14 Simple Linear Regression Page 1. Introduction to regression analysis 14-2 Chapter 4 Smple Lnear Regresson Page. Introducton to regresson analyss 4- The Regresson Equaton. Lnear Functons 4-4 3. Estmaton and nterpretaton of model parameters 4-6 4. Inference on the model parameters

More information

17 - LINEAR REGRESSION II

17 - LINEAR REGRESSION II Topc 7 Lnear Regresson II 7- Topc 7 - LINEAR REGRESSION II Testng and Estmaton Inferences about β Recall that we estmate Yˆ ˆ β + ˆ βx. 0 μ Y X x β0 + βx usng To estmate σ σ squared error Y X x ε s ε we

More information

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced,

FREQUENCY DISTRIBUTIONS Page 1 of The idea of a frequency distribution for sets of observations will be introduced, FREQUENCY DISTRIBUTIONS Page 1 of 6 I. Introducton 1. The dea of a frequency dstrbuton for sets of observatons wll be ntroduced, together wth some of the mechancs for constructng dstrbutons of data. Then

More information

U-Pb Geochronology Practical: Background

U-Pb Geochronology Practical: Background U-Pb Geochronology Practcal: Background Basc Concepts: accuracy: measure of the dfference between an expermental measurement and the true value precson: measure of the reproducblty of the expermental result

More information

SIMPLE REACTION TIME AS A FUNCTION OF TIME UNCERTAINTY 1

SIMPLE REACTION TIME AS A FUNCTION OF TIME UNCERTAINTY 1 Journal of Expermental Vol. 5, No. 3, 1957 Psychology SIMPLE REACTION TIME AS A FUNCTION OF TIME UNCERTAINTY 1 EDMUND T. KLEMMER Operatonal Applcatons Laboratory, Ar Force Cambrdge Research Center An earler

More information

Analytical Chemistry Calibration Curve Handout

Analytical Chemistry Calibration Curve Handout I. Quck-and Drty Excel Tutoral Analytcal Chemstry Calbraton Curve Handout For those of you wth lttle experence wth Excel, I ve provded some key technques that should help you use the program both for problem

More information