The topics in this section concern the second course objective. Correlation is a linear relation between two random variables.


4.1 Correlation

The topics in this section concern the second course objective. Correlation is a linear relation between two random variables. Note that the term relation used in this section means connection or relationship; it is not the mathematical term or concept of relation as in relations and functions. However, the term function used in this section and in this book is the mathematical term and concept of function.

The most common parameter used to indicate the correlation between two random variables is the Pearson correlation coefficient (or simply the correlation coefficient). The notation for this correlation coefficient is ρ (rho). Often, the correlation coefficient of two variables X and Y is denoted as $\rho_{X,Y}$. The correlation coefficient expresses the correlation as a number between -1 and 1; that is, $-1 \le \rho \le 1$. The value of 0 for the correlation (that is, $\rho_{X,Y} = 0$) means that there is no correlation between the two variables X and Y. If $\rho_{X,Y} = -1$, then the two variables have the perfect negative correlation while, if $\rho_{X,Y} = 1$, then the two variables have the perfect positive correlation.

For example, a random system outputs a circle, and the size of a circle produced by this system is of interest. That is, the output of the system is a circle, and the property of interest is the size of a produced circle. The system cannot produce circles of a uniform (constant) size. So, the circle size is random (and, thus, this system is a random system). By the way, the input of the system is a radius for the circle to be produced by the system. As stated above, the system cannot produce circles of the same size for a constant input of the radius.

There are several ways of measuring the property of the output from the system. For instance, the size of a circle can be measured by its area (inches squared) or its diameter (inches). That is, there are two ways of measuring the property of interest; namely, the size of a circle.

Let W be the random variable of the circle area and Y be the random variable of the circle diameter. Then, there is a relation between the two random variables W and Y, given by

$$W = 0.25\,\pi\,Y^2.$$

That is, the variable W has a quadratic relation with the variable Y since W is a quadratic function of Y. This quadratic relation is not a correlation because the relation is not linear. For two variables to be correlated with each other, the relation between them must be linear. By the way, note that these variables are random and the randomness comes from the randomness of the output of the random system. However, the (quadratic) relation of W and Y is not random.

There is another measure for the size of a circle, the circumference (inches). Let X be the random variable of the circle circumference. Then, there is a relation between W and X given as

$$W = \frac{0.25}{\pi}\,X^2.$$

Again, the variable W has a quadratic relation with the variable X, but they have no correlation. On the other hand, the two random variables X and Y have a relation given as

$$X = \pi\,Y \quad \text{or} \quad Y = \frac{1}{\pi}\,X.$$

I hope you recognize, from the prerequisite, the last equation as the slope-intercept form of a linear equation with a positive slope of 1/π and the origin as the y-intercept. That is, Y is a linear function of X, and, hence, the two random variables X and Y have a linear relation.
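The three relations of the Circle Example can be checked numerically. The following is only a minimal sketch: the helper `circle_measures`, the seed, and the range of simulated diameters are assumptions made for illustration and are not part of the original example.

```python
import math
import random


def circle_measures(diameter):
    """Return (area W, circumference X) for a circle of diameter Y."""
    area = 0.25 * math.pi * diameter ** 2
    circumference = math.pi * diameter
    return area, circumference


rng = random.Random(0)  # fixed seed so the sketch is repeatable
# Hypothetical random system: diameters (inches) scattered around 2.0.
diameters = [rng.uniform(1.8, 2.2) for _ in range(30)]

for y in diameters:
    w, x = circle_measures(y)
    # Quadratic relation between area and circumference.
    assert math.isclose(w, x ** 2 / (4 * math.pi))
    # Linear relation between diameter and circumference:
    # slope 1/pi, intercept 0.
    assert math.isclose(y, x / math.pi)

print("all 30 circles satisfy W = X^2/(4*pi) and Y = X/pi")
```

The diameters are random, but the relations among W, X, and Y hold for every circle, which is the sense in which the relations themselves are not random.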

These random variables X and Y have a correlation and, in fact, X and Y are perfectly positively correlated, which means $\rho_{X,Y} = 1$. The point of this Circle Example is that a correlation must be a linear relation of two random variables.

The slope of the linear equation is not the value of the correlation coefficient since the slope of a line can be greater than 1 or less than -1. However, the sign of the slope is the sign of the correlation. For instance, the slope of the linear equation in the Circle Example is 1/π or π, which is positive, and, hence, the correlation of X and Y is positive (recall $\rho_{X,Y} = 1$). Namely, if X increases as Y increases (the same movements), then X and Y have a positive correlation, like X (the circumference of a circle) and Y (the diameter of the circle). On the other hand, if one variable decreases as another variable increases (opposite movements), then they have a negative correlation.

If we know the relation of two random variables (especially, mathematically), like the circumference and the diameter of a circle, then the correlation is obvious and straightforward. However, in practice, it is often not obvious or straightforward. For instance, the relation between the malleability of steel and the rate of the annealing temperature drop is not completely understood scientifically (or mathematically), and no clear mathematical equation (formula) exists for them, although they are known to have a certain relation.

Even if a linear relation is known for two random variables, it is not straightforward to use the linear relation in practice. For instance, suppose you need to measure the volume of the oranges that you use in your food processing business because your orange processing machine has a maximum volume for the oranges; it cannot process oranges which are too large. Measuring volumes for many oranges is more complicated, more laborious, and more expensive than measuring their weights. The volume V and the weight W have a linear relation. In fact, the volume V is a linear function of the weight W given as

$$V = \frac{1}{d}\,W,$$

where d is the density of the material (oranges in this example). If the density d is constant for all the oranges that you receive from your vendors (suppliers), then the volume V is computed from the linear equation given above by weighing oranges (that is, from W). In practice, the problem is that, while it is close, the density is not constant for all oranges since different trees produce oranges of different densities. Even one tree produces oranges of different densities. If the values of the density are close enough for the oranges that you use, then the correct decision is to measure orange weights, not orange volumes, and compute their volumes (so that a lot of time and money are saved, and the profit increases). If the values of the density are all over the place, then the correct decision is to stay with the volume measurement.

In order to make the correct decision (whichever it is), we take data and estimate the correlation between V and W by using the data. That is, in this Orange Size Example, the correlation (or equivalently the correlation coefficient) is the parameter, and its value must be estimated or tested by hypothesis testing to make the correct decision.

As volume measurements are taken, the weight measurements of the oranges are also taken. These volume and weight measurements form data in ordered pairs as (W, V). Each ordered pair corresponds to one orange, and its weight and volume are given in the first and second positions of the ordered pair, respectively, as (W, V). If data are collected from 300 oranges, then there should be 300 ordered pairs with a total of 600 measurements: 300 weight measurements and their corresponding 300 volume measurements. A weight measurement cannot be paired up with any volume measurement arbitrarily. It must be paired up with the volume measurement from the same orange. It should be noted that these ordered pairs can be written as (V, W), instead of (W, V), just as well since we are investigating a linear relation of V and W. If V is a linear function of W, then W is a linear function of V, and vice versa.
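The decision in the Orange Size Example can be illustrated with a small simulation. This is a rough sketch under loud assumptions: the weight range, the mean density, the two density spreads, and the helper names `simulate_oranges` and `pearson_r` are all hypothetical. The function `pearson_r` computes the sample correlation coefficient that is introduced later in this section, written in an algebraically equivalent form.

```python
import random


def pearson_r(xs, ys):
    """Sample correlation coefficient of paired data.

    Equivalent to sum((x - x_bar)(y - y_bar)) / ((n - 1) * s_x * s_y),
    because (n - 1) * s_x * s_y equals sqrt(sxx * syy).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def simulate_oranges(n, density_spread, seed=1):
    """Return n ordered pairs (W, V), one pair per simulated orange."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        weight = rng.uniform(150.0, 250.0)          # assumed weight range
        density = rng.gauss(0.95, density_spread)   # assumed density model
        pairs.append((weight, weight / density))    # V = W / d, same orange
    return pairs


for spread in (0.005, 0.15):
    pairs = simulate_oranges(300, spread)
    weights = [w for w, _ in pairs]
    volumes = [v for _, v in pairs]
    print(f"density spread {spread}: r = {pearson_r(weights, volumes):.3f}")
```

When the simulated densities are nearly constant, the estimated correlation is very close to 1 and weighing is a good substitute for measuring volume; when the densities vary widely, the estimate drops noticeably.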
However, the order of the measurements (variables) must be consistent throughout the data. In other words, you cannot have some (V, W)s and some (W, V)s in the same data; they are either all (V, W)s or all (W, V)s.

Whether or not any correlation is investigated, this kind of data is called bivariate data. Generally, data for correlation means bivariate data. Bivariate data are data whose measurements (observations) came from two random variables in ordered pairs. The sample size n of bivariate data is the number of the pairs in the data, not the total number of observations or measurements (which is 2n for bivariate data of size n). This is because, for bivariate data, each pair is considered to be a datum. The sample size is the size of the data, which is the number of datums and, for bivariate data, the number of the pairs. Besides, there were 300 oranges sampled, and a pair of measurements was then taken from each orange. The original meaning of sample size is the number of objects (oranges) sampled from a population.

These pairs are ordered pairs and points in a two-dimensional plane. Thus, they are often referred to as data points. The sample size n is the total number of data points in the data. This is applicable to multivariate data in general.

When a relation of two variables is investigated, a certain kind of graph is very commonly used. It is the scatter plot. A scatter plot consists of one horizontal axis (real number line) and one vertical axis (real number line) intersecting each other at their origins, just like the x-y plane, but the axes are not necessarily the x- and y-axes. An ordered pair in data from two variables is a point in such a plane, with its first coordinate for the horizontal axis and its second coordinate for the vertical axis. That is, ordered pairs are plotted in a scatter plot as points. All this should be familiar to you from the prerequisite.

For instance, if 30 circles are measured in the Circle Example and 30 ordered pairs of (Y, W)s are obtained as data, then they are plotted in a scatter plot as 30 points. This scatter plot consists of the Y-axis as the horizontal axis and the W-axis as the vertical axis.
That is, this scatter plot is a Y-W plane with 30 points on it. These 30 points are all located on a parabola, going through the origin, in the first quadrant since $Y \ge 0$ and $W \ge 0$.

If 30 ordered pairs of (X, Y) are obtained as data in the Circle Example, then they are plotted in a scatter plot as 30 points. This scatter plot is an X-Y plane with 30 points all on a straight line, starting at the origin with a positive slope of 1/π, in the first quadrant since $X \ge 0$ and $Y \ge 0$. These points must be all on the straight line, assuming no measurement error.

Generally, if points are all on (or closely clustered around) a straight line of a positive slope in a scatter plot, the correlation of the two kinds of measurements is 1 or close to 1; that is, the perfect positive correlation or a strong positive correlation exists between the two variables. If points are all on (or closely clustered around) a straight line of a negative slope in a scatter plot, the correlation of the two kinds of measurements is -1 or close to -1; that is, the perfect negative correlation or a strong negative correlation exists between the two variables. If points are all scattered around like shotgun pellets, then the correlation of the two kinds of measurements is 0 or close to 0; that is, no correlation or negligible correlation exists between the two variables. See the diagrams of scatter plots given below.

[Scatter plots: ρ = 1 and ρ = -1]

[Scatter plot: ρ = 0]

These scatter plots were found at a website which is no longer available. Please read Appendix: Scatter Plots given below for more about scatter plots.

If the 300 ordered pairs of (W, V) are plotted in a scatter plot, the 300 points should be clustered along the graph of V = W/d in the first quadrant since $W \ge 0$ and $V \ge 0$. Note that these points do not spread out along the line filling out the first quadrant since the sizes of oranges are relatively uniform; that is, the range of W is small. However, if you zoom in to where all the points are located, the points should cluster along the short piece of the line.

A scatter plot is useful for recognizing a relation (or lack of relation) between two variables. If the data of the steel malleability and the rate of the annealing temperature drop are plotted, a relation (which may not be linear) between them can be found. It might show no relation (a shotgun shot) between them. However, this would be a very important and useful piece of information because it means the rate of the annealing temperature drop would have nothing to do with the steel malleability. That is, you could not control the steel malleability by the annealing temperature drop. You would have to find the other factors that have relations with steel malleability.

You can construct a scatter plot by using Excel or a scatter-plot website. On such a website, you enter data as one column of measurements from one variable and another column of measurements from the other variable, separating the two measurements in each row by one blank (space). Do not enter bivariate data as points (that is, do not use parentheses). Try it out.

Even if a scatter plot shows tightly clustered points and indicates a significant correlation, the scatter plot does not estimate the value of the correlation ρ. While a scatter plot might suggest some correlation (or lack of it) visually, it does not provide a numerical estimate for ρ. When an estimate of ρ is needed, the sample correlation coefficient is used. Its formula is

$$\hat\rho_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x\, s_y},$$

where n is the number of ordered pairs, $\bar{x}$ and $\bar{y}$ are the sample averages of the data from variables X and Y respectively, and $s_x$ and $s_y$ are the sample standard deviations of the data from variables X and Y respectively. Sometimes, r is used for $\hat\rho_{x,y}$.

To understand the formula of $\hat\rho_{x,y}$, read Appendix: Formula of $\hat\rho_{x,y}$ given below. It is very important to read and understand what is explained in the appendix. If you do, you understand the formula. You do not have to memorize it; you can remember or recall it correctly and use it correctly.

Let us check a couple of points about $\hat\rho_{x,y}$, which helps you understand $\hat\rho_{x,y}$ and the formula. If Y = X, then $\rho_{x,y} = \rho_{x,x}$, which is 1. Let us see what happens to

$$\hat\rho_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x\, s_y}.$$

$$\hat\rho_{x,y} = \hat\rho_{x,x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})}{(n-1)\, s_x\, s_x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n-1)\,(s_x)^2} = \frac{1}{(s_x)^2} \cdot \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{1}{(s_x)^2}\,(s_x)^2 = 1,$$

since

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = (s_x)^2.$$

That is, $\hat\rho_{x,x} = 1$, which makes sense and is as it should be since $\rho_{x,x} = 1$.

Now, if Y = -X, then $\rho_{x,y} = \rho_{x,-x}$, which is -1. Let us see what happens to $\hat\rho_{x,y}$. Here $\bar{y} = -\bar{x}$ and $s_y = s_x$, so

$$\hat\rho_{x,y} = \hat\rho_{x,-x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(-x_i - (-\bar{x}))}{(n-1)\, s_x\, s_x} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(-1)(x_i - \bar{x})}{(n-1)\,(s_x)^2} = (-1)\,\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{(n-1)\,(s_x)^2} = (-1)\,\hat\rho_{x,x} = (-1)(1) = -1.$$

That is, $\hat\rho_{x,-x} = -1$, which makes sense and is as it should be since $\rho_{x,-x} = -1$. It is important to understand what we have just done since, if you understand it, you understand the formula of $\hat\rho_{x,y}$, and you can remember it and use it correctly.
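The two checks above (Y = X and Y = -X) can also be run numerically. The sketch below implements the formula term by term; the function name `sample_corr` and the data values are illustrative assumptions only.

```python
import math


def sample_corr(xs, ys):
    """Sample correlation coefficient, following the formula in the text."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations (divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return numerator / ((n - 1) * s_x * s_y)


xs = [2.0, 5.0, 1.0, 8.0]        # arbitrary illustrative values
neg_xs = [-x for x in xs]

print(sample_corr(xs, xs))       # Y = X: equals 1 up to rounding error
print(sample_corr(xs, neg_xs))   # Y = -X: equals -1 up to rounding error
```

Any data with at least two distinct values behave the same way; the particular numbers do not matter.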

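Let us have a numerical example for the sample correlation coefficient with small data of n = 4, {(3, 8.5), (12, 2.3), (6, 6.4), (7, 4.8)}, which can be given in a table as

| X | Y |
|---|---|
| 3 | 8.5 |
| 12 | 2.3 |
| 6 | 6.4 |
| 7 | 4.8 |

To compute $\hat\rho_{x,y}$, let us find $s_x$ and $s_y$ first. $\bar{x} = (3 + 12 + 6 + 7)/4 = 7$ and $\bar{y} = (8.5 + 2.3 + 6.4 + 4.8)/4 = 5.5$.

| $x - \bar{x}$ | $(x - \bar{x})^2$ | $y - \bar{y}$ | $(y - \bar{y})^2$ |
|---|---|---|---|
| 3 - 7 = -4 | 16 | 8.5 - 5.5 = 3.0 | 9.00 |
| 12 - 7 = 5 | 25 | 2.3 - 5.5 = -3.2 | 10.24 |
| 6 - 7 = -1 | 1 | 6.4 - 5.5 = 0.9 | 0.81 |
| 7 - 7 = 0 | 0 | 4.8 - 5.5 = -0.7 | 0.49 |
| Total | 42 | Total | 20.54 |

So, $s_x = \sqrt{42/(4-1)} = 3.74$ and $s_y = \sqrt{20.54/(4-1)} = 2.62$. Also,

| $(x - \bar{x})(y - \bar{y})$ |
|---|
| (3 - 7)(8.5 - 5.5) = -12 |
| (12 - 7)(2.3 - 5.5) = -16 |
| (6 - 7)(6.4 - 5.5) = -0.9 |
| (7 - 7)(4.8 - 5.5) = 0 |
| Total = -28.9 |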
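The sums in the tables above can be reproduced with a short script, which also carries the computation through to the estimate of ρ. This is a sketch for checking the arithmetic; the variable names are chosen here for illustration.

```python
import math

data = [(3, 8.5), (12, 2.3), (6, 6.4), (7, 4.8)]
n = len(data)

x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Deviations from the sample averages.
dx = [x - x_bar for x, _ in data]
dy = [y - y_bar for _, y in data]

sum_dx2 = sum(d * d for d in dx)                  # total of (x - x_bar)^2
sum_dy2 = sum(d * d for d in dy)                  # total of (y - y_bar)^2
sum_dxdy = sum(a * b for a, b in zip(dx, dy))     # total of the products

s_x = math.sqrt(sum_dx2 / (n - 1))
s_y = math.sqrt(sum_dy2 / (n - 1))
rho_hat = sum_dxdy / ((n - 1) * s_x * s_y)

print(f"x_bar = {x_bar}, y_bar = {y_bar:.1f}")
print(f"totals: {sum_dx2:.2f}, {sum_dy2:.2f}, {sum_dxdy:.2f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}, rho_hat = {rho_hat:.3f}")
```

Carrying full precision gives an estimate of about -0.984. Using the rounded values 3.74 and 2.62 in the denominator gives about -0.983 instead.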

So, the estimate for ρ is

$$\hat\rho_{x,y} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x\, s_y} = \frac{-28.9}{(4-1)(3.74)(2.62)} = -0.983.$$

This estimate indicates that the true value of ρ is close to -1 and that the two variables X and Y have a strong negative correlation. The data used as an example above are small for the sake of simplicity. In reality, large data (30 data points or more) should be used. Websites that compute the sample correlation coefficient can also be used.

Generally, a value of $\hat\rho_{x,y}$ greater than 0.9 indicates a strong positive correlation between the two variables while a value of $\hat\rho_{x,y}$ less than -0.9 indicates a strong negative correlation between the two variables.

For large data, hypothesis testing can be conducted on the correlation between two variables, typically as

Ho: There is no correlation between the two variables.
vs
Ha: There is some correlation between the two variables.

Please read Appendix: Hypothesis Testing on Correlation given below.

I would take more than 100 pairs of data of volumes and weights in the Orange Size Example and conduct this hypothesis test at the 1% significance level. There are computer packages that do significance testing on correlation. With the p-value from the significance test, I can finish the hypothesis testing on the correlation between the orange volume (V) and the orange weight (W), as discussed in the last section of the last chapter. If the null hypothesis is rejected, then I find the average density $\bar{d}$ of the oranges that I used for the hypothesis testing. Then, I start measuring their weights, instead of their volumes. The volume of an orange can be computed from its weight by $V = W/\bar{d}$. I would randomly select oranges (say, 1% on average) as they come in and still take volume measurements of them. This way, I can compute a running correlation between V and W so that I can monitor any changes in the correlation. At the same time, I can reduce the cost of measuring orange sizes and increase the profit.

It is important to know that some (or even a strong) correlation between two variables does not mean that there exists a cause-effect relation between them. Recall that a cause-effect relation is a relation of a set of variables (factors) causing some effect on another variable (the response variable). In fact, the variables causing an effect on the other variables are called factors and the affected variables are called the response variables in experiments. Data come from response variables (measured or observed values of the response variable) in experiments. To find a cause-effect relation among variables, an experiment must be conducted. You can find a relation among variables, such as correlation, by observational studies, but an experiment with randomization is necessary to establish a cause-effect relation.

A linear relation between two variables is also investigated by simple regression analysis. The difference between simple regression and correlation is that, in simple regression, one of the variables is a random variable but the other is a constant variable (as opposed to a random variable), while both variables are random variables for a correlation.

Finally, the sample covariance

$$\widehat{COV}(X, Y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$$

estimates the (population) covariance COV(X, Y). This parameter of covariance indicates the linear relation between two random variables, X and Y, just like the correlation coefficient. However, its values are not standardized from -1 to 1 as the correlation coefficient is. Also, unlike the correlation coefficient, the (population) covariance has a physical dimension, (the dimension of X)*(the dimension of Y). This is the reason why the correlation coefficient $\rho_{X,Y}$ is considered to be a better parameter and is more widely used for indicating a linear relation between two random variables.

The sample covariance has the same drawbacks as the population covariance. Look at the formula given above. The absence of $s_x$ and $s_y$ from the denominator results in an estimate with a physical dimension of (the dimension of X)*(the dimension of Y). Also, the value of an estimate is not standardized between -1 and 1. Incidentally, these drawbacks make the sample covariance a good estimator of COV(X, Y), since COV(X, Y) has these same drawbacks, but not a good indicator of the correlation between two variables.

Often, the term coefficient, as in correlation coefficient, means standardized values (such as 0 to 1 or -1 to 1) without physical dimension. Coefficients are often used for measuring some property of objects in engineering and sciences, such as the drag coefficient.

Appendix: Scatter Plots

Scatter plots are similar to line graphs in that they use horizontal and vertical axes to plot data points. However, they have a very specific purpose. Scatter plots show how much one variable is related to another. The relationship between two variables is called their correlation. Scatter plots usually consist of a large body of data. The closer the data points come, when plotted, to making a straight line, the higher the correlation between the two variables is or the stronger the linear relationship is.

If the data points make a straight line going from the origin out to high x- and y-values, then the variables are said to have a positive correlation. If the line goes from a high value on the y-axis down to a high value on the x-axis, the variables have a negative correlation.

A perfect positive correlation is given the value of 1. A perfect negative correlation is given the value of -1. If there is absolutely no correlation present, the value given is 0. The closer the number is to 1 or -1, the stronger the correlation is, or the stronger the linear relationship between the variables is. The closer the number is to 0, the weaker the correlation. So something that seems to kind of correlate in a positive direction might have a value of 0.67, whereas something with an extremely weak negative correlation would have a negative value close to 0.

An example of a situation where you might find a perfect positive correlation, as we have in the graph on the left above, would be when you compare the total amount of money spent on tickets at the movie theater with the number of people who go. This means that every time that "x" number of people go, "y" amount of money is spent on tickets without variation.

An example of a situation where you might find a perfect negative correlation, as in the graph on the right above, would be if you were comparing the speed at which a car is going to the amount of time it takes to reach a destination. As the speed increases, the amount of time decreases.

On the other hand, a situation where you might find a strong but not perfect positive correlation would be if you examined the number of hours students spent studying for an exam versus the grade received. This would not be a perfect correlation because two people could spend the same amount of time studying and get different grades. However, in general, the rule will hold true that as the amount of time studying increases so does the grade received.

Let us take a look at some examples. The graphs that are shown above both have perfect correlations, so their values are 1 and -1. The graphs below obviously do not have perfect correlations. Which graph would have a correlation of 0? What about 0.7? -0.7? 0.3? -0.3?

The material in this appendix was taken from a website. Also note that correlation is used interchangeably with correlation coefficient in this appendix. However, correlation is a property of two variables and the correlation coefficient is a measure of the property of correlation.

Appendix: Formula of $\hat\rho_{x,y}$

Let us understand the formula. The denominator of the formula is positive since the three factors in the denominator are all positive, so the sign of $\hat\rho_{x,y}$ comes from the numerator. Dividing by the denominator is what keeps the value of $\hat\rho_{x,y}$ between -1 and 1, just like ρ.

Another point about the denominator is that it makes $\hat\rho_{x,y}$ dimensionless (unitless), just like ρ. Suppose X is in dollars and Y is in pounds; then the data from X and $\bar{x}$ are in dollars and the data from Y and $\bar{y}$ are in pounds. As a result, the numerator is in the dimension of dollars*pounds. In the denominator, (n - 1) has no physical dimension but $s_x$ and $s_y$ have dimensions in dollars and pounds, respectively. So, the quotient of the formula for $\hat\rho_{x,y}$ has no physical dimension (unitless).
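The role of the denominator can be seen by changing units. The sketch below uses hypothetical data in dollars and pounds and converts them to cents and ounces; the function names and the data values are assumptions for illustration. The sample covariance is rescaled by the unit change, but the sample correlation coefficient is not.

```python
import math


def sample_cov(xs, ys):
    """Sample covariance: sum of products of deviations over (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_corr(xs, ys):
    """Sample correlation: covariance divided by s_x * s_y."""
    s_x = math.sqrt(sample_cov(xs, xs))   # cov(x, x) is the sample variance
    s_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (s_x * s_y)


# Hypothetical measurements: X in dollars, Y in pounds.
dollars = [3.0, 12.0, 6.0, 7.0]
pounds = [8.5, 2.3, 6.4, 4.8]

# The same measurements in other units: cents and ounces.
cents = [100 * d for d in dollars]
ounces = [16 * p for p in pounds]

print(sample_cov(dollars, pounds), sample_cov(cents, ounces))
print(sample_corr(dollars, pounds), sample_corr(cents, ounces))
```

The covariance in cents*ounces is 1600 times the covariance in dollars*pounds, while the two correlation values agree up to rounding error.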

It is not only sensible but also important to use an estimator which produces a dimensionless estimate with a value between -1 and 1 to estimate a parameter which is dimensionless and whose value is between -1 and 1.

Generally, bivariate data are graphically represented as points in a rectangular coordinate system on a plane defined by a horizontal axis and a vertical axis. If the variables are X and Y, then the plane is the x-y plane that you learned about in the prerequisites. If the correlation of X and Y is positively high (that is, $\rho_{x,y}$ is close to 1), then the scatter plot of the data (which are bivariate data) should show the points tightly clustered along a straight line with a positive slope.

Look at the numerator of the formula; $\bar{x}$ and $\bar{y}$ are subtracted from the data from X and the data from Y respectively, which amounts to a horizontal shift and a vertical shift of all the points. They shift the point $(\bar{x}, \bar{y})$, which is the center of all the points, to the origin (0, 0). That is, after the shift, all the points are tightly clustered along a straight line of positive slope going through the origin.

For instance, the following bivariate data have the center of the data at $\bar{x} = 7$ and $\bar{y} = 5.5$, which is listed as the third point in the table. Subtraction of $\bar{x} = 7$ from the x-coordinates and $\bar{y} = 5.5$ from the y-coordinates of the points results in shifting the center of the data (7, 5.5) to the origin (0, 0).

| (X, Y) | (X - $\bar{x}$, Y - $\bar{y}$) |
|---|---|
| (3, 8.5) | (-4, 3.0) |
| (6, 6.4) | (-1, 0.9) |
| (7, 5.5) | (0, 0) |
| (7, 4.8) | (0, -0.7) |
| (12, 2.3) | (5, -3.2) |

Also, see the scatter plots of these data points given below. The first scatter diagram is for the original (X, Y)s in the table.

[Scatter plot "Example": Y versus X]

The second scatter plot is for the (X - $\bar{x}$, Y - $\bar{y}$)s in the table. Both scatter plots are produced by Excel.

[Scatter plot "Example": Y - Y bar versus X - X bar]

The points in the first scatter plot are shifted down and to the left by the subtractions of the averages, as shown in the second scatter plot. The center of the points is now at the origin in the second scatter plot. All the points, except for one on the y-axis, are in the second or fourth quadrant in the second scatter plot. Each of these points has x- and y-coordinates of opposite signs. Thus, its product is negative, and the sum of all the products of these points results in a negative number, which makes sense since these bivariate data came from X and Y with $\rho_{X,Y}$ close to -1.
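The shift of the center to the origin and the signs of the products can be checked directly for the four data points of the numerical example. This is a minimal sketch; the helper `quadrant` is a name made up for the illustration.

```python
data = [(3, 8.5), (12, 2.3), (6, 6.4), (7, 4.8)]
n = len(data)

x_bar = sum(x for x, _ in data) / n
y_bar = sum(y for _, y in data) / n

# Shift every point so that the center (x_bar, y_bar) moves to the origin.
shifted = [(x - x_bar, y - y_bar) for x, y in data]


def quadrant(dx, dy):
    """Name the quadrant of a shifted point, or 'axis' if it lies on an axis."""
    if dx == 0 or dy == 0:
        return "axis"
    if dx > 0:
        return "I" if dy > 0 else "IV"
    return "II" if dy > 0 else "III"


quadrants = [quadrant(dx, dy) for dx, dy in shifted]
products = [dx * dy for dx, dy in shifted]

for (dx, dy), q, p in zip(shifted, quadrants, products):
    print(f"({dx:+.1f}, {dy:+.1f})  quadrant {q:>4}  product {p:+.2f}")
print(f"sum of products = {sum(products):.2f}")
```

Every product is negative or zero, so the numerator of the formula, the sum of the products, is negative.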

Now, suppose that all the shifted points $(x - \bar{x}, y - \bar{y})$ are in the first and third quadrants. In the first quadrant, $(x - \bar{x})$ and $(y - \bar{y})$ are both positive, and, hence, the products $(x - \bar{x})(y - \bar{y})$ are all positive. In the third quadrant, $(x - \bar{x})$ and $(y - \bar{y})$ are both negative, and, hence, the products $(x - \bar{x})(y - \bar{y})$ are all positive. So, in the numerator of the formula for $\hat\rho_{x,y}$, only positive numbers are added, which results in a positive number for the numerator. This results in a high positive value (close to 1) of $\hat\rho_{x,y}$, which makes sense since these data come from X and Y with $\rho_{X,Y}$ close to 1.

If the points are clustered around a straight line of a positive slope, but not tightly, then some points $(x - \bar{x}, y - \bar{y})$ are in the second and fourth quadrants, and the products $(x - \bar{x})(y - \bar{y})$ are negative in these quadrants. So, when all the products $(x - \bar{x})(y - \bar{y})$ are added up for the numerator, the sum is not as high a positive number. This results in a positive value closer to 0 for $\hat\rho_{x,y}$, which makes sense since $\rho_{x,y}$ is not close to 1 and, hence, the points do not cluster tightly around a straight line.

In fact, if there is no correlation between X and Y (that is, $\rho_{x,y} = 0$), then the points can be scattered somewhat uniformly in a circle with the center $(\bar{x}, \bar{y})$. After subtracting $\bar{x}$ and $\bar{y}$ from the measurements from X and the measurements from Y respectively, there are about equal numbers, n/4, of points $(x - \bar{x}, y - \bar{y})$ in each quadrant, which means that about the same number of negative numbers and positive numbers are added in the numerator. Also, these points are scattered almost symmetrically about the x-axis and the y-axis. As a result, each negative product $(x - \bar{x})(y - \bar{y})$ has a corresponding positive product $(x - \bar{x})(y - \bar{y})$ which is close to it in absolute value. They cancel each other when added. This results in a value close to zero for the numerator of the formula and, consequently, in a value close to zero for $\hat\rho_{x,y}$, which makes sense since $\rho_{x,y} = 0$ is estimated.
Similarly, if $\rho_{x,y}$ is very close to -1, then the points are tightly clustered around a straight line with a negative slope. All the points $(x - \bar{x}, y - \bar{y})$ are in the second and fourth quadrants, so the products $(x - \bar{x})(y - \bar{y})$ are negative. Negative numbers are added in the numerator, which results in a number close to -1 for $\hat\rho_{x,y}$, which makes sense since $\rho_{x,y}$ close to -1 is estimated. If $\rho_{x,y}$ is not close to -1 but still negative, then some of the points $(x - \bar{x}, y - \bar{y})$ are in the first and third quadrants, resulting in a small number of positive products $(x - \bar{x})(y - \bar{y})$. This results in a negative number that is away from -1 and closer to 0 for $\hat\rho_{x,y}$, which again makes sense.

So, the numerator of the formula makes sense. In fact, to obtain the information from the points (which are data) as to how tightly they are clustered along a straight line (the degree of the linearity between X and Y), the numerator must be as given in the formula. Try to come up with another way of obtaining the same information from data. It is very difficult to do. By the way, you now know exactly why we subtract $\bar{x}$ and $\bar{y}$ in the numerator. Can you write the reason out in one sentence?

You already know the reason why we have $s_x$ and $s_y$ in the denominator. The factor n - 1 in the denominator adjusts the value of the numerator to the size of the data, making it per data point (almost), just like the n - 1 in the formula of the sample standard deviation.

Now, you understand $\hat\rho_{x,y}$ and the formula of $\hat\rho_{x,y}$, which also helps you understand $\rho_{X,Y}$.

Appendix: Hypothesis Testing on Correlation

You can do hypothesis testing on correlation by testing

Ho: There is no correlation between the two variables (ρ = 0).
vs
Ha: There is some correlation between the two variables (ρ ≠ 0).

This is a two-tailed test and is typically performed by finding the p-value of significance testing from the observed value of the test statistic. The test statistic is

$$t = \frac{\hat\rho\,\sqrt{n-2}}{\sqrt{1 - \hat\rho^{\,2}}},$$

and the observed value is computed by substituting the value of the sample correlation coefficient for $\hat\rho$ (computed from bivariate data) and the number of the pairs in the data for n. For instance, $\hat\rho = -0.983$ is computed from bivariate data which consist of four pairs. Then, the observed value is computed as

$$t = \frac{(-0.983)\sqrt{4-2}}{\sqrt{1 - (-0.983)^2}} \approx -7.57.$$

Now, go to a website that computes probabilities for the t distribution. Click the circle under the last distribution, input 2 under d.f. and the observed value under t value, and click on the button between t value and probability. Then, you should see the p-value of about 0.017 under probability.

However, the p-value can be obtained more easily by directly inputting data to a website that tests for correlation. Let us do this with such a website. You should put your bivariate data horizontally (not vertically as often given) to the right of X and to the right of Y in the grid. Then, you click on the button TEST FOR CORRELATION and should find the sample correlation coefficient at Correlation (X,Y), the observed value at Test-statistic (unfortunately, it is incorrect), and the p-value for a one-tailed test at The P-Value.

Let us have some exercise. Test Ho: $\rho_{X,Y} = 0$ vs Ha: $\rho_{X,Y} \ne 0$ at the 5% significance level based on the data,

| X | Y |
|---|---|
| 3 | 8.5 |
| 12 | 2.3 |
| 6 | 6.4 |
| 7 | 4.8 |

For this, go to the website and put 3, 12, 6, 7 horizontally to the right of X (without commas, of course) and 8.5, 2.3, 6.4, 4.8 underneath, horizontally to the right of Y. Click on the button TEST FOR CORRELATION. You should find the sample correlation coefficient (about -0.98), the observed value (which is incorrect), and the p-value (which is correct). However, this p-value is for a one-tailed hypothesis test. The hypothesis testing on the correlation coefficient is two-tailed hypothesis testing. Thus, the p-value for our test is twice the reported value. This p-value is greater than 1%, so the null hypothesis could not be rejected at the 1% significance level. However, it is less than 5%, so the null hypothesis is rejected at the 5% level. That is, the data provide some evidence (at least, at the 5% significance level) that supports correlation between X and Y.

Note that you should use data of greater than 30 data points since the p-value is obtained from the standard Normal distribution based on large data (CLT). Also, the sample correlation coefficient is very close to -1, but the null hypothesis cannot be rejected at the 1% significance level. This is because of the small sample size.
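The p-value can also be computed without a website. The sketch below assumes the usual t statistic for a correlation, $t = \hat\rho\sqrt{n-2}/\sqrt{1-\hat\rho^2}$, referred to a t distribution with n - 2 degrees of freedom (consistent with the d.f. of 2 used above for four pairs). The function names and the use of Simpson's rule for the tail area are choices made for this illustration only; a statistics package would normally be used instead.

```python
import math


def t_statistic(r, n):
    """Observed value of the test statistic for Ho: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def t_pdf(t, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=20000):
    """P(|T| >= |t|), using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * (h / 3) * total     # P(-|t| <= T <= |t|)
    return 1 - central


r, n = -0.983, 4                      # sample correlation and number of pairs
t_obs = t_statistic(r, n)
p_two = two_tailed_p(t_obs, n - 2)

print(f"t = {t_obs:.2f}, two-tailed p-value = {p_two:.3f}")
print(f"one-tailed p-value = {p_two / 2:.4f}")
```

Under these assumptions the two-tailed p-value is about 0.017, which lies between 1% and 5%.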

Another note is that, in the website, you should have 0 at Claimed Population's Correlation. This is the null value and is 0 in this section. However, you can test hypotheses of Ho: $\rho_{X,Y} = 0.5$ vs Ha: $\rho_{X,Y} \ne 0.5$ if interested. In this case, you need to set the claimed population's correlation to 0.5.

Several computer programs (such as Excel) and websites compute values of the sample correlation coefficient (by inputting data) but do not produce observed values of the test statistic or p-values. Also, if someone gives you only a value of the sample correlation coefficient, along with the sample size (but not the original data), and asks you to do hypothesis testing, then how can the hypothesis testing be performed? There are websites which produce the p-value from a sample correlation coefficient value and its sample size.

Let us have an example. Someone asks you to conduct the hypothesis testing on a sample correlation coefficient of about -0.983 which was computed from bivariate data of sample size four (four pairs or four data points). Then, go to such a website and input -0.983 for Correlation Value (r), which takes only three places after the decimal, and 4 for Sample Size. You should get about 0.017 for the p-value at Probability (Two-Tailed). So, the null hypothesis is rejected at the 5% significance level but not rejected at the 1% significance level.

Of course, one-tailed hypothesis tests and significance tests of the following forms can be conducted as described in the second and third sections of the last chapter.

The left-tailed test:
Ho: There is positive or no correlation between the two variables (ρ ≥ 0).
vs
Ha: There is negative correlation between the two variables (ρ < 0).

The right-tailed test: Ho: There is negative or no correlation between the two variables (ρ ≤ 0). vs Ha: There is positive correlation between the two variables (ρ > 0).

From the last example of the sample correlation coefficient with the sample size of 4, the significance tests for these one-tailed tests each report a p-value: one with which Ho: ρ ≥ 0 is rejected in the left-tailed test, and one with which Ho: ρ ≤ 0 is rejected in the right-tailed test.

Here is a good review on hypothesis testing. Can you conduct the hypothesis testing on these one-tailed tests, say at α = 0.01?

Copyrighted by Michael Greenwich, 03/
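The one-tailed p-values can be obtained from a sample correlation coefficient and its sample size alone. The sketch below is one possible way, under stated assumptions: it uses Student's t with n − 2 degrees of freedom, integrates the t density numerically with Simpson's rule, and takes r = -0.984 as an illustrative value (the coefficient of the four-point data set X: 3, 12, 6, 7 and Y: 8.5, 2.3, 6.4, 4.8, rounded to three decimals). All function names are hypothetical.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t), by Simpson's rule on [0, |t|] plus symmetry (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def one_tailed_p_values(r, n):
    """Return (left-tailed p, right-tailed p) for the null value rho = 0."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    left = t_cdf(t, df)      # evidence for Ha: rho < 0
    right = 1 - left         # evidence for Ha: rho > 0
    return left, right

# Assumed illustrative inputs: r rounded to three decimals, n = 4.
left_p, right_p = one_tailed_p_values(-0.984, 4)
print(f"left-tailed p = {left_p:.4f}, right-tailed p = {right_p:.4f}")
```

With these assumed inputs the left-tailed p-value is about 0.008 and the right-tailed p-value is about 0.992, so comparing each with α = 0.01 answers the question posed above for this illustrative value.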

/ n ) are compared. The logic is: if the two

/ n ) are compared. The logic is: if the two STAT C141, Sprng 2005 Lecture 13 Two sample tests One sample tests: examples of goodness of ft tests, where we are testng whether our data supports predctons. Two sample tests: called as tests of ndependence

More information

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9

Correlation and Regression. Correlation 9.1. Correlation. Chapter 9 Chapter 9 Correlaton and Regresson 9. Correlaton Correlaton A correlaton s a relatonshp between two varables. The data can be represented b the ordered pars (, ) where s the ndependent (or eplanator) varable,

More information

Chapter 9: Statistical Inference and the Relationship between Two Variables

Chapter 9: Statistical Inference and the Relationship between Two Variables Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,

More information

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation

Statistics for Managers Using Microsoft Excel/SPSS Chapter 13 The Simple Linear Regression Model and Correlation Statstcs for Managers Usng Mcrosoft Excel/SPSS Chapter 13 The Smple Lnear Regresson Model and Correlaton 1999 Prentce-Hall, Inc. Chap. 13-1 Chapter Topcs Types of Regresson Models Determnng the Smple Lnear

More information

Scatter Plot x

Scatter Plot x Construct a scatter plot usng excel for the gven data. Determne whether there s a postve lnear correlaton, negatve lnear correlaton, or no lnear correlaton. Complete the table and fnd the correlaton coeffcent

More information

Statistics for Economics & Business

Statistics for Economics & Business Statstcs for Economcs & Busness Smple Lnear Regresson Learnng Objectves In ths chapter, you learn: How to use regresson analyss to predct the value of a dependent varable based on an ndependent varable

More information

[The following data appear in Wooldridge Q2.3.] The table below contains the ACT score and college GPA for eight college students.

[The following data appear in Wooldridge Q2.3.] The table below contains the ACT score and college GPA for eight college students. PPOL 59-3 Problem Set Exercses n Smple Regresson Due n class /8/7 In ths problem set, you are asked to compute varous statstcs by hand to gve you a better sense of the mechancs of the Pearson correlaton

More information

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y)

Here is the rationale: If X and y have a strong positive relationship to one another, then ( x x) will tend to be positive when ( y y) Secton 1.5 Correlaton In the prevous sectons, we looked at regresson and the value r was a measurement of how much of the varaton n y can be attrbuted to the lnear relatonshp between y and x. In ths secton,

More information

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis

Resource Allocation and Decision Analysis (ECON 8010) Spring 2014 Foundations of Regression Analysis Resource Allocaton and Decson Analss (ECON 800) Sprng 04 Foundatons of Regresson Analss Readng: Regresson Analss (ECON 800 Coursepak, Page 3) Defntons and Concepts: Regresson Analss statstcal technques

More information

2016 Wiley. Study Session 2: Ethical and Professional Standards Application

2016 Wiley. Study Session 2: Ethical and Professional Standards Application 6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton

More information

Chapter 11: Simple Linear Regression and Correlation

Chapter 11: Simple Linear Regression and Correlation Chapter 11: Smple Lnear Regresson and Correlaton 11-1 Emprcal Models 11-2 Smple Lnear Regresson 11-3 Propertes of the Least Squares Estmators 11-4 Hypothess Test n Smple Lnear Regresson 11-4.1 Use of t-tests

More information

Section 8.3 Polar Form of Complex Numbers

Section 8.3 Polar Form of Complex Numbers 80 Chapter 8 Secton 8 Polar Form of Complex Numbers From prevous classes, you may have encountered magnary numbers the square roots of negatve numbers and, more generally, complex numbers whch are the

More information

STATISTICS QUESTIONS. Step by Step Solutions.

STATISTICS QUESTIONS. Step by Step Solutions. STATISTICS QUESTIONS Step by Step Solutons www.mathcracker.com 9//016 Problem 1: A researcher s nterested n the effects of famly sze on delnquency for a group of offenders and examnes famles wth one to

More information

Statistics for Business and Economics

Statistics for Business and Economics Statstcs for Busness and Economcs Chapter 11 Smple Regresson Copyrght 010 Pearson Educaton, Inc. Publshng as Prentce Hall Ch. 11-1 11.1 Overvew of Lnear Models n An equaton can be ft to show the best lnear

More information

SIMPLE LINEAR REGRESSION

SIMPLE LINEAR REGRESSION Smple Lnear Regresson and Correlaton Introducton Prevousl, our attenton has been focused on one varable whch we desgnated b x. Frequentl, t s desrable to learn somethng about the relatonshp between two

More information

Basic Business Statistics, 10/e

Basic Business Statistics, 10/e Chapter 13 13-1 Basc Busness Statstcs 11 th Edton Chapter 13 Smple Lnear Regresson Basc Busness Statstcs, 11e 009 Prentce-Hall, Inc. Chap 13-1 Learnng Objectves In ths chapter, you learn: How to use regresson

More information

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X

3.1 Expectation of Functions of Several Random Variables. )' be a k-dimensional discrete or continuous random vector, with joint PMF p (, E X E X1 E X Statstcs 1: Probablty Theory II 37 3 EPECTATION OF SEVERAL RANDOM VARIABLES As n Probablty Theory I, the nterest n most stuatons les not on the actual dstrbuton of a random vector, but rather on a number

More information

Lecture 6: Introduction to Linear Regression

Lecture 6: Introduction to Linear Regression Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6

More information

STAT 3008 Applied Regression Analysis

STAT 3008 Applied Regression Analysis STAT 3008 Appled Regresson Analyss Tutoral : Smple Lnear Regresson LAI Chun He Department of Statstcs, The Chnese Unversty of Hong Kong 1 Model Assumpton To quantfy the relatonshp between two factors,

More information

Statistics MINITAB - Lab 2

Statistics MINITAB - Lab 2 Statstcs 20080 MINITAB - Lab 2 1. Smple Lnear Regresson In smple lnear regresson we attempt to model a lnear relatonshp between two varables wth a straght lne and make statstcal nferences concernng that

More information

Economics 130. Lecture 4 Simple Linear Regression Continued

Economics 130. Lecture 4 Simple Linear Regression Continued Economcs 130 Lecture 4 Contnued Readngs for Week 4 Text, Chapter and 3. We contnue wth addressng our second ssue + add n how we evaluate these relatonshps: Where do we get data to do ths analyss? How do

More information

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity

Week3, Chapter 4. Position and Displacement. Motion in Two Dimensions. Instantaneous Velocity. Average Velocity Week3, Chapter 4 Moton n Two Dmensons Lecture Quz A partcle confned to moton along the x axs moves wth constant acceleraton from x =.0 m to x = 8.0 m durng a 1-s tme nterval. The velocty of the partcle

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours

UNIVERSITY OF TORONTO Faculty of Arts and Science. December 2005 Examinations STA437H1F/STA1005HF. Duration - 3 hours UNIVERSITY OF TORONTO Faculty of Arts and Scence December 005 Examnatons STA47HF/STA005HF Duraton - hours AIDS ALLOWED: (to be suppled by the student) Non-programmable calculator One handwrtten 8.5'' x

More information

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6

Department of Quantitative Methods & Information Systems. Time Series and Their Components QMIS 320. Chapter 6 Department of Quanttatve Methods & Informaton Systems Tme Seres and Ther Components QMIS 30 Chapter 6 Fall 00 Dr. Mohammad Zanal These sldes were modfed from ther orgnal source for educatonal purpose only.

More information

Chapter 3 Describing Data Using Numerical Measures

Chapter 3 Describing Data Using Numerical Measures Chapter 3 Student Lecture Notes 3-1 Chapter 3 Descrbng Data Usng Numercal Measures Fall 2006 Fundamentals of Busness Statstcs 1 Chapter Goals To establsh the usefulness of summary measures of data. The

More information

Lecture Notes for STATISTICAL METHODS FOR BUSINESS II BMGT 212. Chapters 14, 15 & 16. Professor Ahmadi, Ph.D. Department of Management

Lecture Notes for STATISTICAL METHODS FOR BUSINESS II BMGT 212. Chapters 14, 15 & 16. Professor Ahmadi, Ph.D. Department of Management Lecture Notes for STATISTICAL METHODS FOR BUSINESS II BMGT 1 Chapters 14, 15 & 16 Professor Ahmad, Ph.D. Department of Management Revsed August 005 Chapter 14 Formulas Smple Lnear Regresson Model: y =

More information

Chapter 14 Simple Linear Regression

Chapter 14 Simple Linear Regression Chapter 4 Smple Lnear Regresson Chapter 4 - Smple Lnear Regresson Manageral decsons often are based on the relatonshp between two or more varables. Regresson analss can be used to develop an equaton showng

More information

First Year Examination Department of Statistics, University of Florida

First Year Examination Department of Statistics, University of Florida Frst Year Examnaton Department of Statstcs, Unversty of Florda May 7, 010, 8:00 am - 1:00 noon Instructons: 1. You have four hours to answer questons n ths examnaton.. You must show your work to receve

More information

APPENDIX 2 FITTING A STRAIGHT LINE TO OBSERVATIONS

APPENDIX 2 FITTING A STRAIGHT LINE TO OBSERVATIONS Unversty of Oulu Student Laboratory n Physcs Laboratory Exercses n Physcs 1 1 APPEDIX FITTIG A STRAIGHT LIE TO OBSERVATIOS In the physcal measurements we often make a seres of measurements of the dependent

More information

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1

j) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1 Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons

More information

Comparison of Regression Lines

Comparison of Regression Lines STATGRAPHICS Rev. 9/13/2013 Comparson of Regresson Lnes Summary... 1 Data Input... 3 Analyss Summary... 4 Plot of Ftted Model... 6 Condtonal Sums of Squares... 6 Analyss Optons... 7 Forecasts... 8 Confdence

More information

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4)

Econ107 Applied Econometrics Topic 3: Classical Model (Studenmund, Chapter 4) I. Classcal Assumptons Econ7 Appled Econometrcs Topc 3: Classcal Model (Studenmund, Chapter 4) We have defned OLS and studed some algebrac propertes of OLS. In ths topc we wll study statstcal propertes

More information

Statistics II Final Exam 26/6/18

Statistics II Final Exam 26/6/18 Statstcs II Fnal Exam 26/6/18 Academc Year 2017/18 Solutons Exam duraton: 2 h 30 mn 1. (3 ponts) A town hall s conductng a study to determne the amount of leftover food produced by the restaurants n the

More information

Chapter 6. Supplemental Text Material

Chapter 6. Supplemental Text Material Chapter 6. Supplemental Text Materal S6-. actor Effect Estmates are Least Squares Estmates We have gven heurstc or ntutve explanatons of how the estmates of the factor effects are obtaned n the textboo.

More information

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands

1. Inference on Regression Parameters a. Finding Mean, s.d and covariance amongst estimates. 2. Confidence Intervals and Working Hotelling Bands Content. Inference on Regresson Parameters a. Fndng Mean, s.d and covarance amongst estmates.. Confdence Intervals and Workng Hotellng Bands 3. Cochran s Theorem 4. General Lnear Testng 5. Measures of

More information

STAT 511 FINAL EXAM NAME Spring 2001

STAT 511 FINAL EXAM NAME Spring 2001 STAT 5 FINAL EXAM NAME Sprng Instructons: Ths s a closed book exam. No notes or books are allowed. ou may use a calculator but you are not allowed to store notes or formulas n the calculator. Please wrte

More information

Gravitational Acceleration: A case of constant acceleration (approx. 2 hr.) (6/7/11)

Gravitational Acceleration: A case of constant acceleration (approx. 2 hr.) (6/7/11) Gravtatonal Acceleraton: A case of constant acceleraton (approx. hr.) (6/7/11) Introducton The gravtatonal force s one of the fundamental forces of nature. Under the nfluence of ths force all objects havng

More information

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics

ECONOMICS 351*-A Mid-Term Exam -- Fall Term 2000 Page 1 of 13 pages. QUEEN'S UNIVERSITY AT KINGSTON Department of Economics ECOOMICS 35*-A Md-Term Exam -- Fall Term 000 Page of 3 pages QUEE'S UIVERSITY AT KIGSTO Department of Economcs ECOOMICS 35* - Secton A Introductory Econometrcs Fall Term 000 MID-TERM EAM ASWERS MG Abbott

More information

Joint Statistical Meetings - Biopharmaceutical Section

Joint Statistical Meetings - Biopharmaceutical Section Iteratve Ch-Square Test for Equvalence of Multple Treatment Groups Te-Hua Ng*, U.S. Food and Drug Admnstraton 1401 Rockvlle Pke, #200S, HFM-217, Rockvlle, MD 20852-1448 Key Words: Equvalence Testng; Actve

More information

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution

Department of Statistics University of Toronto STA305H1S / 1004 HS Design and Analysis of Experiments Term Test - Winter Solution Department of Statstcs Unversty of Toronto STA35HS / HS Desgn and Analyss of Experments Term Test - Wnter - Soluton February, Last Name: Frst Name: Student Number: Instructons: Tme: hours. Ads: a non-programmable

More information

This column is a continuation of our previous column

This column is a continuation of our previous column Comparson of Goodness of Ft Statstcs for Lnear Regresson, Part II The authors contnue ther dscusson of the correlaton coeffcent n developng a calbraton for quanttatve analyss. Jerome Workman Jr. and Howard

More information

ISQS 6348 Final Open notes, no books. Points out of 100 in parentheses. Y 1 ε 2

ISQS 6348 Final Open notes, no books. Points out of 100 in parentheses. Y 1 ε 2 ISQS 6348 Fnal Open notes, no books. Ponts out of 100 n parentheses. 1. The followng path dagram s gven: ε 1 Y 1 ε F Y 1.A. (10) Wrte down the usual model and assumptons that are mpled by ths dagram. Soluton:

More information

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers

Psychology 282 Lecture #24 Outline Regression Diagnostics: Outliers Psychology 282 Lecture #24 Outlne Regresson Dagnostcs: Outlers In an earler lecture we studed the statstcal assumptons underlyng the regresson model, ncludng the followng ponts: Formal statement of assumptons.

More information

Chapter 13: Multiple Regression

Chapter 13: Multiple Regression Chapter 13: Multple Regresson 13.1 Developng the multple-regresson Model The general model can be descrbed as: It smplfes for two ndependent varables: The sample ft parameter b 0, b 1, and b are used to

More information

28. SIMPLE LINEAR REGRESSION III

28. SIMPLE LINEAR REGRESSION III 8. SIMPLE LINEAR REGRESSION III Ftted Values and Resduals US Domestc Beers: Calores vs. % Alcohol To each observed x, there corresponds a y-value on the ftted lne, y ˆ = βˆ + βˆ x. The are called ftted

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Recall: man dea of lnear regresson Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 8 Lnear regresson can be used to study an

More information

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding

Lecture 9: Linear regression: centering, hypothesis testing, multiple covariates, and confounding Lecture 9: Lnear regresson: centerng, hypothess testng, multple covarates, and confoundng Sandy Eckel seckel@jhsph.edu 6 May 008 Recall: man dea of lnear regresson Lnear regresson can be used to study

More information

Statistics Chapter 4

Statistics Chapter 4 Statstcs Chapter 4 "There are three knds of les: les, damned les, and statstcs." Benjamn Dsrael, 1895 (Brtsh statesman) Gaussan Dstrbuton, 4-1 If a measurement s repeated many tmes a statstcal treatment

More information

Unit 5: Quadratic Equations & Functions

Unit 5: Quadratic Equations & Functions Date Perod Unt 5: Quadratc Equatons & Functons DAY TOPIC 1 Modelng Data wth Quadratc Functons Factorng Quadratc Epressons 3 Solvng Quadratc Equatons 4 Comple Numbers Smplfcaton, Addton/Subtracton & Multplcaton

More information

Lecture 3: Probability Distributions

Lecture 3: Probability Distributions Lecture 3: Probablty Dstrbutons Random Varables Let us begn by defnng a sample space as a set of outcomes from an experment. We denote ths by S. A random varable s a functon whch maps outcomes nto the

More information

x = , so that calculated

x = , so that calculated Stat 4, secton Sngle Factor ANOVA notes by Tm Plachowsk n chapter 8 we conducted hypothess tests n whch we compared a sngle sample s mean or proporton to some hypotheszed value Chapter 9 expanded ths to

More information

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction

The Multiple Classical Linear Regression Model (CLRM): Specification and Assumptions. 1. Introduction ECONOMICS 5* -- NOTE (Summary) ECON 5* -- NOTE The Multple Classcal Lnear Regresson Model (CLRM): Specfcaton and Assumptons. Introducton CLRM stands for the Classcal Lnear Regresson Model. The CLRM s also

More information

Answers Problem Set 2 Chem 314A Williamsen Spring 2000

Answers Problem Set 2 Chem 314A Williamsen Spring 2000 Answers Problem Set Chem 314A Wllamsen Sprng 000 1) Gve me the followng crtcal values from the statstcal tables. a) z-statstc,-sded test, 99.7% confdence lmt ±3 b) t-statstc (Case I), 1-sded test, 95%

More information

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur

Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed

More information

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U)

ANSWERS. Problem 1. and the moment generating function (mgf) by. defined for any real t. Use this to show that E( U) var( U) Econ 413 Exam 13 H ANSWERS Settet er nndelt 9 deloppgaver, A,B,C, som alle anbefales å telle lkt for å gøre det ltt lettere å stå. Svar er gtt . Unfortunately, there s a prntng error n the hnt of

More information

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems

Chapter 5. Solution of System of Linear Equations. Module No. 6. Solution of Inconsistent and Ill Conditioned Systems Numercal Analyss by Dr. Anta Pal Assstant Professor Department of Mathematcs Natonal Insttute of Technology Durgapur Durgapur-713209 emal: anta.bue@gmal.com 1 . Chapter 5 Soluton of System of Lnear Equatons

More information

Midterm Examination. Regression and Forecasting Models

Midterm Examination. Regression and Forecasting Models IOMS Department Regresson and Forecastng Models Professor Wllam Greene Phone: 22.998.0876 Offce: KMC 7-90 Home page: people.stern.nyu.edu/wgreene Emal: wgreene@stern.nyu.edu Course web page: people.stern.nyu.edu/wgreene/regresson/outlne.htm

More information

18. SIMPLE LINEAR REGRESSION III

18. SIMPLE LINEAR REGRESSION III 8. SIMPLE LINEAR REGRESSION III US Domestc Beers: Calores vs. % Alcohol Ftted Values and Resduals To each observed x, there corresponds a y-value on the ftted lne, y ˆ ˆ = α + x. The are called ftted values.

More information

Chapter 8 Indicator Variables

Chapter 8 Indicator Variables Chapter 8 Indcator Varables In general, e explanatory varables n any regresson analyss are assumed to be quanttatve n nature. For example, e varables lke temperature, dstance, age etc. are quanttatve n

More information

Linear Feature Engineering 11

Linear Feature Engineering 11 Lnear Feature Engneerng 11 2 Least-Squares 2.1 Smple least-squares Consder the followng dataset. We have a bunch of nputs x and correspondng outputs y. The partcular values n ths dataset are x y 0.23 0.19

More information

A Robust Method for Calculating the Correlation Coefficient

A Robust Method for Calculating the Correlation Coefficient A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal

More information

Kernel Methods and SVMs Extension

Kernel Methods and SVMs Extension Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general

More information

Negative Binomial Regression

Negative Binomial Regression STATGRAPHICS Rev. 9/16/2013 Negatve Bnomal Regresson Summary... 1 Data Input... 3 Statstcal Model... 3 Analyss Summary... 4 Analyss Optons... 7 Plot of Ftted Model... 8 Observed Versus Predcted... 10 Predctons...

More information

x yi In chapter 14, we want to perform inference (i.e. calculate confidence intervals and perform tests of significance) in this setting.

x yi In chapter 14, we want to perform inference (i.e. calculate confidence intervals and perform tests of significance) in this setting. The Practce of Statstcs, nd ed. Chapter 14 Inference for Regresson Introducton In chapter 3 we used a least-squares regresson lne (LSRL) to represent a lnear relatonshp etween two quanttatve explanator

More information

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications

Durban Watson for Testing the Lack-of-Fit of Polynomial Regression Models without Replications Durban Watson for Testng the Lack-of-Ft of Polynomal Regresson Models wthout Replcatons Ruba A. Alyaf, Maha A. Omar, Abdullah A. Al-Shha ralyaf@ksu.edu.sa, maomar@ksu.edu.sa, aalshha@ksu.edu.sa Department

More information

Global Sensitivity. Tuesday 20 th February, 2018

Global Sensitivity. Tuesday 20 th February, 2018 Global Senstvty Tuesday 2 th February, 28 ) Local Senstvty Most senstvty analyses [] are based on local estmates of senstvty, typcally by expandng the response n a Taylor seres about some specfc values

More information

Module 9. Lecture 6. Duality in Assignment Problems

Module 9. Lecture 6. Duality in Assignment Problems Module 9 1 Lecture 6 Dualty n Assgnment Problems In ths lecture we attempt to answer few other mportant questons posed n earler lecture for (AP) and see how some of them can be explaned through the concept

More information

10-701/ Machine Learning, Fall 2005 Homework 3

10-701/ Machine Learning, Fall 2005 Homework 3 10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40

More information

Linear Regression Analysis: Terminology and Notation

Linear Regression Analysis: Terminology and Notation ECON 35* -- Secton : Basc Concepts of Regresson Analyss (Page ) Lnear Regresson Analyss: Termnology and Notaton Consder the generc verson of the smple (two-varable) lnear regresson model. It s represented

More information

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA

4 Analysis of Variance (ANOVA) 5 ANOVA. 5.1 Introduction. 5.2 Fixed Effects ANOVA 4 Analyss of Varance (ANOVA) 5 ANOVA 51 Introducton ANOVA ANOVA s a way to estmate and test the means of multple populatons We wll start wth one-way ANOVA If the populatons ncluded n the study are selected

More information

Analytical Chemistry Calibration Curve Handout

Analytical Chemistry Calibration Curve Handout I. Quck-and Drty Excel Tutoral Analytcal Chemstry Calbraton Curve Handout For those of you wth lttle experence wth Excel, I ve provded some key technques that should help you use the program both for problem

More information

Structure and Drive Paul A. Jensen Copyright July 20, 2003

Structure and Drive Paul A. Jensen Copyright July 20, 2003 Structure and Drve Paul A. Jensen Copyrght July 20, 2003 A system s made up of several operatons wth flow passng between them. The structure of the system descrbes the flow paths from nputs to outputs.

More information

1 Matrix representations of canonical matrices

1 Matrix representations of canonical matrices 1 Matrx representatons of canoncal matrces 2-d rotaton around the orgn: ( ) cos θ sn θ R 0 = sn θ cos θ 3-d rotaton around the x-axs: R x = 1 0 0 0 cos θ sn θ 0 sn θ cos θ 3-d rotaton around the y-axs:

More information

The Order Relation and Trace Inequalities for. Hermitian Operators

The Order Relation and Trace Inequalities for. Hermitian Operators Internatonal Mathematcal Forum, Vol 3, 08, no, 507-57 HIKARI Ltd, wwwm-hkarcom https://doorg/0988/mf088055 The Order Relaton and Trace Inequaltes for Hermtan Operators Y Huang School of Informaton Scence

More information

January Examinations 2015

January Examinations 2015 24/5 Canddates Only January Examnatons 25 DO NOT OPEN THE QUESTION PAPER UNTIL INSTRUCTED TO DO SO BY THE CHIEF INVIGILATOR STUDENT CANDIDATE NO.. Department Module Code Module Ttle Exam Duraton (n words)

More information

The Geometry of Logit and Probit

The Geometry of Logit and Probit The Geometry of Logt and Probt Ths short note s meant as a supplement to Chapters and 3 of Spatal Models of Parlamentary Votng and the notaton and reference to fgures n the text below s to those two chapters.

More information

Numerical Heat and Mass Transfer

Numerical Heat and Mass Transfer Master degree n Mechancal Engneerng Numercal Heat and Mass Transfer 06-Fnte-Dfference Method (One-dmensonal, steady state heat conducton) Fausto Arpno f.arpno@uncas.t Introducton Why we use models and

More information

Biostatistics. Chapter 11 Simple Linear Correlation and Regression. Jing Li

Biostatistics. Chapter 11 Simple Linear Correlation and Regression. Jing Li Bostatstcs Chapter 11 Smple Lnear Correlaton and Regresson Jng L jng.l@sjtu.edu.cn http://cbb.sjtu.edu.cn/~jngl/courses/2018fall/b372/ Dept of Bonformatcs & Bostatstcs, SJTU Recall eat chocolate Cell 175,

More information

Homework Assignment 3 Due in class, Thursday October 15

Homework Assignment 3 Due in class, Thursday October 15 Homework Assgnment 3 Due n class, Thursday October 15 SDS 383C Statstcal Modelng I 1 Rdge regresson and Lasso 1. Get the Prostrate cancer data from http://statweb.stanford.edu/~tbs/elemstatlearn/ datasets/prostate.data.

More information

β0 + β1xi. You are interested in estimating the unknown parameters β

β0 + β1xi. You are interested in estimating the unknown parameters β Ordnary Least Squares (OLS): Smple Lnear Regresson (SLR) Analytcs The SLR Setup Sample Statstcs Ordnary Least Squares (OLS): FOCs and SOCs Back to OLS and Sample Statstcs Predctons (and Resduals) wth OLS

More information

Math1110 (Spring 2009) Prelim 3 - Solutions

Math1110 (Spring 2009) Prelim 3 - Solutions Math 1110 (Sprng 2009) Solutons to Prelm 3 (04/21/2009) 1 Queston 1. (16 ponts) Short answer. Math1110 (Sprng 2009) Prelm 3 - Solutons x a 1 (a) (4 ponts) Please evaluate lm, where a and b are postve numbers.

More information

ANOVA. The Observations y ij

ANOVA. The Observations y ij ANOVA Stands for ANalyss Of VArance But t s a test of dfferences n means The dea: The Observatons y j Treatment group = 1 = 2 = k y 11 y 21 y k,1 y 12 y 22 y k,2 y 1, n1 y 2, n2 y k, nk means: m 1 m 2

More information

Q1: Calculate the mean, median, sample variance, and standard deviation of 25, 40, 05, 70, 05, 40, 70.

Q1: Calculate the mean, median, sample variance, and standard deviation of 25, 40, 05, 70, 05, 40, 70. Q1: Calculate the mean, medan, sample varance, and standard devaton of 5, 40, 05, 70, 05, 40, 70. Q: The frequenc dstrbuton for a data set s gven below. Measurements 0 1 3 4 Frequenc 3 5 8 3 1 a) What

More information

THE SUMMATION NOTATION Ʃ

THE SUMMATION NOTATION Ʃ Sngle Subscrpt otaton THE SUMMATIO OTATIO Ʃ Most of the calculatons we perform n statstcs are repettve operatons on lsts of numbers. For example, we compute the sum of a set of numbers, or the sum of the

More information

x i1 =1 for all i (the constant ).

x i1 =1 for all i (the constant ). Chapter 5 The Multple Regresson Model Consder an economc model where the dependent varable s a functon of K explanatory varables. The economc model has the form: y = f ( x,x,..., ) xk Approxmate ths by

More information

AP Physics 1 & 2 Summer Assignment

AP Physics 1 & 2 Summer Assignment AP Physcs 1 & 2 Summer Assgnment AP Physcs 1 requres an exceptonal profcency n algebra, trgonometry, and geometry. It was desgned by a select group of college professors and hgh school scence teachers

More information

Lecture 3 Stat102, Spring 2007

Lecture 3 Stat102, Spring 2007 Lecture 3 Stat0, Sprng 007 Chapter 3. 3.: Introducton to regresson analyss Lnear regresson as a descrptve technque The least-squares equatons Chapter 3.3 Samplng dstrbuton of b 0, b. Contnued n net lecture

More information

CORRELATION AND REGRESSION

CORRELATION AND REGRESSION CHAPTER 18 CORRELATION AND REGRESSION After readng ths chapter, students wll be able to understand: LEARNING OBJECTIVES The meanng of bvarate data and technques of preparaton of bvarate dstrbuton; The

More information

CHAPTER 8. Exercise Solutions

CHAPTER 8. Exercise Solutions CHAPTER 8 Exercse Solutons 77 Chapter 8, Exercse Solutons, Prncples of Econometrcs, 3e 78 EXERCISE 8. When = N N N ( x x) ( x x) ( x x) = = = N = = = N N N ( x ) ( ) ( ) ( x x ) x x x x x = = = = Chapter

More information
