Business Statistics 41000:


1 Business Statistics 41000: Plotting and Summarizing Bivariate Data Drew D. Creal University of Chicago, Booth School of Business Week 2: January 17 and 18,

2 Class information Drew D. Creal Office: 404 Harper Center Office hours: me for an appointment Office phone: Course homepage: 2

3 Course schedule
Week # 1: Plotting and summarizing univariate data
Week # 2: Plotting and summarizing bivariate data
Week # 3: Probability 1
Week # 4: Probability 2
Week # 5: Probability 3
Week # 6: In-class exam
Week # 7: Statistical inference 1
Week # 8: Statistical inference 2
Week # 9: Simple linear regression
Week # 10: Multiple linear regression

4 Outline of today's topics
I. Plotting bivariate data (AWZ p )
   The two-way table for categorical variables
   Scatter plots for numeric variables
II. Summarizing bivariate data
   Tables
III. Covariance and correlation (AWZ p )
   Sample covariance
   Sample correlation
   Properties of the correlation coefficient
   Graphical depictions of correlation
   Correlation matrix

5 Outline of today's topics
IV. Linearly related variables
   Linear functions
   Mean and variance of a linear function
   The sample mean and variance of a linear function
   Linear combinations
   Mean and variance of a linear combination
V. Linear regression (AWZ p )

6 Looking at Two Variables Last week, we discussed how to plot and analyze one variable. We found this to be a helpful way of understanding our data. In many practical situations, however, we need to look at two (or more) variables. 6

7 The Two-Way Table Let's say we would like to investigate the relationship (if any) between two categorical variables x and y. If x has two categories and y has two categories, then there are only four possible combinations of x and y. We can then count how many times each of the four combinations occurs in the data. If instead x has $N_x$ possible outcomes and y has $N_y$ possible outcomes, then the total number of possible combinations of x and y together is $N_x \times N_y$. We can often display these counts in a table.

8 The Two-Way Table Consider the simp and cola variables from the British marketing data. Remember that 0 stands for "does not".
Counts:
            simp = 0   simp = 1    Total
cola = 0       387         35       422
cola = 1       432        146       578
Total          819        181      1000
Percentages of all respondents:
            simp = 0   simp = 1    Total
cola = 0      38.7%       3.5%    42.2%
cola = 1      43.2%      14.6%    57.8%
Total         81.9%      18.1%   100.0%
In the second table, we display percentages. For example, in the first table, 146 of the 1000 respondents drink cola and watch the Simpsons.

9 A conditional Two-Way Table
Column percentages (each simp column sums to 100%):
            simp = 0   simp = 1    Total
cola = 0     47.25%     19.34%    42.20%
cola = 1     52.75%     80.66%    57.80%
Total          100%       100%      100%
In this table, we display the percentage of each column. Conditional on those respondents who do not watch the Simpsons (simp = 0), 387 out of 819 or 47.25% do not drink cola.

10 A conditional Two-Way Table
Row percentages (each cola row sums to 100%):
            simp = 0   simp = 1    Total
cola = 0     91.71%      8.29%     100%
cola = 1     74.74%     25.26%     100%
Total        81.90%     18.10%     100%
Alternatively, we can condition on each row. For example, conditional on drinking cola (cola = 1), we know that 146 out of 578 respondents, or about 25%, watch the Simpsons.
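The three versions of the two-way table (joint, column, and row percentages) can be sketched in a few lines of Python. This is only an illustration: the four cell counts are the ones implied by the percentages on the slides with n = 1000, and the helper name pct is made up for this sketch.

```python
# Cell counts implied by the slides: rows are cola (0, 1), columns are simp (0, 1).
counts = [[387, 35],
          [432, 146]]

row_totals = [sum(row) for row in counts]          # totals for cola = 0, cola = 1
col_totals = [sum(col) for col in zip(*counts)]    # totals for simp = 0, simp = 1
n = sum(row_totals)                                # number of respondents

def pct(part, whole):
    """Hypothetical helper: part/whole as a percentage, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)

# Percent of all respondents in each cell.
joint = [[pct(c, n) for c in row] for row in counts]
# Percent of each column (conditioning on simp).
col_pct = [[pct(c, col_totals[j]) for j, c in enumerate(row)] for row in counts]
# Percent of each row (conditioning on cola).
row_pct = [[pct(c, row_totals[i]) for c in row] for i, row in enumerate(counts)]

print(joint)
print(col_pct)
print(row_pct)
```

The same counts produce all three tables; only the denominator changes.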

11 Social grades of Great Britain The variable soc stands for socioeconomic status and there are six categories:
A: Higher managerial, administrative and professional
B: Intermediate managerial, administrative and professional
C1: Supervisory, clerical and junior managerial, administrative and professional
C2: Skilled manual workers
D: Semi-skilled and unskilled manual workers
E: State pensioners, casual and lowest grade workers, unemployed with state benefits only

12 The Two-Way Table Using cigs and soc from the British marketing data, how does social grade relate to cigarette use?
soc      cigs = 0   cigs = 1   Total
A         75.00%     25.00%    100%
B         80.13%     19.87%    100%
C1        74.19%     25.81%    100%
C2        70.64%     29.36%    100%
D         62.18%     37.82%    100%
E         64.17%     35.83%    100%
Total     71.20%     28.80%    100%
Notice that this is a conditional two-way table and these are row percentages. Lower social grades seem to smoke more cigarettes.

13 The Scatter Plot For numeric variables, we can use a scatter plot. Each row corresponds to an individual and their drinking ability (see beer.xls). Each person has recorded the number of beers they can drink and their weight. Do you think there is a relationship? [Table: data with columns nbeer and weight.]

14 The Scatter Plot For numeric variables, we can use a scatter plot. In a scatter plot, each point corresponds to an observation. Weight is on the horizontal axis. The number of beers is on the vertical axis. [Figure: scatter plot "Anonymous survey of MBA drinking", number of beers against weight, with one point marked "Outlier?".]

15 The Scatter Plot Are returns on a mutual fund related to the market? Each point corresponds to a monthly return. [Figure: scatter plot of VWNDX against GSPC.]

16 Mutual funds Consider the mutual fund data (mutualfundreturns.xls). We have monthly data on 12 different assets from July 1996 to Dec
Ticker Symbol   Fund name
TWEIX           American Century Equity Income (large value)
BRKA            Berkshire Hathaway Holding Company A shares
DGRIX           Dreyfus Growth & Income (large growth)
LBF             DWS Global High Income Fund (bonds)
FTRNX           Fidelity Trend Fund (large growth)
JAVLX           Janus Twenty (large growth)
OPGSX           Oppenheimer Gold & Special Minerals
PRTBX           Permanent Portfolio Treasury Bill (ultrashort bonds)
PTTRX           Pimco Funds Total Return (intermediate bonds)
PINCX           Putnam Income
GSPC            S&P 500 index
VWNDX           Vanguard Windsor (large value)

17 It is common in finance to take a series of assets and plot the sample mean versus the sample standard deviation. [Figure: scatter plot of Mean (vertical axis) against Std Dev (horizontal axis), with points labeled DWS Global, Oppenheimer G&M, American Century, Berkshire Hathaway, Janus Twenty, PIMCO Bond Fund, Putnam Bond, Treasury Bill Fund, Vanguard Windsor, Fidelity, S&P500, and Dreyfus.] If you were an asset manager, where would you want your fund to be located on this plot?

18 The Scatter Plot Let's compare the mean and std. dev. of portfolios from several countries (see conret.xls). [Figure: scatter plot of Mean (vertical axis) against Std. Dev. (horizontal axis), with points labeled Hong Kong, Malaysia, The Netherlands, Sweden, Swiss, Denmark, USA, Belgium, France, Germany, Norway, Singapore, Australia, Ireland, Austria, Finland, Canada, United Kingdom, New Zealand, Spain, Italy, and Japan.] Monthly returns from 1988 to

19 Comparing numeric and categorical variables How do you relate a numeric variable to a categorical variable? This is not obvious at first glance. Our first option is to bin the numeric variable and transform it into a categorical variable. Then, we can use a two-way table.

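20 Comparing numeric and categorical variables Consider the age and cigs variables from the British marketing data. What is the relationship between age and cigarette usage? The rows below are the eight age bins in the order shown on the slide; these are row percentages.
age bin   cigs = 0   cigs = 1   total
(1)        50.98%     49.02%    100%
(2)        63.64%     36.36%    100%
(3)        67.69%     32.31%    100%
(4)        64.76%     35.24%    100%
(5)        79.76%     20.24%    100%
(6)        91.13%      8.87%    100%
(7)        88.10%     11.90%    100%
(8)       100.00%      0.00%    100%
total      71.20%     28.80%    100%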
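Binning a numeric variable and then tabulating it against a categorical one can be sketched as follows. The records, the 20-year bin width, and the function name age_bin are all made up for illustration; they are not the British marketing data or the bins used on the slide.

```python
from collections import defaultdict

# Hypothetical (age, smokes) records; smokes is 1 for a smoker, 0 otherwise.
records = [(19, 1), (23, 0), (27, 1), (34, 0), (38, 1),
           (45, 0), (52, 0), (58, 1), (63, 0), (71, 0)]

def age_bin(age, width=20):
    """Map an age to a fixed-width bin label such as '20-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

# bin label -> [count of non-smokers, count of smokers]
table = defaultdict(lambda: [0, 0])
for age, smokes in records:
    table[age_bin(age)][smokes] += 1

# Row percentages: within each age bin, the share of non-smokers and smokers.
row_pct = {}
for label in sorted(table):
    non_smokers, smokers = table[label]
    total = non_smokers + smokers
    row_pct[label] = (round(100 * non_smokers / total, 1),
                      round(100 * smokers / total, 1))
    print(label, row_pct[label])
```

The choice of bin width is a judgment call; wider bins give more stable percentages but hide detail.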

21 Summaries for bivariate data Two-way tables and scatter plots can help us understand the information contained in our data. However, they do not describe the strength of evidence. For categorical variables, we can use the information contained in our tables, but there is no single way to do this. For numeric variables, two important summaries are the sample covariance and sample correlation.

22 Covariance and correlation 22

23 Measuring strength of evidence for numeric data: Covariance and Correlation In the beer data, it looks like there is a relationship. The relationship looks linear in the sense that it looks like we could draw a straight line with positive slope through the plot. [Figure: scatter plot "Anonymous survey of MBA drinking", number of beers against weight.]

24 Covariance and Correlation Covariance and correlation summarize how strong a linear relationship there is between two variables. Consider any two numeric variables x and y. In the beer example, we could let x be the number of beers and y be each individual's weight.

25 The sample covariance The sample covariance between data x and y is defined as
$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
In words, covariance is the average product of the deviations from the means. Remember in Lecture # 1 we demonstrated how to compute expressions like $(x_i - \bar{x})$ and $(y_i - \bar{y})$. What are the units of the covariance?

26 The sample correlation The sample correlation between data x and y is defined as
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
The sample correlation is simply the sample covariance divided by the product of the standard deviations of x and y. Can anyone tell me what the units of measurement are?
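The two definitions translate almost line for line into code. The sketch below is an illustration only: the function names sample_cov and sample_corr and the four weight and beer values are made up here and are not the beer.xls data.

```python
import math

def sample_cov(x, y):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance divided by both standard deviations."""
    s_x = math.sqrt(sample_cov(x, x))  # the covariance of x with itself is its variance
    s_y = math.sqrt(sample_cov(y, y))
    return sample_cov(x, y) / (s_x * s_y)

# Hypothetical weights (lbs) and numbers of beers for four people.
weight = [120, 150, 180, 210]
nbeer = [4, 6, 7, 11]

print(sample_cov(weight, nbeer))   # measured in lbs times beers
print(sample_corr(weight, nbeer))  # unit-less
```

Note that the units of the covariance are the product of the units of x and y, while the units cancel in the correlation.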

27 Properties of the sample correlation The sample correlation always lies between -1 and 1, i.e. $-1 \le r_{xy} \le 1$. The closer $r_{xy}$ is to 1, the stronger the linear relationship is with a positive slope. When one variable increases the other tends to increase. The closer $r_{xy}$ is to -1, the stronger the linear relationship is with a negative slope. When one variable increases the other tends to decrease. A correlation of 1 means the points lie exactly on a straight line with positive slope.

28 Correlation Compare the mutual fund data and the beer data. [Figures: scatter plot of VWNDX against GSPC; scatter plot "Anonymous survey of MBA drinking", number of beers against weight.] Which appears to be more correlated?

29 Correlation Sample correlation between VWNDX and S&P500 = Sample correlation between nbeer and weight = The larger correlation between VWNDX and S&P500 indicates that the linear relationship is STRONGER. [Figures: scatter plot of VWNDX against GSPC; scatter plot "Anonymous survey of MBA drinking", number of beers against weight.]

30 Correlation: more examples [Figure: two scatter plots of 100 simulated data points each, $y_1$ against $x_1$ and $y_2$ against $x_2$.] Sample correlation between $x_1$ and $y_1$ is 0.1. Sample correlation between $x_2$ and $y_2$ is

31 Correlation: more examples [Figure: two scatter plots of 100 simulated data points each, $y_3$ against $x_3$ and $y_4$ against $x_4$, each captioned with its sample correlation.]

32 Correlation: more examples [Figure: two scatter plots of 100 simulated points, each captioned with its sample correlation.]

33 Correlation: be cautious [Figure: scatter plot of $y_5$ against $x_5$.] IMPORTANT: Correlation only measures a LINEAR relationship. Clearly, the variables $x_5$ and $y_5$ are highly (nonlinearly) related. Correlation between $x_5$ and $y_5$ =
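A tiny made-up example shows how a perfect nonlinear relationship can still have zero correlation. The data below are hypothetical (y is exactly x squared on a symmetric grid) and are not the simulated $x_5$, $y_5$ from the slide.

```python
import math

def sample_corr(x, y):
    """Sample correlation computed directly from the definition."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return s_xy / (s_x * s_y)

# y is completely determined by x, but the relationship is a parabola.
x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]

r = sample_corr(x, y)
print(r)  # the positive and negative products of deviations cancel
```

Here the points to the left of the mean of x pull the covariance down exactly as much as the points to the right pull it up.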

34 Correlation: more examples Data on monthly returns for different countries. We have a total of 22 countries. [Figure: scatter plot of Canada against USA.] Correlation is 0.65 for Canada and the USA.

35 The sample correlation matrix The sample correlation matrix is a table of all sample correlations between each pair of variables. [Table: sample correlation matrix with rows and columns australia, austria, belgium, canada, denmark.] Why are all the diagonal entries equal to one? Why are the upper and lower off-diagonals the same?

36 The sample covariance matrix The sample covariance matrix is a table of all sample covariances between each pair of variables. [Table: sample covariance matrix with rows and columns australia, austria, belgium, canada, denmark.] Each diagonal element of the sample covariance matrix is equal to the sample variance of the corresponding variable.
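The two questions about the correlation matrix can be checked numerically. The sketch below builds both matrices for three made-up return series; the series names "a", "b", "c" and all values are hypothetical and are not taken from conret.xls.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((p - mean_x) * (q - mean_y) for p, q in zip(x, y)) / (n - 1)

# Hypothetical monthly returns for three placeholder series.
returns = {
    "a": [0.02, -0.01, 0.03, 0.00, 0.01],
    "b": [0.01, -0.02, 0.02, 0.01, 0.00],
    "c": [-0.01, 0.02, -0.02, 0.01, 0.00],
}
names = list(returns)
k = len(names)

# Entry (i, j) is the covariance between series i and series j.
cov_matrix = [[sample_cov(returns[p], returns[q]) for q in names] for p in names]

# Divide each covariance by the two standard deviations to get correlations.
corr_matrix = [
    [cov_matrix[i][j] / math.sqrt(cov_matrix[i][i] * cov_matrix[j][j])
     for j in range(k)]
    for i in range(k)
]

for name, row in zip(names, corr_matrix):
    print(name, [round(v, 3) for v in row])
```

The diagonal is one because each variable is perfectly correlated with itself, and the matrix is symmetric because the covariance of x with y is the same as the covariance of y with x.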

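37 Using the correlation and covariance formulas Let's return to our simple example from lecture # 1. [Table: columns x, $(x - \bar{x})$, y, $(y - \bar{y})$ for the four observations.]
$s_{xy} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
$= \frac{1}{3}\left[(0.02)(0.04) + (0.01)(-0.02) + (-0.01)(0.02) + (-0.02)(-0.04)\right]$
$= \frac{1}{3}\left[0.0008 - 0.0002 - 0.0002 + 0.0008\right]$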

38 Using the correlation and covariance formulas
$s_{xy} = \frac{1}{3}(0.0008 - 0.0002 - 0.0002 + 0.0008) = \frac{1}{3}(0.0012) = 0.0004$
Each of the four points contributes to the covariance. Notice what determines the sign (-, +) and magnitude of each contribution. Let's look at where those points are relative to the sample means $\bar{x}$ and $\bar{y}$.

39 [Figure: the four data points plotted with lines at the sample means $\bar{x}$ and $\bar{y}$, which divide the plane into quadrants (i), (ii), (iii), and (iv).]
$(x_1 - \bar{x})(y_1 - \bar{y}) = 0.02 \times 0.04 = 0.0008$
$(x_2 - \bar{x})(y_2 - \bar{y}) = 0.01 \times (-0.02) = -0.0002$
$(x_3 - \bar{x})(y_3 - \bar{y}) = (-0.01) \times 0.02 = -0.0002$
$(x_4 - \bar{x})(y_4 - \bar{y}) = (-0.02) \times (-0.04) = 0.0008$
Points in quadrant (iii) have both x and y less than their means so they make a positive contribution to the covariance. Points in quadrant (i) have both x and y larger than their means so they make a positive contribution to the covariance. In (ii) and (iv) one of x and y is less than its mean and the other is greater so we get a negative contribution. The sign (-, +) of the covariance tells us in which quadrants our data lie on average.

40 [Figure: scatter plot of VWNDX against SP500 with quadrants (i) to (iv) marked around the sample means.] There are lots of data points in quadrants (i) and (iii), which make positive contributions. There are only a few data points in quadrants (ii) and (iv), which make negative contributions.

41 How changes in units affect the covariance Suppose we have data on 4 individuals' education and income: years of school income in dollars 34, ,200 54,950 98,100 The sample covariance is What are the units of measurement?

42 How changes in units affect the covariance What if we measure income in thousands of dollars instead of dollars? years of school income in thous. of dollars The sample covariance changes to !! What are the units of measurement? 42

43 Key points to remember about covariance A positive covariance implies that when one variable is above (below) its mean the other variable tends to be above (below) its mean. A negative covariance implies that when one variable is above (below) its mean the other variable tends to be below (above) its mean. The units of the covariance are (typically) not meaningful. The magnitude of the covariance is not easy to interpret. Focus on the sign of the covariance. It tells us in which quadrants we should expect to see our data relative to the means.

44 Return to our numerical example [Table: columns x, $(x - \bar{x})$, y, $(y - \bar{y})$ for the four observations.] We computed the sample covariance as $s_{xy} = 0.0004$. The sample correlation is
$r_{xy} = \frac{s_{xy}}{s_x s_y} = \frac{0.0004}{(0.0365)(0.0183)} = 0.6$
where we divide the covariance by the two standard deviations.
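The numerical example can be re-traced in a few lines. This sketch assumes the deviations from the means are the ones shown on the quadrant slide (0.02, 0.01, -0.01, -0.02 for x and 0.04, -0.02, 0.02, -0.04 for y); the raw x and y values are not needed because the covariance only depends on the deviations.

```python
import math

# Deviations from the sample means, read off the quadrant slide.
dx = [0.02, 0.01, -0.01, -0.02]
dy = [0.04, -0.02, 0.02, -0.04]
n = len(dx)

# Each point's contribution to the covariance: positive when both deviations
# share a sign, negative when they have opposite signs.
products = [a * b for a, b in zip(dx, dy)]

s_xy = sum(products) / (n - 1)
s_x = math.sqrt(sum(a * a for a in dx) / (n - 1))
s_y = math.sqrt(sum(b * b for b in dy) / (n - 1))
r_xy = s_xy / (s_x * s_y)

print([round(p, 4) for p in products])
print(round(s_xy, 4), round(s_x, 4), round(s_y, 4), round(r_xy, 2))
```

Two of the four contributions are positive and two are negative, but the positive ones are larger in magnitude, so the covariance and the correlation come out positive.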

45 How changes in units affect the correlation Consider the same data on individuals' education and income as above: years of school income in dollars 34, ,200 54,950 98,100 The sample correlation is 0.975. What are the units of measurement?

46 How changes in units affect the correlation Again, what if we measure income in thousands of dollars? years of school income in thous. of dollars The sample correlation remains 0.975!! What are the units of measurement? 46
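The effect of changing units can be checked directly. The education and income figures below are invented for this sketch (they are not the four individuals from the slide), and the function names are hypothetical.

```python
import math

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance over the product of standard deviations."""
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical data: years of school and income in dollars.
years = [12, 14, 16, 20]
income_dollars = [30000, 42000, 55000, 98000]
income_thousands = [v / 1000 for v in income_dollars]

cov_dollars = sample_cov(years, income_dollars)
cov_thousands = sample_cov(years, income_thousands)
corr_dollars = sample_corr(years, income_dollars)
corr_thousands = sample_corr(years, income_thousands)

print(cov_dollars, cov_thousands)    # differ by a factor of 1000
print(corr_dollars, corr_thousands)  # the same, up to rounding error
```

Rescaling income divides the covariance by 1000 but leaves the correlation untouched, because the standard deviation of income is rescaled by the same factor.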

47 Key points to remember about correlation The correlation always has the same sign as the covariance because we are simply dividing by standard deviations, which are always positive. The correlation can be more informative than the covariance because it is easier to interpret as a measure of strength. Correlation is unit-less and always lies between -1 and 1. Interpretation: close to 1 means a strong positive linear relationship. Interpretation: close to -1 means a strong negative linear relationship.

48 Linear functions 48

49 Linear functions We have seen data sets which display some kind of relationship between variables (e.g. Vanguard vs. S&P500). An exact linear relationship or linear function between two univariate variables is defined as
$y = c_0 + c_1 x$
In this formula, the variable $c_0$ is a number called the intercept. The variable $c_1$ is a number called the slope. (NOTE: There is nothing special about the notation $y$, $x$, $c_0$, and $c_1$.)

50 Linear functions There are many reasons why we are interested in linear functions. As a first example, suppose we observed a sample of the variable x and we knew its sample mean and sample variance. Using this information, could we determine the sample mean and variance of y if $y = c_0 + c_1 x$? YES!

51 Celsius to Fahrenheit Suppose we have a sample of temperatures measured in Celsius but we want to convert them to Fahrenheit. [Table: data with columns cel and fahr.] The relationship between these variables is $fahr = 32 + \frac{9}{5}\,cel$. Sample mean: $\overline{cel} = 32.5$. Sample standard deviation: $s_{cel} = 20$.

52 Celsius to Fahrenheit If we plot the data using a scatter plot, what do we get? [Figure: scatter plot of Fahr against Celsius; the points lie on a straight line.] Note: the correlation is 1.

53 Definition of a linear function A variable y is a linear function of the variable x if
$y = c_0 + c_1 x$
$c_0$: the intercept
$c_1$: the slope
We think of $c_0$ and $c_1$ as constants (i.e. fixed numbers) while x and y are allowed to vary.

54 Sample mean and variance of a linear function Suppose y is a linear function of x: $y = c_0 + c_1 x$. How are the sample mean and variance (std. dev.) of y related to those of x? In other words, given $\bar{x}$ and $s_x^2$, what separate effects do multiplying by $c_1$ and adding $c_0$ have?

55 Sample mean and variance of a linear function Let us look at the temperature example where $fahr = 32 + \frac{9}{5}\,cel$. We can see both effects graphically.
            cel    (9/5) cel    fahr
mean       32.5       58.5      90.5
std. dev.    20         36        36
[Figure: the distributions of Cel, 9/5 Cel, and Fahr.]

56 Mean and variance of a linear function When we multiply by a constant (in this case $c_1 = \frac{9}{5}$), we affect (in this case increase) both the mean and the standard deviation proportionally. If we add a constant (in this case $c_0 = 32$), we simply increase the mean (by the value of the constant $c_0$) but leave the overall dispersion unchanged.

57 Seeing the effects graphically Here, I have simulated 1000 data points labeled x with $\bar{x} = 1$ and $s_x = 1$. These are in the top histogram. [Figure: three stacked histograms, each annotated with its mean and standard deviation; the means are 1, 3, and 2.]

58 Sample mean and variance of a linear function Two important formulas. Suppose $y = c_0 + c_1 x$. Then
$\bar{y} = c_0 + c_1 \bar{x}$
$s_y^2 = c_1^2 s_x^2$
$s_y = |c_1|\, s_x$

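59 Celsius to Fahrenheit Let's return to our temperature example where $c_0 = 32$ and $c_1 = \frac{9}{5}$.
$\overline{fahr} = c_0 + c_1\,\overline{cel} = 32 + \frac{9}{5}(32.5) = 32 + 58.5 = 90.5$
$s^2_{fahr} = c_1^2\, s^2_{cel} = \left(\frac{9}{5}\right)^2 (400) = 1296 \quad (s_{fahr} = 36)$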
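The same formulas can be checked on raw data rather than on summary statistics. The four Celsius readings below are made up for this sketch (they happen to have mean 32.5, but they are not the sample from the slide), and the check uses Python's standard statistics module, whose variance and stdev functions use the n - 1 divisor.

```python
import statistics

# Hypothetical Celsius readings (not the data from the slide).
cel = [10.0, 25.0, 40.0, 55.0]
c0, c1 = 32.0, 9.0 / 5.0

# Convert every observation, then summarize the converted data directly.
fahr = [c0 + c1 * t for t in cel]

mean_cel, var_cel, sd_cel = (statistics.mean(cel),
                             statistics.variance(cel),
                             statistics.stdev(cel))
mean_fahr, var_fahr, sd_fahr = (statistics.mean(fahr),
                                statistics.variance(fahr),
                                statistics.stdev(fahr))

# Compare with what the linear-function formulas predict.
print(mean_fahr, c0 + c1 * mean_cel)
print(var_fahr, c1 ** 2 * var_cel)
print(sd_fahr, abs(c1) * sd_cel)
```

Summarizing the converted data and applying the formulas to the Celsius summaries give the same answers, which is exactly what the two formulas claim.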

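60 Formal proof
$$
\begin{aligned}
\bar{y} &= \frac{1}{n}\sum_{i=1}^{n} y_i && \text{(by definition of } \bar{y}\text{)} \\
&= \frac{1}{n}\sum_{i=1}^{n} (c_0 + c_1 x_i) && \text{(because } y_i = c_0 + c_1 x_i\text{)} \\
&= \frac{1}{n}\sum_{i=1}^{n} c_0 + \frac{1}{n}\sum_{i=1}^{n} c_1 x_i \\
&= \frac{n}{n}\, c_0 + c_1 \frac{1}{n}\sum_{i=1}^{n} x_i \\
&= c_0 + c_1 \bar{x} && \text{(by definition of } \bar{x}\text{)}
\end{aligned}
$$
NOTE: you do not need to know this for any exam.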

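61 Formal proof
$$
\begin{aligned}
s_y^2 &= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2 && \text{(by definition of } s_y^2\text{)} \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (y_i - c_0 - c_1 \bar{x})^2 && \text{(because } \bar{y} = c_0 + c_1 \bar{x}\text{)} \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (c_0 + c_1 x_i - c_0 - c_1 \bar{x})^2 && \text{(because } y_i = c_0 + c_1 x_i\text{)} \\
&= \frac{1}{n-1}\sum_{i=1}^{n} (c_1 x_i - c_1 \bar{x})^2 \\
&= \frac{1}{n-1}\sum_{i=1}^{n} c_1^2 (x_i - \bar{x})^2 = c_1^2 \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = c_1^2 s_x^2
\end{aligned}
$$
NOTE: you do not need to know this for any exam.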

62 We can get the sample standard deviation by using the formula $s_y^2 = c_1^2 s_x^2$ and then just taking the square root. Or, we can use our other formula directly: $s_y = |c_1|\, s_x$. This is because the sample standard deviation is always the square root of the sample variance, and the square root of $c_1^2$ is $|c_1|$.

63 Example Suppose x has sample mean 100 and sample standard deviation 10. What are the sample mean, sample variance, and sample std. dev. of y when
1. $y = 2x$?
2. $y = 5 + x$?
3. $y = 5 - 2x$?
NOTE: Answers are on the next slide.

64 Example Suppose x has sample mean 100 and sample standard deviation 10. What are the sample mean, sample variance, and sample std. dev. of y?
1. $y = 2x$: $\bar{y} = 200$, $s_y^2 = 400$, $s_y = 20$
2. $y = 5 + x$: $\bar{y} = 105$, $s_y^2 = 100$, $s_y = 10$
3. $y = 5 - 2x$: $\bar{y} = -195$, $s_y^2 = 400$, $s_y = 20$
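The three answers follow mechanically from the two formulas, which a short sketch makes explicit. The function name linear_stats is made up for this illustration.

```python
def linear_stats(c0, c1, mean_x, sd_x):
    """Mean, variance, and std. dev. of y = c0 + c1 * x from the summaries of x."""
    mean_y = c0 + c1 * mean_x      # adding c0 shifts the mean, c1 scales it
    var_y = c1 ** 2 * sd_x ** 2    # c0 drops out of the variance
    sd_y = abs(c1) * sd_x          # a standard deviation is never negative
    return mean_y, var_y, sd_y

mean_x, sd_x = 100, 10

case_1 = linear_stats(0, 2, mean_x, sd_x)    # y = 2x
case_2 = linear_stats(5, 1, mean_x, sd_x)    # y = 5 + x
case_3 = linear_stats(5, -2, mean_x, sd_x)   # y = 5 - 2x

print(case_1)
print(case_2)
print(case_3)
```

The third case is the one to watch: the negative slope flips the sign of the mean contribution but the variance and standard deviation are the same as for y = 2x.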

65 Linear combinations 65

66 Linear combinations We may want a variable y to be a function of more than one variable. Assume we have k different variables $x_1, \ldots, x_k$. A variable y is a linear combination if it is related to several other variables $x_1, \ldots, x_k$ by the formula
$y = c_0 + c_1 x_1 + c_2 x_2 + \cdots + c_k x_k$
$c_0$: the intercept
$c_i$: a coefficient

67 Linear combinations You may occasionally see the following notation where we double-index the variable $x_{ij}$. When we have a sample of n observations indexed by $i = 1, \ldots, n$ and k variables indexed by $j = 1, \ldots, k$, $x_{ij}$ denotes the i-th observation of the j-th variable. The i-th observed variable $y_i$ is a linear combination if it is related to several other variables $x_{i1}, \ldots, x_{ik}$ by the formula
$y_i = c_0 + c_1 x_{i1} + c_2 x_{i2} + \cdots + c_k x_{ik}$
$c_0$: the intercept
$c_j$: a coefficient

68 Example: Portfolios Suppose you have $100 to invest. Let $x_1$ be the return on asset 1. If $x_1 = 0.1$ and you put all your money into asset 1, then you will have $100 × (1 + 0.1) = $110 at the end of the period. Let $x_2$ be the return on asset 2. If $x_2 = 0.15$ and you put all your money into asset 2, then you will have $100 × (1 + 0.15) = $115 at the end of the period. What happens if you put 1/2 of your money in asset 1 and 1/2 of your money in asset 2?

69 Example: Portfolios At the end of the period you will have
0.5 × 100 × (1 + 0.1) + 0.5 × 100 × (1 + 0.15) = 100 × [1 + (0.5 × 0.1) + (0.5 × 0.15)] = $112.50
In other words, if we put 1/2 of our money in asset 1 and 1/2 in asset 2, the return on the portfolio is
$R_p = \frac{1}{2} x_1 + \frac{1}{2} x_2$
The return on the portfolio is a linear combination of the returns on the individual assets.

70 Example: Portfolios In general, suppose you have M dollars to invest in two assets with returns $x_1$ and $x_2$. Let $w_1$ be the fraction of your wealth that you choose to put in asset 1 and $w_2$ the fraction you put in asset 2. Assume that our portfolio weights sum to one, $w_1 + w_2 = 1$. At the end of the period you will have
$w_1 M (1 + x_1) + w_2 M (1 + x_2) = M[w_1 + w_2 + w_1 x_1 + w_2 x_2] = M[1 + w_1 x_1 + w_2 x_2]$
The return on our portfolio is
$R_p = w_1 x_1 + w_2 x_2$

71 Example: Portfolio $R_p = w_1 x_1 + w_2 x_2$ In this linear combination, the coefficients are the portfolio weights or the percentage of our wealth placed in each asset. When discussing portfolios, it is common to change notation and use $w_i$ for the weights instead of $c_i$. In this class, we will always assume that the portfolio weights sum to one. Can an asset's weight be negative?

72 Example: Portfolio Suppose we have m possible assets. Let $x_i$ denote the return on the i-th asset. Let $w_i$ denote the percentage of wealth invested in the i-th asset. Then, the return on the portfolio is:
$R_p = \sum_{i=1}^{m} w_i x_i$
The return on the portfolio is a linear combination of individual asset returns, where the coefficients are equal to the fraction of wealth invested.
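The portfolio return formula is a weighted sum, which can be sketched as below. The function name portfolio_return is made up here; the weights and returns are the ones from the $100 example (0.5 in each asset, returns of 0.10 and 0.15).

```python
def portfolio_return(weights, returns):
    """Return sum_i w_i * x_i, assuming the weights sum to one."""
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("portfolio weights must sum to one")
    return sum(w * x for w, x in zip(weights, returns))

initial_wealth = 100.0
r_p = portfolio_return([0.5, 0.5], [0.10, 0.15])
end_wealth = initial_wealth * (1 + r_p)

print(r_p, end_wealth)
```

The same function works for any number m of assets, as long as one weight is supplied per return.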

73 Mean and variance of a linear combination First, consider the case of two variables $x_1$ and $x_2$. Suppose
$y = c_0 + c_1 x_1 + c_2 x_2$
Then
$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2$
$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + 2 c_1 c_2 s_{x_1 x_2}$
Notice that when we have two variables we must take their covariance $s_{x_1 x_2}$ into account.

74 Reminder about correlations and covariances Remember, we defined the sample correlation of two variables $x_1$ and $x_2$ as the sample covariance divided by the standard deviations:
$r_{x_1 x_2} = \frac{s_{x_1 x_2}}{s_{x_1} s_{x_2}}$
So, if we know the sample correlation and the sample standard deviations we can determine the sample covariance:
$s_{x_1 x_2} = r_{x_1 x_2}\, s_{x_1} s_{x_2}$
If we know the sample standard deviations, then all we need to know is either the sample correlation or the sample covariance.
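The two-variable variance formula can be verified against a direct calculation. In this sketch the two return series are invented (they are not from conret.xls), and the check compares the variance of the combined series with the value the formula gives from the variances and the covariance.

```python
import statistics

def sample_cov(x, y):
    """Sample covariance with the n - 1 divisor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical returns on two assets.
x1 = [0.02, -0.01, 0.03, 0.00, 0.04]
x2 = [0.01, 0.02, -0.02, 0.03, 0.00]
w1, w2 = 0.5, 0.5

# Route 1: build the portfolio return for each period, then take its variance.
port = [w1 * a + w2 * b for a, b in zip(x1, x2)]
var_direct = statistics.variance(port)

# Route 2: plug the two variances and the covariance into the formula.
var_formula = (w1 ** 2 * statistics.variance(x1)
               + w2 ** 2 * statistics.variance(x2)
               + 2 * w1 * w2 * sample_cov(x1, x2))

print(var_direct, var_formula)
```

Both routes agree up to floating-point rounding, so once the covariance matrix is known there is no need to rebuild the portfolio series for each new choice of weights.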

75 Example: country returns Consider building a portfolio using our monthly country returns data (conret.xls) and the two variables Hong Kong and USA. We place 1/2 of our wealth in each asset. [Table: monthly returns with columns honkong, usa, and port.] We obtain the portfolio return as $R_p = \frac{1}{2}\,honkong + \frac{1}{2}\,usa$, so that $w_1 = 0.5$ and $w_2 = 0.5$.

76 Example: country returns [Table: monthly returns with columns honkong, usa, and port.] The sample mean of the portfolio is
$\bar{R}_p = w_1\,\overline{honkong} + w_2\,\overline{usa} = 0.5\,\overline{honkong} + 0.5\,\overline{usa}$
$= 0.5 (\ ) + 0.5 (\ ) =$

77 Example: country returns Let us compute the variance of our portfolio. The sample covariance matrix is: [Table: 2 x 2 sample covariance matrix of honkong and usa.] Remember that the sample variances are on the diagonal and the sample covariances are on the off-diagonal. Next, we apply the variance formula
$s^2_{port} = w_1^2 s^2_{honkong} + w_2^2 s^2_{usa} + 2 w_1 w_2 s_{honkong,usa}$
$= (0.5^2)(\ ) + (0.5^2)(\ ) + 2(0.5^2)(\ )$
which gives $s_{port} = 0.046$.

78 Example: country returns What if we had put 25% into the US and 75% into Hong Kong? How is the variance of the portfolio affected? The covariance matrix does not change. [Table: 2 x 2 sample covariance matrix of honkong and usa.] Again, we apply the formula but with different weights
$s^2_{port} = w_1^2 s^2_{honkong} + w_2^2 s^2_{usa} + 2 w_1 w_2 s_{honkong,usa}$
$= (0.75^2)(\ ) + (0.25^2)(\ ) + 2(0.25)(0.75)(\ )$
which gives $s_{port} = 0.058$.

79 Example: country returns Scatter plot of the mean and standard dev. for the portfolio with equal weights. It looks like the mean of the portfolio is half-way between the sample means of honkong and usa. [Figure: scatter plot of Mean against Standard Dev with points honkong, port, and usa.]

80 Example: country returns The sample standard dev. of the portfolio is less than half-way between $s_{usa}$ and $s_{honkong}$. What happened? Diversification! [Figure: scatter plot of Mean against Standard Dev with points honkong, port, and usa.]

81 Example: diversification Consider returns $x_1$ and $x_2$ and a portfolio $y = \frac{1}{2} x_1 + \frac{1}{2} x_2$. At each point $(x_1, x_2)$, we plot the value y. [Figure: scatter plot of $x_2$ against $x_1$ with the portfolio values y; Table: sample covariance matrix of $x_1$ and $x_2$.] Using our formulas, the variance of y is: $s_y^2 =$ Why is the variance of y smaller than the variance of both $x_1$ and $x_2$?

82 Example: diversification Consider returns $x_1$ and $x_2$ and a portfolio $y = \frac{1}{2} x_1 + \frac{1}{2} x_2$. At each point $(x_1, x_2)$, we plot the value y. [Figure: scatter plot of $x_2$ against $x_1$ with the portfolio values y; Table: sample covariance matrix of $x_1$ and $x_2$.] Using our formulas, the variance of y is: The covariance is now positive! Why is the variance of y similar to that of $x_1$ and $x_2$?

83 Example: diversification Consider returns $x_1$ and $x_2$ and a portfolio $y = \frac{1}{2} x_1 + \frac{1}{2} x_2$. At each point $(x_1, x_2)$, we plot the value y. [Figure: scatter plot of $x_2$ against $x_1$ with the portfolio values y; Table: sample covariance matrix of $x_1$ and $x_2$.] Using our formulas, the variance of y is: Why is the variance of y smaller than the variance of both $x_1$ and $x_2$?
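How much diversification helps depends on the correlation between the two assets. The sketch below uses made-up inputs (both assets have standard deviation 1, and three illustrative correlations of -0.8, 0, and 0.8) and applies the two-variable variance formula with the covariance written as correlation times the two standard deviations.

```python
def equal_weight_variance(s1, s2, r):
    """Variance of y = 0.5 * x1 + 0.5 * x2 given std. devs. and correlation."""
    w = 0.5
    covariance = r * s1 * s2   # s_x1x2 = r * s_x1 * s_x2
    return w ** 2 * s1 ** 2 + w ** 2 * s2 ** 2 + 2 * w * w * covariance

# Hypothetical assets, each with variance 1.
s1 = s2 = 1.0
variances = {r: equal_weight_variance(s1, s2, r) for r in (-0.8, 0.0, 0.8)}

for r, v in variances.items():
    print(r, round(v, 3))
```

With a strongly negative correlation the portfolio variance falls far below the variance of either asset, with zero correlation it is halved, and with a strongly positive correlation it stays close to the variance of the individual assets.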

84 Mean and variance of a linear combination Three right-hand side variables $x_1$, $x_2$, and $x_3$. Suppose
$y = c_0 + c_1 x_1 + c_2 x_2 + c_3 x_3$
Then
$\bar{y} = c_0 + c_1 \bar{x}_1 + c_2 \bar{x}_2 + c_3 \bar{x}_3$
$s_y^2 = c_1^2 s_{x_1}^2 + c_2^2 s_{x_2}^2 + c_3^2 s_{x_3}^2 + 2\left[c_1 c_2 s_{x_1 x_2} + c_1 c_3 s_{x_1 x_3} + c_2 c_3 s_{x_2 x_3}\right]$
Notice that the variance formula now has 3 covariance terms.

85 Example: mutual funds Vanguard Windsor (VWNDX), Pimco bond fund (PTTRX), DWS Global (LBF). Portfolio: $port = 0.1\,VWNDX + 0.7\,PTTRX + 0.2\,LBF$. [Table: 3 x 3 sample covariance matrix of VWNDX, PTTRX, and LBF, shown alongside the sample correlations $r_{VWNDX,PTTRX}$, $r_{VWNDX,LBF}$, and $r_{PTTRX,LBF}$.] We apply the variance formula
$s^2_{port} = w_1^2 s^2_{VWNDX} + w_2^2 s^2_{PTTRX} + w_3^2 s^2_{LBF} + 2 w_1 w_2 s_{VWNDX,PTTRX} + 2 w_1 w_3 s_{VWNDX,LBF} + 2 w_2 w_3 s_{PTTRX,LBF}$
$= (0.1^2)(\ ) + (0.7^2)(\ ) + (0.2^2)(\ ) + 2(0.1)(0.7)(6.6220\mathrm{e}{-5}) + 2(0.1)(0.2)(\ ) + 2(0.7)(0.2)(\ ) =$

86 Example: mutual funds Scatter plot of the mean and standard dev. We created a new portfolio with a slightly higher return than PTTRX and also more risk as measured by the standard deviation. [Figure: scatter plot of Mean against Standard Dev with points port, Pimco Bond Fund, Vanguard Windsor, and DWS global.]

87 Further remarks on linear combinations For linear combinations with more than 3 right-hand side variables (say k variables), the mean and variance formulas can be generalized. The mean formula is:
$\bar{y} = c_0 + \sum_{j=1}^{k} c_j \bar{x}_j$
The variance formulas take into account all pairwise combinations of the covariances. I will not ask you to calculate by hand any formulas with more than 3 right-hand side variables. If you take the portfolios class from the finance group, you will learn about building portfolios by taking linear combinations of assets. The goal is to choose the weights to build good portfolios that are on or close to the efficient frontier.
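For k variables, "all pairwise combinations" amounts to a double sum over the covariance matrix. The sketch below is one way to write that sum; the function name combo_variance and the 3 x 3 covariance matrix are made up for illustration, while the weights 0.1, 0.7, and 0.2 are the ones used in the mutual fund example.

```python
def combo_variance(coefs, cov):
    """Variance of sum_j c_j * x_j: the double sum of c_i * c_j * s_ij.

    The diagonal terms give c_j^2 * s_j^2; each off-diagonal pair appears
    twice (as (i, j) and (j, i)), which is where the factor 2 comes from.
    """
    k = len(coefs)
    return sum(coefs[i] * coefs[j] * cov[i][j]
               for i in range(k) for j in range(k))

# Hypothetical symmetric covariance matrix for three assets.
cov = [[4.0e-3, 1.0e-4, 5.0e-4],
       [1.0e-4, 2.0e-4, 1.0e-4],
       [5.0e-4, 1.0e-4, 1.0e-3]]
w = [0.1, 0.7, 0.2]

general = combo_variance(w, cov)

# The three-variable formula written out term by term, for comparison.
explicit = (w[0] ** 2 * cov[0][0] + w[1] ** 2 * cov[1][1] + w[2] ** 2 * cov[2][2]
            + 2 * (w[0] * w[1] * cov[0][1]
                   + w[0] * w[2] * cov[0][2]
                   + w[1] * w[2] * cov[1][2]))

print(general, explicit)
```

The intercept $c_0$ does not appear because adding a constant does not change the variance.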

88 Linear regression 88

89 Linear regression This is data on 128 homes (MidCity.xls). It includes their sales price (in dollars) and interior size (in square feet). [Figure: scatter plot of Price against Size (square feet).]

90 Linear regression Clearly, the data are correlated. [Table: sample correlation matrix of price and size.] It looks like we could fit a line through the data. But, which line? And, what is the equation of that line? Why would we want to do this? Linear regression fits a line through data. (Univariate) linear regression helps us compute the coefficients $c_0$ and $c_1$ in the equation $y = c_0 + c_1 x$.

91 Linear regression Let y be the house prices and x be the size of the house. When we run a regression, we obtain values for the intercept and slope coefficients given our data.
$y = \text{intercept} + \text{slope} \times x$
[Table: regression output listing the coefficients; the "constant" row is the intercept and the "sq. ft" row is the slope.]

92 Linear regression Here is the data with the regression line drawn through it. [Figure: scatter plot of Price against Size (square feet) with the fitted regression line.]

93 Linear regression formulas
$\text{slope} = \frac{s_{xy}}{s_x^2}$
$\text{intercept} = \bar{y} - \text{slope} \times \bar{x}$
The formulas for the slope and intercept just use the sample mean, sample covariance, and sample variance. We will study this in more detail later in the class. The slope formula takes the covariance and standardizes it so that its units are (units of y)/(units of x). The intercept formula will make the regression line pass through the point $(\bar{x}, \bar{y})$.
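The slope and intercept formulas can be sketched as a small function. The four sizes and prices below are invented for illustration (they are not from MidCity.xls), and the name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Return (intercept, slope) using slope = s_xy / s_x^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    s_xx = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x   # forces the line through (mean_x, mean_y)
    return intercept, slope

# Hypothetical home sizes (square feet) and sale prices (dollars).
size = [1500, 1800, 2000, 2400]
price = [100000, 115000, 130000, 155000]

intercept, slope = fit_line(size, price)
predicted = intercept + slope * 2200   # plug a 2200 sq. ft home into the line

print(round(intercept, 2), round(slope, 4), round(predicted, 2))
```

The slope is in dollars per square foot, which is why multiplying it by a size in square feet and adding the intercept gives a prediction in dollars.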

94 Regression and prediction You have a house on the market with size 2200 sq. ft. Can we predict at what price the house will sell? We could use the sample mean or median of prices but this doesn't take size into account. [Figure: histogram of house prices.]

95 Regression and prediction Regression allows us to use information on size to form our prediction. We plug the size (2200 sq. ft) into our equation. Predicted price: *2200 = $ 144, [Figure: scatter plot of Price against Size (square feet) with the fitted regression line.]

96 Additional comments on regression Because we are using more information (in this case the information on the size of the home), the predictions we make are (hopefully!) better in some sense. Importantly, regression is based on the same concepts (sample means, sample covariance and variances) that we learned in today's lecture. It's simply an alternative way to use our information. There is nothing mysterious about it!

97 Limits of regression It is important to remember that regression has its limits. First, it matters which variable is the left-hand side variable y, i.e. it matters that we let y be house prices and not size of the house. You will get a different answer if you swap y and x. Second, correlation does not imply causation. Just because we regress y on x does not mean that changes in x cause changes in y.
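The claim that swapping y and x gives a different answer can be seen with a small made-up data set. In this sketch the five points are hypothetical, and the function name slope_of is invented; the point is that the slope from regressing x on y is not simply one over the slope from regressing y on x.

```python
def slope_of(response, predictor):
    """Regression slope of response on predictor: s_xy / s_x^2."""
    n = len(predictor)
    mean_p = sum(predictor) / n
    mean_r = sum(response) / n
    s_pr = sum((p - mean_p) * (r - mean_r)
               for p, r in zip(predictor, response)) / (n - 1)
    s_pp = sum((p - mean_p) ** 2 for p in predictor) / (n - 1)
    return s_pr / s_pp

# Hypothetical data with a noisy positive relationship.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 6]

b_y_on_x = slope_of(y, x)   # y is the left-hand side variable
b_x_on_y = slope_of(x, y)   # the roles are swapped

print(b_y_on_x, b_x_on_y, 1 / b_y_on_x)
```

The product of the two slopes equals the squared sample correlation, so the swapped slope matches the reciprocal only when the points lie exactly on a line.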


More information

AP Final Review II Exploring Data (20% 30%)

AP Final Review II Exploring Data (20% 30%) AP Final Review II Exploring Data (20% 30%) Quantitative vs Categorical Variables Quantitative variables are numerical values for which arithmetic operations such as means make sense. It is usually a measure

More information

Business Statistics. Lecture 10: Correlation and Linear Regression

Business Statistics. Lecture 10: Correlation and Linear Regression Business Statistics Lecture 10: Correlation and Linear Regression Scatterplot A scatterplot shows the relationship between two quantitative variables measured on the same individuals. It displays the Form

More information

Sampling, Frequency Distributions, and Graphs (12.1)

Sampling, Frequency Distributions, and Graphs (12.1) 1 Sampling, Frequency Distributions, and Graphs (1.1) Design: Plan how to obtain the data. What are typical Statistical Methods? Collect the data, which is then subjected to statistical analysis, which

More information

6. 5x Division Property. CHAPTER 2 Linear Models, Equations, and Inequalities. Toolbox Exercises. 1. 3x = 6 Division Property

6. 5x Division Property. CHAPTER 2 Linear Models, Equations, and Inequalities. Toolbox Exercises. 1. 3x = 6 Division Property CHAPTER Linear Models, Equations, and Inequalities CHAPTER Linear Models, Equations, and Inequalities Toolbox Exercises. x = 6 Division Property x 6 = x =. x 7= Addition Property x 7= x 7+ 7= + 7 x = 8.

More information

2006 Supplemental Tax Information for JennisonDryden and Strategic Partners Funds

2006 Supplemental Tax Information for JennisonDryden and Strategic Partners Funds 2006 Supplemental Information for JennisonDryden and Strategic Partners s We have compiled the following information to help you prepare your 2006 federal and state tax returns: Percentage of income from

More information

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice

AP Statistics Bivariate Data Analysis Test Review. Multiple-Choice Name Period AP Statistics Bivariate Data Analysis Test Review Multiple-Choice 1. The correlation coefficient measures: (a) Whether there is a relationship between two variables (b) The strength of the

More information

Applied Regression Analysis. Section 4: Diagnostics and Transformations

Applied Regression Analysis. Section 4: Diagnostics and Transformations Applied Regression Analysis Section 4: Diagnostics and Transformations 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions of our linear regression model: (i) The mean of

More information

Year 10 Mathematics Semester 2 Bivariate Data Chapter 13

Year 10 Mathematics Semester 2 Bivariate Data Chapter 13 Year 10 Mathematics Semester 2 Bivariate Data Chapter 13 Why learn this? Observations of two or more variables are often recorded, for example, the heights and weights of individuals. Studying the data

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 17, 2010 Instructor: John Parman Final Exam - Solutions You have until 12:30pm to complete this exam. Please remember to put your

More information

Stat 101 Exam 1 Important Formulas and Concepts 1

Stat 101 Exam 1 Important Formulas and Concepts 1 1 Chapter 1 1.1 Definitions Stat 101 Exam 1 Important Formulas and Concepts 1 1. Data Any collection of numbers, characters, images, or other items that provide information about something. 2. Categorical/Qualitative

More information

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. q3_3 MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. Provide an appropriate response. 1) In 2007, the number of wins had a mean of 81.79 with a standard

More information

The scatterplot is the basic tool for graphically displaying bivariate quantitative data.

The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Bivariate Data: Graphical Display The scatterplot is the basic tool for graphically displaying bivariate quantitative data. Example: Some investors think that the performance of the stock market in January

More information

Two-Variable Analysis: Simple Linear Regression/ Correlation

Two-Variable Analysis: Simple Linear Regression/ Correlation Two-Variable Analysis: Simple Linear Regression/ Correlation 1 Topics I. Scatter Plot (X-Y Graph) II. III. Simple Linear Regression Correlation, R IV. Assessing Model Accuracy, R 2 V. Regression Abuses

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Chapter 2: Linear Functions

Chapter 2: Linear Functions Chapter 2: Linear Functions Chapter one was a window that gave us a peek into the entire course. Our goal was to understand the basic structure of functions and function notation, the toolkit functions,

More information

Expectations and Variance

Expectations and Variance 4. Model parameters and their estimates 4.1 Expected Value and Conditional Expected Value 4. The Variance 4.3 Population vs Sample Quantities 4.4 Mean and Variance of a Linear Combination 4.5 The Covariance

More information

Topic 10 - Linear Regression

Topic 10 - Linear Regression Topic 10 - Linear Regression Least squares principle Hypothesis tests/confidence intervals/prediction intervals for regression 1 Linear Regression How much should you pay for a house? Would you consider

More information

+ Statistical Methods in

+ Statistical Methods in + Statistical Methods in Practice STAT/MATH 3379 + Discovering Statistics 2nd Edition Daniel T. Larose Dr. A. B. W. Manage Associate Professor of Mathematics & Statistics Department of Mathematics & Statistics

More information

Ch. 3 Review - LSRL AP Stats

Ch. 3 Review - LSRL AP Stats Ch. 3 Review - LSRL AP Stats Multiple Choice Identify the choice that best completes the statement or answers the question. Scenario 3-1 The height (in feet) and volume (in cubic feet) of usable lumber

More information

Sections OPIM 303, Managerial Statistics H Guy Williams, 2006

Sections OPIM 303, Managerial Statistics H Guy Williams, 2006 Sections 3.1 3.5 The three major properties which describe a set of data: Central Tendency Variation Shape OPIM 303 Lecture 3 Page 1 Most sets of data show a distinct tendency to group or cluster around

More information

MATH 1150 Chapter 2 Notation and Terminology

MATH 1150 Chapter 2 Notation and Terminology MATH 1150 Chapter 2 Notation and Terminology Categorical Data The following is a dataset for 30 randomly selected adults in the U.S., showing the values of two categorical variables: whether or not the

More information
