Assoc.Prof.Dr. Wolfgang Feilmayr Multivariate Methods in Regional Science: Regression and Correlation Analysis REGRESSION ANALYSIS


REGRESSION ANALYSIS

Regression analysis can be broadly defined as the analysis of statistical relationships between one dependent variable and one or more independent variables. Although the terms "dependent" and "independent" are conventional, they do not necessarily imply causality in any given case, no matter how strong the statistical relationship is. In some instances there may be strong a priori grounds for specifying a cause-and-effect relationship and for selecting a dependent variable and one or more independent variables. In many other situations this is not so easy. Regional science, and social science generally, often deal with relationships between variables that are unclear, ambiguous, or possibly even two-way.

All variables involved have to be quantitative.

Examples

(-) How do income and duration of holidays influence holiday expenses?
(-) What is the relationship between the occupancy rate of tourist beds and the number of tourist facilities?
(-) How will the number of potential building plots in some communities develop in the future?

Purposes of Regression Analysis

(1) Analysis of the relationship between dependent and independent variables, e.g. estimation of the regression coefficients $b_j$.
(2) Examination of whether the estimated relationship is significant and whether it can be extended from the sample to the statistical population, if sample and population are not identical. Significance also indicates whether the number of observations is large enough in relation to the number of independent variables.

$$y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_n x_n + u$$

$y$: dependent variable
$x_i$: regressors (independent variables)
$b_i$: regression coefficients
$u$: stochastic term (residual)

Methods of estimation

1. Least Squares Estimation (Kleinst-Quadrat-Schätzung)
2. Maximum Likelihood

It can be shown that in the case of linear regression analysis (with normally distributed errors) the two methods lead to the same results. We restrict ourselves here to the first method and detail least squares estimation for the case of one independent variable (simple regression). In this case the dependent and the independent variable can be depicted as points in $\mathbb{R}^2$. Graphically, the purpose of simple regression analysis can be described as "fitting" the regression line to the scattergram of the variables.

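Least Squares Estimation

Minimize the sum of the squared deviations from the regression line!

$$y = a + bx + u$$
$$\hat{y} = a + bx$$

$\hat{y}$: predicted value for $y$

$$u = y - \hat{y}$$
$$\sum u_i^2 = \sum (y_i - a - b x_i)^2 \rightarrow \min!$$
$$\sum u_i^2 = \sum \left( y_i^2 - 2 a y_i - 2 b x_i y_i + 2 a b x_i + a^2 + b^2 x_i^2 \right)$$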

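Differentiate with respect to $a$ and $b$ and set each derivative to zero:

$$\frac{\partial \sum u_i^2}{\partial a} = \sum \left( -2 y_i + 2 b x_i + 2 a \right) = 0$$
$$\frac{\partial \sum u_i^2}{\partial b} = \sum \left( -2 x_i y_i + 2 a x_i + 2 b x_i^2 \right) = 0$$

This yields the so-called normal equations (Normalgleichungen):

$$aN + b \sum x_i = \sum y_i$$
$$a \sum x_i + b \sum x_i^2 = \sum x_i y_i$$

from which the regression coefficients can be derived:

$$a = \frac{\sum y_i - b \sum x_i}{N} \qquad \text{and} \qquad b = \frac{N \sum x_i y_i - \sum x_i \sum y_i}{N \sum x_i^2 - \left( \sum x_i \right)^2}$$

$N$: number of cases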
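The closed-form solution of the normal equations can be sketched in a few lines of Python. This is only an illustration: the function name `simple_ols` and the five data points are assumptions made for the example and are not taken from the lecture.

```python
def simple_ols(x, y):
    """Return (a, b) of the least squares line y = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    denom = n * sum_xx - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    # Slope and intercept exactly as derived from the normal equations
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y - b * sum_x) / n
    return a, b


# Hypothetical data: distance from the centre and price per square metre
distance = [1.0, 2.0, 3.0, 4.0, 5.0]
price = [9.0, 7.0, 6.0, 4.0, 3.0]
a, b = simple_ols(distance, price)
print(f"y_hat = {a:.2f} + ({b:.2f}) * x")
```

For these made-up numbers the slope is negative, i.e. the fitted price falls with increasing distance.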

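Example: Let the $y_i$ be the real estate prices (per m²) in Vienna. We would like to know how they are influenced by accessibility (the $x_i$ represent the distance from the city center).

[Table with the columns $y_i$, $x_i$, $x_i^2$, $x_i y_i$, $\hat{y}_i$ and residual, followed by a sum row and the calculation of $b$, $a$ and the fitted line $\hat{y} = a + bx$; the numerical values are not legible in this transcription.]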

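Least Squares Estimation for k Variables (Multiple Regression Analysis)

Let $Y$ be the column vector of the $n$ observations $y_i$; let $X$ be the $n \times (k+1)$ matrix of the independent variables (note: the $x_{i0}$ are all equal to 1); let $\hat{\beta}$ be the $(k+1)$ column vector of estimates of $\beta$, and $e$ the column vector of the $n$ residuals:

$$Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \quad
X = \begin{pmatrix} x_{10} & x_{11} & \dots & x_{1k} \\ x_{20} & x_{21} & \dots & x_{2k} \\ \vdots & \vdots & & \vdots \\ x_{n0} & x_{n1} & \dots & x_{nk} \end{pmatrix}, \quad
\hat{\beta} = \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{pmatrix}, \quad
e = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}$$

Then we may write the multiple regression model as

$$Y = X \hat{\beta} + e$$

and we have to minimize

$$\sum_{i=1}^{n} e_i^2 = e'e$$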

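$$e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta}) = Y'Y - 2\hat{\beta}'X'Y + \hat{\beta}'X'X\hat{\beta}$$

To find the value of $\hat{\beta}$ that minimizes the sum of the squared residuals, we differentiate:

$$\frac{\partial (e'e)}{\partial \hat{\beta}} = -2X'Y + 2X'X\hat{\beta}$$

Setting this expression to zero gives:

$$\hat{\beta} = (X'X)^{-1} X'Y$$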
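The matrix solution can be illustrated without any external library by building $X'X$ and $X'Y$ and solving the resulting linear system. The sketch below is an assumption-laden example, not the lecture's implementation: the helper names (`transpose`, `matmul`, `solve`, `ols`) and the data are hypothetical, and the system is solved by Gauss-Jordan elimination instead of forming the inverse explicitly.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in bt] for row in a]


def solve(a, rhs):
    """Solve a * z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular (perfect multicollinearity)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(rows, y):
    """rows: one list of k regressor values per observation."""
    X = [[1.0] + list(r) for r in rows]   # prepend the constant x_i0 = 1
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(xv * yv for xv, yv in zip(col, y)) for col in Xt]
    return solve(XtX, XtY)                # solves (X'X) beta = X'Y


# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 without noise
rows = [[1, 0], [0, 1], [1, 1], [2, 1], [3, 2]]
y = [3, -2, 0, 2, 1]
beta = ols(rows, y)
print([round(v, 6) for v in beta])
```

Because the example data contain no error term, the estimates reproduce the coefficients used to generate them.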

Analysis of Variance (of the Regression Model)

Method: decomposition of variance

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

Total variance = regression variance + error variance

$$SS_{tot} = SS_{reg} + SS_{err}$$
$$r^2 = \frac{SS_{reg}}{SS_{tot}} = \frac{SS_{reg}}{SS_{reg} + SS_{err}} \in [0, 1]$$

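It can be shown that the coefficient of determination (goodness of fit; Bestimmtheitsmaß) $r^2$ equals the square of the correlation coefficient $r$ between $y$ and $x$.

Example: Calculate the coefficient of determination for the regression of the real estate prices.

[Table with the columns $y_i$, $\hat{y}_i$, $(\hat{y}_i - \bar{y})^2$, $(y_i - \hat{y}_i)^2$ and $(y_i - \bar{y})^2$, followed by a sum row; apart from $\bar{y} = 33334$ the numerical values, including the resulting $r^2$, are not legible in this transcription.]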
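The decomposition and the identity between $r^2$ and the squared correlation coefficient can be checked numerically. The following sketch uses hypothetical data (the lecture's own example values are not available here) and plain Python.

```python
from math import sqrt

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical distances
y = [9.0, 7.0, 6.0, 4.0, 3.0]   # hypothetical prices

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# Least squares line and predicted values
b = sxy / sxx
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

# Decomposition of the total sum of squares
ss_tot = sum((yi - mean_y) ** 2 for yi in y)
ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)
ss_err = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

r2 = ss_reg / ss_tot
r = sxy / sqrt(sxx * syy)       # Pearson correlation between x and y

print(f"SS_tot = {ss_tot:.3f}, SS_reg + SS_err = {ss_reg + ss_err:.3f}")
print(f"r^2 = {r2:.4f}, squared correlation = {r * r:.4f}")
```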

Inferences in Regression Analysis

1. Inferences concerning $r^2$

Null hypothesis: There is no relationship (in the population) between the dependent and the independent variables.

F-test: If the empirical F-value $F_{emp}$ is greater than the tabulated F-value $F_{tab}$, the null hypothesis has to be rejected; the relationship is then significant at the chosen level of significance (usually between 90% and 99.9%).

$$F_{emp} = \frac{r^2 / M}{(1 - r^2) / (N - M - 1)}$$

$M$: number of independent variables
$N$: number of observations (sample size)

2. Inferences concerning the regression coefficients

Null hypothesis: The regression coefficients equal zero in the population; the respective variables have no influence.

T-test: If the absolute value of the empirical T-value $T_{emp}$ is greater than the tabulated T-value $T_{tab}$, the null hypothesis has to be rejected; the influence of the independent variable under consideration is then significant at the chosen level of significance (usually between 90% and 99.9%).

$$T_{emp} = \frac{b_j}{S_{b_j}}$$

$b_j$: regression coefficient of variable $j$
$S_{b_j}$: standard error of $b_j$

The Basic Assumptions of the Simple Linear Regression Model

1. For the i-th level of the independent variable $x_i$, the expected value of the error $e_i$ is equal to zero. This is usually expressed as $E(e_i) = 0$ for $i = 1, \dots, N$.
2. The variance of the error component $e_i$ is constant for all levels of $x_i$; that is, $V(e_i) = \sigma^2$ for $i = 1, \dots, N$.
3. The error components $e_i$ and $e_j$ of any two observations are pairwise uncorrelated.

4. The error components $e_i$ are normally distributed.

Gauß-Markov Theorem

Given the four assumptions of the simple linear regression model, the least squares estimators $a$ and $b$ are unbiased and have the minimum variance among all linear unbiased estimators of the regression coefficients. (Strictly, assumptions 1 to 3 are sufficient for this result; the normality assumption is what the F- and T-tests rely on.)

Violations of the Basic Assumptions of the Linear Regression Model

1. Multicollinearity

Assumption: Exogenous variables have to be independent of each other (they must not be correlated).

Measurement: calculation of "tolerance values"

Tolerance of $x_j$: $1 - r_j^2$

$r_j^2$: coefficient of determination of a regression with $x_j$ as dependent variable and all other independent variables as regressors

Tolerance values close to zero indicate multicollinearity.

Recommendations:
(1) Eliminate variables with low tolerance values.
(2) Add new observations to your sample.
(3) Apply factor analysis.

2. Autocorrelation

Assumption: Residuals are uncorrelated (assumption 3).

Autocorrelation occurs if the deviations from the regression line are not random but depend, for example, on the deviations of previous observations. This is the case with time-related autocorrelation.

Measurement: Durbin-Watson test

$$d = \frac{\sum_{k=2}^{N} (e_k - e_{k-1})^2}{\sum_{k=1}^{N} e_k^2}$$

$e_k$: residual of observation $k$
$d$: Durbin-Watson coefficient

For values close to 2, no autocorrelation has to be suspected. Values close to 0 indicate positive autocorrelation, values close to 4 negative autocorrelation.

Recommendation: Search for the variables that are responsible for the autocorrelation and incorporate them into the model; autocorrelation may be an indicator of a nonlinear relationship.

More important in regional science is spatial autocorrelation. It occurs if the value of one observation depends on the value of a neighbouring observation. (Example: The price of a property depends on the prices that have been paid in the neighbourhood.) Spatial autocorrelation can be detected by visualising the residuals on maps or by performing special contiguity tests.
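The Durbin-Watson coefficient is straightforward to compute from a sequence of residuals. The sketch below assumes that the residuals are given in their natural (for instance temporal) order; the function name and the two residual series are hypothetical and chosen to show the two extremes.

```python
def durbin_watson(residuals):
    """Durbin-Watson coefficient d for residuals in observation order."""
    if len(residuals) < 2:
        raise ValueError("at least two residuals are required")
    denominator = sum(e * e for e in residuals)
    if denominator == 0:
        raise ValueError("all residuals are zero; d is undefined")
    numerator = sum(
        (residuals[k] - residuals[k - 1]) ** 2 for k in range(1, len(residuals))
    )
    return numerator / denominator


# Hypothetical residual series
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]   # sign flips every step
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]    # long runs of equal sign

d_alternating = durbin_watson(alternating)
d_persistent = durbin_watson(persistent)
print(f"alternating residuals: d = {d_alternating:.3f}")
print(f"persistent residuals:  d = {d_persistent:.3f}")
```

The alternating series yields a value well above 2 (negative autocorrelation), the persistent series a value well below 2 (positive autocorrelation).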

3. Heteroscedasticity

Assumption: The variance of the residuals must be constant (assumption 2).

The residuals must not be influenced by the magnitude or by the sequence of the observations of the dependent variable. (Example: increasing measurement errors due to decreasing attention of the person collecting the data.)

Measurement: calculating the variances

Recommendation: Heteroscedasticity is often an indicator of a nonlinear relationship.

Recommendations for Regression Analysis

1. The problem to be analysed should be specified carefully.
2. The sample should be sufficiently large. The number of observations should be at least twice the number of variables in the model.
3. At the beginning, hypotheses about the expected relationships (strength and sign) should be formulated.
4. After the regression function has been estimated, the significance of the coefficient of determination has to be examined first. If it is not significant, the regression hypothesis has to be rejected. Otherwise the regression coefficients have to be examined logically (sign, magnitude) and statistically (significance).
5. Check whether the estimated regression function violates the assumptions of the linear regression model.
6. If necessary, variables have to be dropped from the equation or new variables have to be added.

The model formulation should be done iteratively. The results from one step are used to formulate new hypotheses, which are examined in a further step.
7. This iterative process is supported by most computer-based statistical software systems (SPSS, SAS).

One method is "Forward Selection". The first variable considered for entry into the equation is the one with the largest positive or negative correlation with the dependent variable. The F-test for the hypothesis that the coefficient of the entered variable is zero is then calculated. To determine whether this variable (and each succeeding variable) is entered, the F-value is compared to an established criterion (either a tabulated F-value (FIN) or a p-value (PIN)). The process terminates when no significant independent variables are left.

While forward selection starts with no independent variables in the equation and enters them sequentially, backward elimination starts with all variables in the equation and removes them sequentially. Instead of entry criteria, removal criteria are used.

"Stepwise selection" is a combination of the backward and forward procedures and is probably the most commonly used method. Variables, once entered, may be removed again if they no longer meet the defined criteria.

CORRELATION ANALYSIS

Correlation analysis investigates the statistical correlation between variables without explicitly distinguishing between dependent and independent variables.

Examples:
(-) Is there any relationship between personal income and holiday expenses?
(-) Is there any relationship between the marks in mathematics and in computer science?

1. Bivariate Correlation between Metric Variables

$$r_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \, \sum (y_i - \bar{y})^2}}$$

$r_{xy}$: correlation coefficient (Pearson's product-moment correlation coefficient) between the variables $x$ and $y$

$$-1 \le r_{xy} \le +1$$
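As a small illustration of the formula, the following sketch computes Pearson's coefficient for two short series. The function name and the income and expense figures are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of x and y."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical data: personal income and holiday expenses
income = [20.0, 30.0, 40.0, 50.0, 60.0]
expenses = [2.0, 3.0, 5.0, 4.0, 6.0]
r_xy = pearson_r(income, expenses)
print(f"r_xy = {r_xy:.3f}")
```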

Inferences in Correlation Analysis

Null hypothesis: There is no relationship between the two variables.

T-test: If the absolute value of the empirical T-value $T_{emp}$ is greater than the tabulated T-value $T_{tab}$, the null hypothesis has to be rejected; the correlation is then significant at the chosen level of significance (usually between 90% and 99.9%).

$$T_{emp} = \frac{r_{xy} \sqrt{n - 2}}{\sqrt{1 - r_{xy}^2}}$$

$n$: number of observations

2. Multiple Correlation between Metric Variables

In order to measure the relationship between one variable on the one side and several variables on the other side, multiple correlation has to be applied.

In the case of three variables, multiple correlation is defined as follows:

$$R_{x.yz} = \sqrt{\frac{r_{xy}^2 + r_{xz}^2 - 2\, r_{xy}\, r_{xz}\, r_{yz}}{1 - r_{yz}^2}}$$

$R_{x.yz}$: multiple correlation coefficient between $x$ and $y$ plus $z$

Note: If the direction of cause and effect is known, one should perform a multiple regression analysis. It can be shown that the square of the multiple correlation coefficient equals the coefficient of determination.

3. Partial Correlation between Metric Variables

In many cases the correlation between two variables is also influenced by other variables. To account for this, a partial correlation coefficient can be defined. The partial correlation coefficient measures the relationship between two variables while controlling for other variables.

For example, if you wish to control for a third variable $z$, the partial correlation coefficient is defined as:

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

$r_{xy.z}$: partial correlation coefficient between $x$ and $y$ controlling for $z$

4. Correlation between Ordinal Variables (Rank Correlation Coefficients)

To measure the relationship between ordinally scaled variables (example: the ranks of location quality before and after some infrastructure investments), rank correlation coefficients are appropriate. One of the most commonly used measures of ordinal association is Spearman's $r_s$:

$$r_s = 1 - \frac{6 \sum D_i^2}{n (n^2 - 1)}$$

$r_s$: Spearman's $r_s$
$D_i$: differences of the ranks of each observation
$n$: number of observations
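The three-variable formulas for multiple and partial correlation and Spearman's $r_s$ only need pairwise coefficients or ranks as input, so they can be sketched as small helper functions. All names and input values below are hypothetical, and the Spearman helper assumes ranks without ties, which is the case the formula above covers.

```python
from math import sqrt


def multiple_corr(r_xy, r_xz, r_yz):
    """Multiple correlation R_x.yz from the three pairwise coefficients."""
    return sqrt((r_xy ** 2 + r_xz ** 2 - 2 * r_xy * r_xz * r_yz) / (1 - r_yz ** 2))


def partial_corr(r_xy, r_xz, r_yz):
    """Partial correlation r_xy.z between x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def spearman_rs(rank_x, rank_y):
    """Spearman's r_s from two lists of ranks (no ties assumed)."""
    n = len(rank_x)
    if n != len(rank_y) or n < 2:
        raise ValueError("both rank lists must have the same length (at least 2)")
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Hypothetical pairwise correlations
R = multiple_corr(0.5, 0.6, 0.8)
r_partial = partial_corr(0.5, 0.6, 0.8)

# Hypothetical ranks of location quality before and after an investment
before = [1, 2, 3, 4, 5]
after = [2, 1, 3, 5, 4]
rs = spearman_rs(before, after)

print(f"R_x.yz = {R:.4f}, r_xy.z = {r_partial:.4f}, r_s = {rs:.2f}")
```

In this made-up case the simple correlation of 0.5 between x and y almost vanishes once z is controlled for.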

5. Correlations between Nominal Variables (Nominal Scale Measures)

The analysis of the dependency between two nominally scaled variables is identical with the analysis of contingency tables. Contingency tables are matrices in which the rows represent the categories of the first attribute and the columns the categories of the second attribute. The matrix elements $f_{ij}$ contain the counts of observations with category $i$ of the first attribute and category $j$ of the second attribute:

$$F = \begin{pmatrix} f_{11} & f_{12} & R_1 \\ f_{21} & f_{22} & R_2 \\ C_1 & C_2 & \end{pmatrix}$$

$f_{ij}$: matrix elements
$R_i$: sums of rows
$C_j$: sums of columns

Example: Relationship between sex and nationality

[Table with the rows nation A, D, NL, other and sum and the columns male, female and sum; the counts are not legible in this transcription.]

The relationship between the two variables can be measured with the so-called $\chi^2$-test. The $\chi^2$ statistic is defined as:

$$\chi^2 = \sum_{i=1}^{k_1} \sum_{j=1}^{k_2} \frac{(f_{ij} - F_{ij})^2}{F_{ij}}$$

$F_{ij}$: expected frequency of $f_{ij}$, with $F_{ij} = (R_i \cdot C_j) / N$

[The matrix of expected frequencies and the $\chi^2$ value of the example are not legible in this transcription.]

The difficulty with $\chi^2$ is that its value in any contingency table is directly proportional to the sample size $N$: two tables with identically proportional cell frequencies but different sample sizes will have different $\chi^2$ values. One way of overcoming this problem is to use the $\varphi$ (phi) coefficient:

$$\varphi = \sqrt{\frac{\chi^2}{N}}$$

Inference test: If the empirical $\chi^2$-value is greater than the tabulated $\chi^2$-value, the null hypothesis that there is no relationship has to be rejected; the relationship is then significant at the chosen level of significance (usually between 90% and 99.9%).
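The dependence of $\chi^2$ on the sample size, and the way $\varphi$ removes it, can be seen in a short computation. The sketch below uses two hypothetical 2 x 2 tables with identical proportions; the function name is an assumption made for the example.

```python
from math import sqrt


def chi_square(table):
    """Return (chi2, N) for a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    n = sum(row_sums)
    if 0 in row_sums or 0 in col_sums:
        raise ValueError("empty rows or columns give expected frequencies of zero")
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            expected = row_sums[i] * col_sums[j] / n   # F_ij = R_i * C_j / N
            chi2 += (f_ij - expected) ** 2 / expected
    return chi2, n


# Two hypothetical tables with the same proportions but different sample sizes
small = [[30, 10], [10, 30]]
large = [[60, 20], [20, 60]]

chi2_small, n_small = chi_square(small)
chi2_large, n_large = chi_square(large)
phi_small = sqrt(chi2_small / n_small)
phi_large = sqrt(chi2_large / n_large)

print(f"N = {n_small}: chi2 = {chi2_small:.1f}, phi = {phi_small:.2f}")
print(f"N = {n_large}: chi2 = {chi2_large:.1f}, phi = {phi_large:.2f}")
```

Doubling every cell doubles $\chi^2$ while $\varphi$ stays the same.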

Recommendations for Correlation Analysis

Before calculating correlation coefficients, it is necessary to examine whether a formal correlation exists between the two variables (for example, if the variables are percentages summing to 100%).

If this can be excluded, the case of inhomogeneity correlation should be checked. Here the observations come from subpopulations lying in different parts of the coordinate system. If the subpopulations are not distinguished, correlation effects may occur that are not consistent with the relationships within the subpopulations.

Here a positive correlation between life expectancy and car ownership might be identified, which is inconsistent with the relationships within the subpopulations.

Finally, it should be checked whether a joint correlation exists, as it occurs, for example, between the height and weight of individual persons.

If all these "pathological" cases can be excluded, it can be assumed that the calculated correlations indicate a real relationship between the variables under consideration.

SPSS-PROCEDURES FOR REGRESSION ANALYSIS

With Method you can choose the estimation method. The default method is forced entry: all variables are entered in a single step (Enter). The options Forward, Backward and Stepwise correspond to the methods described earlier.


Additional Information Concerning the Interpretation of the SPSS Regression Output

The statistic adjusted $R^2$ attempts to correct $R^2$ to more closely reflect the goodness of fit of the model in the population. Adjusted $R^2$ is given by

$$R_a^2 = R^2 - \frac{M (1 - R^2)}{N - M - 1}$$

where $M$ is the number of independent variables in the equation.

Since the population variance of the errors, $\sigma^2$, is not known, it must also be estimated. The usual estimate of $\sigma^2$ is (in the case of a simple regression)

$$S^2 = \frac{\sum_{i=1}^{N} (Y_i - B_0 - B_1 X_i)^2}{N - 2}$$

The positive square root of $S^2$ is termed the standard error of estimate, or the standard deviation of the residuals. The estimated standard errors of the parameters of the regression are displayed in the third column and labeled SE B.

It is inappropriate to interpret the B's as indicators of the relative importance of variables. The actual magnitude of the coefficients depends on the units in which the variables are measured. Only if all independent variables are measured in the same units (years, for example) are their coefficients directly comparable. When variables differ substantially in units of measurement, the sheer magnitude of their coefficients does not reveal anything real about relative importance.

One way to make regression coefficients somewhat more comparable is to calculate beta weights, which are the coefficients of the independent variables when all variables are expressed in standardized (Z score) form. The beta coefficients can be calculated directly from the regression coefficients using

$$beta_k = B_k \left( \frac{S_k}{S_Y} \right)$$

where $S_k$ is the standard deviation of the kth independent variable and $S_Y$ the standard deviation of the dependent variable.

However, the values of the beta coefficients, like the B's, are contingent on the other independent variables in the equation. They are also affected by the correlations of the independent variables and do not in any absolute sense reflect the importance of the various independent variables.
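For a simple regression, the quantities discussed here (adjusted $R^2$, standard error of estimate, beta weight) and the F- and T-values from the inference section can all be derived from a handful of sums. The sketch below uses hypothetical data; the function name is an assumption, and the standard error of the slope is computed with the textbook formula $S / \sqrt{\sum (X_i - \bar{X})^2}$, which is not spelled out in the lecture.

```python
from math import sqrt


def regression_summary(x, y):
    """Summary statistics of a simple regression y = B0 + B1*x."""
    n = len(x)
    m = 1  # number of independent variables in a simple regression
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    r2 = 1 - sse / syy
    s2 = sse / (n - 2)                 # estimate of the error variance
    se_b1 = sqrt(s2 / sxx)             # standard error of the slope
    return {
        "r2": r2,
        "adj_r2": r2 - m * (1 - r2) / (n - m - 1),
        "std_error_estimate": sqrt(s2),
        "t_b1": b1 / se_b1,
        "f": (r2 / m) / ((1 - r2) / (n - m - 1)),
        # beta weight: B1 times the ratio of the standard deviations
        "beta1": b1 * sqrt(sxx / (n - 1)) / sqrt(syy / (n - 1)),
    }


# Hypothetical data
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [9.0, 7.0, 6.0, 4.0, 3.0]
summary = regression_summary(x, y)
for name, value in summary.items():
    print(f"{name:>20}: {value:.4f}")
```

With a single regressor, F equals the square of the T-value of the slope and the beta weight equals the correlation coefficient, which makes the output easy to sanity-check.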

SPSS-OUTPUT FROM REGRESSION ANALYSIS

Variables:
MEAND    Average price (ATS/m²) for building land
ABIPRO   Percentage of higher educated people
LANDESH  Distance to federal capital (min)
LANDP    Percentage of agricultural employees
UEGES1   Number of nights per year

Descriptive Statistics

[Table with Mean, Std. Deviation and N for MEAND, ABIPRO, LANDESH, LANDP and UEGES1; the values are only partly legible in this transcription.]

Correlations (values as printed; some digits are missing in this transcription)

Pearson Correlation
           MEAND    ABIPRO   LANDESH  LANDP    UEGES1
MEAND      1,000    ,536     -,37     -,40     ,347
ABIPRO     ,536     1,000    -,94     -,501    ,099
LANDESH    -,37     -,94     1,000    ,5       ,07
LANDP      -,40     -,501    ,5       1,000    -,5
UEGES1     ,347     ,099     ,07      -,5      1,000

Sig. (1-tailed)
           MEAND    ABIPRO   LANDESH  LANDP    UEGES1
MEAND      ,        ,000     ,000     ,000     ,000
ABIPRO     ,000     ,        ,000     ,000     ,00
LANDESH    ,000     ,000     ,        ,000     ,11
LANDP      ,000     ,000     ,000     ,        ,000
UEGES1     ,000     ,00      ,11      ,000     ,

[The N rows of the table are not legible.]

Model Summary (Model 1)

R = ,647; R Square = ,418. [Adjusted R Square and Std. Error of the Estimate are not legible in this transcription.]
a. Predictors: (Constant), UEGES1, LANDESH, ABIPRO, LANDP
b. Dependent Variable: MEAND

ANOVA (Model 1)

[Table with the rows Regression, Residual and Total and the columns Sum of Squares, df, Mean Square, F and Sig.; only Sig. = ,000 is fully legible.]
a. Predictors: (Constant), UEGES1, LANDESH, ABIPRO, LANDP
b. Dependent Variable: MEAND

Coefficients (Model 1; values as printed, some digits are missing in this transcription)

             Unstandardized Coefficients   Standardized
             B            Std. Error       Beta       t         Sig.
(Constant)   581,495      10,885                      5,65      ,000
ABIPRO       76,380       5,755            ,405       13,71     ,000
LANDESH      -,575        ,359             -,195      -7,17     ,000
LANDP        -5,413       1,867            -,089      -,899     ,004
UEGES1       1,577E-03    ,000             ,9         10,978    ,000
a. Dependent Variable: MEAND

Casewise Diagnostics

[Table with the columns Case Number, Std. Residual, MEAND, Predicted Value and Residual; the values are not legible in this transcription.]
a. Dependent Variable: MEAND

Residuals Statistics

[Table with Minimum, Maximum, Mean, Std. Deviation and N for Predicted Value, Residual, Std. Predicted Value and Std. Residual; the values are only partly legible in this transcription. The standardized residuals range from -3,139 to 5,558.]
a. Dependent Variable: MEAND

[Histogram of the regression standardized residuals (frequency); dependent variable: MEAND; Std. Dev = 1,00, Mean = 0,00, N = 878,00.]

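[Scatterplot of MEAND against the regression standardized residual; dependent variable: MEAND.]

Example for Simulation

What is the virtual price for a community with a given percentage of higher educated people, a given distance from the federal capital, 10% agricultural employees and a given number of nights per year? The values are inserted into the estimated regression function, with the coefficients $B$ taken from the Coefficients table:

$$P = 581,5 + 76,38 \cdot ABIPRO + B_{LANDESH} \cdot LANDESH - 5,413 \cdot LANDP + 0,001577 \cdot UEGES1$$

[Apart from LANDP = 10%, the inserted values and the resulting price are only partly legible in this transcription (as printed: "0%", "0 min", ".08,3 ATS").]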

SPSS-PROCEDURES FOR CORRELATION ANALYSIS


SPSS-OUTPUT FROM CORRELATION ANALYSIS

1. Pearson Correlation

Correlations (Pearson Correlation; values as printed, some digits are missing in this transcription)

                              Travel time   BEZH      LANDESH   MEAND
Travel time to Vienna (IV)    1,000         ,100**    ,061**    ,31**
BEZH                          ,100**        1,000     ,330**    -,167**
LANDESH                       ,061**        ,330**    1,000     -,97**
MEAND                         ,31**         -,167**   -,97**    1,000

**. Correlation is significant at the 0.01 level (2-tailed).

"Travel time to Vienna (IV)" stands for the variable labeled "Fahrzeit nach Wien im IV", i.e. the travel time to Vienna by private motorized transport. [The Sig. (2-tailed) and N rows are only partly legible.]

2. Rank (nonparametric) Correlation

Correlations between WI and SO (values as printed)

Kendall's tau_b:  correlation coefficient ,73**, Sig. (2-tailed) ,000
Spearman's rho:   correlation coefficient ,746**, Sig. (2-tailed) ,000

**. Correlation is significant at the .01 level (2-tailed).

WI: 3 types of winter tourism (1 = weak; 2 = medium; 3 = high)
SO: 3 types of summer tourism (1 = weak; 2 = medium; 3 = high)

SPSS-PROCEDURES FOR PARTIAL CORRELATION ANALYSIS


SPSS-OUTPUT FROM PARTIAL CORRELATION ANALYSIS

PARTIAL CORRELATION COEFFICIENTS

Controlling for: ABIPRO (values as printed; some digits are missing in this transcription)

            MEAND       LANDESH
MEAND       1,0000      -,188
            (    0)     (  188)
            P= ,        P= ,000
LANDESH     -,188       1,0000
            (  188)     (    0)
            P= ,000     P= ,

(Coefficient / (D.F.) / 2-tailed Significance)
", " is printed if a coefficient cannot be computed.

Note that the correlation coefficient between building land price and distance to the federal capital is smaller if you control for the percentage of the higher educated. [The two coefficients being compared are not legible in this transcription.]

SPSS-PROCEDURES FOR CONTINGENCY ANALYSIS


A graphical display is available by activating Display clustered bar charts. If you are interested in the statistical measures of the crosstabulation but do not want to display the actual tables, you can choose Suppress Tables.


Finally, you can modify the table format by clicking on Format in the Crosstabs dialog box.

SPSS-OUTPUT FROM CONTINGENCY ANALYSIS

HEIZ * ZUST Crosstabulation

[Crosstabulation of HEIZ (rows) by ZUST (columns 1, 2, 3 and Total). For each HEIZ category and for the Total row the table lists Count, Expected Count, % within HEIZ, % within ZUST, % of Total and Residual. The counts are missing and the remaining values are only partly legible in this transcription; the total number of cases is 5114.]

Chi-Square Tests (values as printed; some digits are missing in this transcription)

                               Value       df    Asymp. Sig. (2-sided)
Pearson Chi-Square             164,185 a   8     ,000
Likelihood Ratio               ,515        8     ,000
Linear-by-Linear Association   43,83       1     ,000
N of Valid Cases               5114

a. 7 cells (15,6%) have expected count less than 5. The minimum expected count is ,17.

Symmetric Measures (values as printed)

                                               Value   Asymp. Std. Error a   Approx. T b   Approx. Sig.
Nominal by Nominal     Phi                     ,497                                        ,000
                       Cramer's V              ,35                                         ,000
                       Contingency Coefficient ,445                                        ,000
Interval by Interval   Pearson's R             ,18     ,014                  15,981        ,000 c
Ordinal by Ordinal     Spearman Correlation    ,08     ,014                  15,06         ,000 c

a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
c. Based on normal approximation.

[Clustered bar chart: Count by HEIZ, clustered by ZUST.]

In this example (crosstabulation of the type of heating system and the state of repair of Vienna apartments) we find a significant relationship between the two variables, as Pearson Chi-Square as well as Phi are highly significant. Looking closer at the crosstabulation, we see that central heating systems (categories 2 to 4) are predominantly associated with a "very good" state of repair (positive residuals).


More information

Logistic Regression Analysis

Logistic Regression Analysis Logistic Regression Analysis Predicting whether an event will or will not occur, as well as identifying the variables useful in making the prediction, is important in most academic disciplines as well

More information

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Ref.:   Spring SOS3003 Applied data analysis for social science Lecture note SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton

More information

2 Prediction and Analysis of Variance

2 Prediction and Analysis of Variance 2 Prediction and Analysis of Variance Reading: Chapters and 2 of Kennedy A Guide to Econometrics Achen, Christopher H. Interpreting and Using Regression (London: Sage, 982). Chapter 4 of Andy Field, Discovering

More information

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION

LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION LAB 3 INSTRUCTIONS SIMPLE LINEAR REGRESSION In this lab you will first learn how to display the relationship between two quantitative variables with a scatterplot and also how to measure the strength of

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

Topic 1. Definitions

Topic 1. Definitions S Topic. Definitions. Scalar A scalar is a number. 2. Vector A vector is a column of numbers. 3. Linear combination A scalar times a vector plus a scalar times a vector, plus a scalar times a vector...

More information

Binary Logistic Regression

Binary Logistic Regression The coefficients of the multiple regression model are estimated using sample data with k independent variables Estimated (or predicted) value of Y Estimated intercept Estimated slope coefficients Ŷ = b

More information

Sociology 593 Exam 1 February 14, 1997

Sociology 593 Exam 1 February 14, 1997 Sociology 9 Exam February, 997 I. True-False. ( points) Indicate whether the following statements are true or false. If false, briefly explain why.. There are IVs in a multiple regression model. If the

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

Contents. Acknowledgments. xix

Contents. Acknowledgments. xix Table of Preface Acknowledgments page xv xix 1 Introduction 1 The Role of the Computer in Data Analysis 1 Statistics: Descriptive and Inferential 2 Variables and Constants 3 The Measurement of Variables

More information

Finding Relationships Among Variables

Finding Relationships Among Variables Finding Relationships Among Variables BUS 230: Business and Economic Research and Communication 1 Goals Specific goals: Re-familiarize ourselves with basic statistics ideas: sampling distributions, hypothesis

More information

Entering and recoding variables

Entering and recoding variables Entering and recoding variables To enter: You create a New data file Define the variables on Variable View Enter the values on Data View To create the dichotomies: Transform -> Recode into Different Variable

More information

QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 2013

QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 2013 QUANTITATIVE STATISTICAL METHODS: REGRESSION AND FORECASTING JOHANNES LEDOLTER VIENNA UNIVERSITY OF ECONOMICS AND BUSINESS ADMINISTRATION SPRING 3 Introduction Objectives of course: Regression and Forecasting

More information

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data?

Univariate analysis. Simple and Multiple Regression. Univariate analysis. Simple Regression How best to summarise the data? Univariate analysis Example - linear regression equation: y = ax + c Least squares criteria ( yobs ycalc ) = yobs ( ax + c) = minimum Simple and + = xa xc xy xa + nc = y Solve for a and c Univariate analysis

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Statistics: A review. Why statistics?

Statistics: A review. Why statistics? Statistics: A review Why statistics? What statistical concepts should we know? Why statistics? To summarize, to explore, to look for relations, to predict What kinds of data exist? Nominal, Ordinal, Interval

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

Chapter 9 - Correlation and Regression

Chapter 9 - Correlation and Regression Chapter 9 - Correlation and Regression 9. Scatter diagram of percentage of LBW infants (Y) and high-risk fertility rate (X ) in Vermont Health Planning Districts. 9.3 Correlation between percentage of

More information

Regression Analysis. BUS 735: Business Decision Making and Research

Regression Analysis. BUS 735: Business Decision Making and Research Regression Analysis BUS 735: Business Decision Making and Research 1 Goals and Agenda Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn

More information

In Class Review Exercises Vartanian: SW 540

In Class Review Exercises Vartanian: SW 540 In Class Review Exercises Vartanian: SW 540 1. Given the following output from an OLS model looking at income, what is the slope and intercept for those who are black and those who are not black? b SE

More information

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators

x3,..., Multiple Regression β q α, β 1, β 2, β 3,..., β q in the model can all be estimated by least square estimators Multiple Regression Relating a response (dependent, input) y to a set of explanatory (independent, output, predictor) variables x, x 2, x 3,, x q. A technique for modeling the relationship between variables.

More information

One-Way ANOVA. Some examples of when ANOVA would be appropriate include:

One-Way ANOVA. Some examples of when ANOVA would be appropriate include: One-Way ANOVA 1. Purpose Analysis of variance (ANOVA) is used when one wishes to determine whether two or more groups (e.g., classes A, B, and C) differ on some outcome of interest (e.g., an achievement

More information

ECON 4230 Intermediate Econometric Theory Exam

ECON 4230 Intermediate Econometric Theory Exam ECON 4230 Intermediate Econometric Theory Exam Multiple Choice (20 pts). Circle the best answer. 1. The Classical assumption of mean zero errors is satisfied if the regression model a) is linear in the

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis Where as simple linear regression has 2 variables (1 dependent, 1 independent): y ˆ = a + bx Multiple linear regression has >2 variables (1 dependent, many independent): ˆ

More information

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors.

EDF 7405 Advanced Quantitative Methods in Educational Research. Data are available on IQ of the child and seven potential predictors. EDF 7405 Advanced Quantitative Methods in Educational Research Data are available on IQ of the child and seven potential predictors. Four are medical variables available at the birth of the child: Birthweight

More information

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti

Prepared by: Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Prepared by: Prof Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang M L Regression is an extension to

More information

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS

EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS EDF 7405 Advanced Quantitative Methods in Educational Research MULTR.SAS The data used in this example describe teacher and student behavior in 8 classrooms. The variables are: Y percentage of interventions

More information

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES

4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES 4:3 LEC - PLANNED COMPARISONS AND REGRESSION ANALYSES FOR SINGLE FACTOR BETWEEN-S DESIGNS Planned or A Priori Comparisons We previously showed various ways to test all possible pairwise comparisons for

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Ch. 16: Correlation and Regression

Ch. 16: Correlation and Regression Ch. 1: Correlation and Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely to

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Unit 11: Multiple Linear Regression

Unit 11: Multiple Linear Regression Unit 11: Multiple Linear Regression Statistics 571: Statistical Methods Ramón V. León 7/13/2004 Unit 11 - Stat 571 - Ramón V. León 1 Main Application of Multiple Regression Isolating the effect of a variable

More information

Example: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV)

Example: Forced Expiratory Volume (FEV) Program L13. Example: Forced Expiratory Volume (FEV) Example: Forced Expiratory Volume (FEV) Program L13 Relationships between two variables Correlation, cont d Regression Relationships between more than two variables Multiple linear regression Two numerical variables Linear or curved relationship?

More information

Sociology 593 Exam 1 Answer Key February 17, 1995

Sociology 593 Exam 1 Answer Key February 17, 1995 Sociology 593 Exam 1 Answer Key February 17, 1995 I. True-False. (5 points) Indicate whether the following statements are true or false. If false, briefly explain why. 1. A researcher regressed Y on. When

More information

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis

STAT 3900/4950 MIDTERM TWO Name: Spring, 2015 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis STAT 3900/4950 MIDTERM TWO Name: Spring, 205 (print: first last ) Covered topics: Two-way ANOVA, ANCOVA, SLR, MLR and correlation analysis Instructions: You may use your books, notes, and SPSS/SAS. NO

More information

Multivariate Correlational Analysis: An Introduction

Multivariate Correlational Analysis: An Introduction Assignment. Multivariate Correlational Analysis: An Introduction Mertler & Vanetta, Chapter 7 Kachigan, Chapter 4, pps 180-193 Terms you should know. Multiple Regression Linear Equations Least Squares

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Econometrics Midterm Examination Answers

Econometrics Midterm Examination Answers Econometrics Midterm Examination Answers March 4, 204. Question (35 points) Answer the following short questions. (i) De ne what is an unbiased estimator. Show that X is an unbiased estimator for E(X i

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM

TA: Sheng Zhgang (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan (W 1:20) / 346 (Th 12:05) FINAL EXAM STAT 301, Fall 2011 Name Lec 4: Ismor Fischer Discussion Section: Please circle one! TA: Sheng Zhgang... 341 (Th 1:20) / 342 (W 1:20) / 343 (W 2:25) / 344 (W 12:05) Haoyang Fan... 345 (W 1:20) / 346 (Th

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Multiple Regression. Peerapat Wongchaiwat, Ph.D.

Multiple Regression. Peerapat Wongchaiwat, Ph.D. Peerapat Wongchaiwat, Ph.D. wongchaiwat@hotmail.com The Multiple Regression Model Examine the linear relationship between 1 dependent (Y) & 2 or more independent variables (X i ) Multiple Regression Model

More information

176 Index. G Gradient, 4, 17, 22, 24, 42, 44, 45, 51, 52, 55, 56

176 Index. G Gradient, 4, 17, 22, 24, 42, 44, 45, 51, 52, 55, 56 References Aljandali, A. (2014). Exchange rate forecasting: Regional applications to ASEAN, CACM, MERCOSUR and SADC countries. Unpublished PhD thesis, London Metropolitan University, London. Aljandali,

More information

10. Alternative case influence statistics

10. Alternative case influence statistics 10. Alternative case influence statistics a. Alternative to D i : dffits i (and others) b. Alternative to studres i : externally-studentized residual c. Suggestion: use whatever is convenient with the

More information

Measuring relationships among multiple responses

Measuring relationships among multiple responses Measuring relationships among multiple responses Linear association (correlation, relatedness, shared information) between pair-wise responses is an important property used in almost all multivariate analyses.

More information

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10)

Final Exam. Question 1 (20 points) 2 (25 points) 3 (30 points) 4 (25 points) 5 (10 points) 6 (40 points) Total (150 points) Bonus question (10) Name Economics 170 Spring 2004 Honor pledge: I have neither given nor received aid on this exam including the preparation of my one page formula list and the preparation of the Stata assignment for the

More information

Advanced Regression Topics: Violation of Assumptions

Advanced Regression Topics: Violation of Assumptions Advanced Regression Topics: Violation of Assumptions Lecture 7 February 15, 2005 Applied Regression Analysis Lecture #7-2/15/2005 Slide 1 of 36 Today s Lecture Today s Lecture rapping Up Revisiting residuals.

More information

13.1 Categorical Data and the Multinomial Experiment

13.1 Categorical Data and the Multinomial Experiment Chapter 13 Categorical Data Analysis 13.1 Categorical Data and the Multinomial Experiment Recall Variable: (numerical) variable (i.e. # of students, temperature, height,). (non-numerical, categorical)

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46

Dr. Junchao Xia Center of Biophysics and Computational Biology. Fall /1/2016 1/46 BIO5312 Biostatistics Lecture 10:Regression and Correlation Methods Dr. Junchao Xia Center of Biophysics and Computational Biology Fall 2016 11/1/2016 1/46 Outline In this lecture, we will discuss topics

More information

ECON 497 Midterm Spring

ECON 497 Midterm Spring ECON 497 Midterm Spring 2009 1 ECON 497: Economic Research and Forecasting Name: Spring 2009 Bellas Midterm You have three hours and twenty minutes to complete this exam. Answer all questions and explain

More information

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables

Structural Equation Modeling and Confirmatory Factor Analysis. Types of Variables /4/04 Structural Equation Modeling and Confirmatory Factor Analysis Advanced Statistics for Researchers Session 3 Dr. Chris Rakes Website: http://csrakes.yolasite.com Email: Rakes@umbc.edu Twitter: @RakesChris

More information

Business Statistics. Lecture 9: Simple Regression

Business Statistics. Lecture 9: Simple Regression Business Statistics Lecture 9: Simple Regression 1 On to Model Building! Up to now, class was about descriptive and inferential statistics Numerical and graphical summaries of data Confidence intervals

More information

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang

Project Report for STAT571 Statistical Methods Instructor: Dr. Ramon V. Leon. Wage Data Analysis. Yuanlei Zhang Project Report for STAT7 Statistical Methods Instructor: Dr. Ramon V. Leon Wage Data Analysis Yuanlei Zhang 77--7 November, Part : Introduction Data Set The data set contains a random sample of observations

More information

The simple linear regression model discussed in Chapter 13 was written as

The simple linear regression model discussed in Chapter 13 was written as 1519T_c14 03/27/2006 07:28 AM Page 614 Chapter Jose Luis Pelaez Inc/Blend Images/Getty Images, Inc./Getty Images, Inc. 14 Multiple Regression 14.1 Multiple Regression Analysis 14.2 Assumptions of the Multiple

More information

LOOKING FOR RELATIONSHIPS

LOOKING FOR RELATIONSHIPS LOOKING FOR RELATIONSHIPS One of most common types of investigation we do is to look for relationships between variables. Variables may be nominal (categorical), for example looking at the effect of an

More information

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of

Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Preface Introduction to Statistics and Data Analysis Overview: Statistical Inference, Samples, Populations, and Experimental Design The Role of Probability Sampling Procedures Collection of Data Measures

More information

Introduction to Statistical Analysis using IBM SPSS Statistics (v24)

Introduction to Statistical Analysis using IBM SPSS Statistics (v24) to Statistical Analysis using IBM SPSS Statistics (v24) to Statistical Analysis Using IBM SPSS Statistics is a two day instructor-led classroom course that provides an application-oriented introduction

More information

Review of Multiple Regression

Review of Multiple Regression Ronald H. Heck 1 Let s begin with a little review of multiple regression this week. Linear models [e.g., correlation, t-tests, analysis of variance (ANOVA), multiple regression, path analysis, multivariate

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and

More information

Chapter 19: Logistic regression

Chapter 19: Logistic regression Chapter 19: Logistic regression Self-test answers SELF-TEST Rerun this analysis using a stepwise method (Forward: LR) entry method of analysis. The main analysis To open the main Logistic Regression dialog

More information

4/22/2010. Test 3 Review ANOVA

4/22/2010. Test 3 Review ANOVA Test 3 Review ANOVA 1 School recruiter wants to examine if there are difference between students at different class ranks in their reported intensity of school spirit. What is the factor? How many levels

More information

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization.

Ø Set of mutually exclusive categories. Ø Classify or categorize subject. Ø No meaningful order to categorization. Statistical Tools in Evaluation HPS 41 Dr. Joe G. Schmalfeldt Types of Scores Continuous Scores scores with a potentially infinite number of values. Discrete Scores scores limited to a specific number

More information

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers

Nominal Data. Parametric Statistics. Nonparametric Statistics. Parametric vs Nonparametric Tests. Greg C Elvers Nominal Data Greg C Elvers 1 Parametric Statistics The inferential statistics that we have discussed, such as t and ANOVA, are parametric statistics A parametric statistic is a statistic that makes certain

More information

Model Estimation Example

Model Estimation Example Ronald H. Heck 1 EDEP 606: Multivariate Methods (S2013) April 7, 2013 Model Estimation Example As we have moved through the course this semester, we have encountered the concept of model estimation. Discussions

More information

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011

Lecture 2: Linear Models. Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 Lecture 2: Linear Models Bruce Walsh lecture notes Seattle SISG -Mixed Model Course version 23 June 2011 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector

More information

Regression ( Kemampuan Individu, Lingkungan kerja dan Motivasi)

Regression ( Kemampuan Individu, Lingkungan kerja dan Motivasi) Regression (, Lingkungan kerja dan ) Descriptive Statistics Mean Std. Deviation N 3.87.333 32 3.47.672 32 3.78.585 32 s Pearson Sig. (-tailed) N Kemampuan Lingkungan Individu Kerja.000.432.49.432.000.3.49.3.000..000.000.000..000.000.000.

More information

Correlation Analysis

Correlation Analysis Simple Regression Correlation Analysis Correlation analysis is used to measure strength of the association (linear relationship) between two variables Correlation is only concerned with strength of the

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Chs. 16 & 17: Correlation & Regression

Chs. 16 & 17: Correlation & Regression Chs. 16 & 17: Correlation & Regression With the shift to correlational analyses, we change the very nature of the question we are asking of our data. Heretofore, we were asking if a difference was likely

More information

FinQuiz Notes

FinQuiz Notes Reading 10 Multiple Regression and Issues in Regression Analysis 2. MULTIPLE LINEAR REGRESSION Multiple linear regression is a method used to model the linear relationship between a dependent variable

More information