
Statistics 5100 Exam 2 (Practice)

Directions: Be sure to answer every question, and do not spend too much time on any part of any question. Be concise with all your responses. Partial SAS output and statistical tables are found in an accompanying handout. For any tests of significance, use α = 0.05. The point value of each question is given, and the points sum to 100. Good luck!

(Q1) (1 point) What is your name?

Possibly useful formulas for this exam:

b_1 = Corr(X, Y) × (SD_Y / SD_X)

b_0 = Ȳ − b_1 X̄

e_i = Y_i − Ŷ_i

confidence interval: Estimate ± (Critical Value) × (Standard Error of Estimate)

regression equation: Y_i = β_0 + β_1 X_{i1} + β_2 X_{i2} + ⋯ + β_{p−1} X_{i,p−1} + ε_i

F = [(SSE_reduced − SSE_full) / (p − q)] / [SSE_full / (n − p)]
  o p = # of β's in the full model (incl. intercept)
  o q = # of β's in the reduced model (incl. intercept)
  o n = sample size

SSR(U | V) = SSE(V) − SSE(U, V)

R² = SS_model / SS_total = 1 − SSE / SS_total

R²_adj = 1 − [(n − 1) / (n − p)] × (SSE / SS_total), where p = # predictors in model

First difference: Y_t − Y_{t−1} = Y_t − B Y_t = (1 − B) Y_t

(1 − B)^d Y_t = (β_0 + β_1 X_{t1} + ⋯ + β_{k−1} X_{t,k−1}) + (1 − ϕ_1 B − ⋯ − ϕ_p B^p)^{−1} (1 − θ_1 B − ⋯ − θ_q B^q) a_t
  o p: value at time t depends on the previous p values
  o d: # of differences applied
  o q: value at time t depends on the previous q random shocks

L_i = log[π_i / (1 − π_i)] = β_0 + β_1 X_{i1} + β_2 X_{i2} + ⋯ + β_{p−1} X_{i,p−1}

OR_j = e^{b_j}

π_i = 1 / (1 + e^{−L_i})

Data: This exam will consider the following two data sets:

I. The Out-of-state tuition data set, where the variables in the following table are recorded on 109 public doctoral-granting (Type I) universities for the 1993-1994 academic year.

Variable Name   Interpretation
ostuit          out-of-state tuition (in $'s)
rbcost          room and board costs (in $'s)
sfratio         student-to-faculty ratio
palumdon        percent of alumni who donate
instexp         instructional expenditure per student (in $'s)
gradrate        graduation rate
fullcom         average compensation for full professors (in $100's)
fullsal         average salary for full professors (in $100's)
fullnum         number of full professors
high_ostuit     indicator for high ostuit (1 if ostuit > 8,000; 0 otherwise)
eastms          indicator for being east of the Mississippi River (1 if yes; 0 otherwise)

II. The Viscosity data set, where measurements have been taken on the viscosity of a chemical product known as XR-22 for 150 consecutive days. A chemical's viscosity is essentially its resistance to flow, or a measure of its consistency or texture. The variable daily is the only one in this data set. Of primary interest with this data set is better understanding the behavior of daily over time, so that useful short-term predictions can be made.
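For reference, a minimal sketch of how two such data sets might be read into SAS. The file names and the assumption that the variables appear in the order listed above are hypothetical; the exam handout does not show the actual data step.

```sas
/* Minimal sketch: reading the two data sets.
   File names and variable order are assumptions, not the course's code. */
data tuition;
  infile 'ostuit.dat';                     /* hypothetical file */
  input ostuit rbcost sfratio palumdon instexp gradrate
        fullcom fullsal fullnum high_ostuit eastms;
run;

data viscosity;
  infile 'viscosity.dat';                  /* hypothetical file */
  input daily;
run;
```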

Data Set I: Out-of-state tuition

(Q2) A researcher suspects that, on average, universities with higher student-to-faculty ratios tend to have lower out-of-state tuition, so she regresses ostuit on sfratio (see partial SAS output for "Model 1"), assuming the following linear model:

ostuit_i = β_0 + β_1 sfratio_i + ε_i

(a) (6 points) In terms of the appropriate parameter(s) in this model, write out the null and alternative hypotheses of interest to this researcher. Also report the appropriate P-value.

(b) (4 points) Based on this P-value, what can the researcher conclude regarding her suspicion, in the context of these data? (Be as specific as the null and alternative will allow.)

(c) (6 points) Why can (or cannot) the researcher trust this conclusion? (Refer to the output.)
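A hypothetical sketch of the kind of PROC REG call that could produce Model 1-style output; the data set name tuition is an assumption, and the actual exam code is not shown in the handout.

```sas
/* Sketch only: simple linear regression of ostuit on sfratio (Model 1). */
proc reg data=tuition;
  model ostuit = sfratio;
run;
```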

(Q3) (9 points) Referring to the P-value you reported in Q2 part (a), what is the correct interpretation of this P-value, in the context of these data? (Not what conclusion you would reach based on its value [that's what you reported in Q2 part (b)], but what the number itself actually means.)

(Q4) (4 points) In the Model 1 output, one plot refers to Cook's D. Explain clearly what this represents (and how it is used) in linear regression. (No credit will be given for formulas or specific numeric thresholds.)

(Q5) (6 points) Model 2 involves ostuit being regressed on eight predictors. In the resulting output, which predictors, if any, appear to be collinear? Explain your response using specific numbers from the output.

(Q6) (3 points; there is no SAS output for this question) Give one reason why multicollinearity is a problem in multiple linear regression. (For example, what will happen to the model if multicollinearity is not resolved?)
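As background for Q4 and Q5, Cook's D plots and collinearity diagnostics are standard PROC REG requests. The sketch below is a hedged illustration (the data set name and exact option choices are assumptions, not the exam's code): PLOTS=COOKSD requests the Cook's D plot, while the VIF and COLLIN model options add variance inflation factors and the eigenvalue/condition-index table shown in the Model 2 output.

```sas
/* Sketch: Model 2 with influence and collinearity diagnostics. */
proc reg data=tuition plots=cooksd;
  model ostuit = rbcost sfratio palumdon instexp gradrate
                 fullcom fullsal fullnum / vif collin;
run;
```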

(Q7) (5 points) If you know there is collinearity involving two predictors, what does that tell you about their interaction in a linear regression model? (No output is provided for this question.)

(Q8) (5 points) What proportion of the variation in out-of-state tuition is explained by its linear relationship with the eight predictors in Model 2?

(Q9) (5 points) Model 3 involves ostuit being regressed on five predictors. Models 1-3 were fit using 80 randomly selected universities (of the 109 possible). The Model 3 fit was used to calculate predicted ostuit values for the remaining 29 universities, and their mean squared prediction error (MSPR) was calculated to be about 3.6 million (not shown in output). What can be concluded from this MSPR value? (Provide numerical justification.)

(Q10) (6 points) One of the parameter estimates in the Model 3 output is reported as 62.09706. What is the correct interpretation of this number?
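For Q9, the MSPR is the average squared prediction error over the 29 holdout universities. Below is a sketch of one way such a value could be computed in SAS, assuming hypothetical data set names train (the 80 estimation universities) and holdout (the 29 validation universities); this is not the code that produced the 3.6 million figure.

```sas
/* Sketch: compute MSPR for a holdout set using the Model 3 coefficients. */
proc reg data=train outest=m3est noprint;
  model ostuit = rbcost sfratio palumdon instexp gradrate;
run;

/* Score the holdout universities with the fitted coefficients. */
proc score data=holdout score=m3est out=scored type=parms predict;
  var rbcost sfratio palumdon instexp gradrate;
run;

data _null_;
  set scored end=last;
  sqerr + (ostuit - model1)**2;   /* MODEL1 holds the predicted ostuit */
  if last then do;
    mspr = sqerr / _n_;
    put 'MSPR = ' mspr;
  end;
run;
```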

(Q11) (14 points) Model 4 is the logistic regression model used to predict the probability of high out-of-state tuition (high_ostuit = 1) based on which side of the Mississippi River the university lies on (eastms).

(a) (6 points) What is the correct interpretation of the estimated eastms effect?

(b) (2 points) What can be concluded from the plot following the Model 4 output?

(c) (6 points) Based on the Model 4 fit, what is the predicted probability that Utah State University would have high out-of-state tuition?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Data Set II: Viscosity

(Q12) (6 points) The Viscosity ARIMA Output reports the PROC ARIMA output for these data, with no predictor variables used. Referring to this output, what evidence is there of stationarity?

(Q13) (3 points) Referring again to the Viscosity ARIMA Output, what numerical evidence is there of significant dependence structure?

(Q14) (8 points) Someone has proposed AR(2), MA(2), and ARIMA(2,0,1) models as possibly appropriate for these data. Based on the SAC and SPAC plots in the Viscosity ARIMA Output, which of these models appears most appropriate, and why?

(Q15) (8 points) Each of the three proposed models was fit to the data (with partial output reported in the handout). For the model you selected in Q14, what can you say about its adequacy? (Refer to both numerical and graphical output.)

(Q16) (1 point) What topic(s) did you study most that did not appear on this exam?
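A hedged sketch of the kind of PROC ARIMA steps that could produce the Viscosity ARIMA Output and the three candidate fits that follow; the data set name viscosity is an assumption. METHOD=ULS matches the "Unconditional Least Squares Estimation" heading in the output.

```sas
/* Sketch: identification (SAC/SPAC, white-noise check) and the three
   candidate models fit by unconditional least squares. */
proc arima data=viscosity;
  identify var=daily;               /* sample ACF, PACF, white-noise test */
  estimate p=2 method=uls;          /* AR(2)        */
  estimate q=2 method=uls;          /* MA(2)        */
  estimate p=2 q=1 method=uls;      /* ARIMA(2,0,1) */
run;
```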

Output and Tables for STAT 5100 Exam 2 (Practice)

Model 1

Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              1        16321286     16321286     3.44  0.0674
Error             78       370054998      4744295
Corrected Total   79       386376284

Parameter Estimates
Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    1          9601.52327      1099.56418     8.73    <.0001
sfratio      1          -123.41286        66.53790    -1.85    0.0674

Model 2

Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              8       160027308     20003413     6.27  <.0001
Error             71       226348976      3188014
Corrected Total   79       386376284

Parameter Estimates
Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    1          2995.05637      2436.35422     1.23    0.2230
rbcost       1             0.87707         0.32795     2.67    0.0093
sfratio      1          -132.74237        82.35454    -1.61    0.1114
palumdon     1            16.09292        27.80723     0.58    0.5646
instexp      1            -0.20465         0.11864    -1.72    0.0889
gradrate     1            50.95141        17.13040     2.97    0.0040
fullcom      1            -0.83622         9.22880    -0.09    0.9281
fullsal      1             3.48162        12.01909     0.29    0.7729
fullnum      1             1.31827         1.17464     1.12    0.2655

Model 2 output continued on next page

Model 2 (continued)

Collinearity Diagnostics (columns after Condition Index are Proportion of Variation)
Number  Eigenvalue  Condition Index  Intercept   rbcost      sfratio     palumdon    instexp     gradrate    fullcom      fullsal      fullnum
1       8.37381         1.00000     0.00009363  0.00029188  0.00028031  0.00213     0.00036342  0.00055057  0.00001071   0.00000965   0.00165
2       0.23825         5.92852     0.00149     0.00139     0.01888     0.00871     0.00856     0.00066788  0.00001840   0.00001889   0.36484
3       0.20808         6.34376     0.00020922  0.00780     0.00036921  0.71506     0.00818     0.00015641  0.00004617   0.00004065   0.00507
4       0.10011         9.14604     0.00047194  0.00960     0.06802     0.09529     0.09012     0.00352     1.605633E-8  2.004957E-7  0.33789
5       0.04251        14.03519     0.00779     0.00353     0.00499     0.04031     0.06069     0.77085     1.66882E-7   4.048249E-7  0.04380
6       0.01867        21.17668     0.04379     0.76989     0.00948     0.09012     0.05708     0.00372     0.00294      0.00214      0.09911
7       0.01349        24.91605     0.00020344  0.06475     0.36811     0.00003072  0.50205     0.12228     0.01235      0.01110      0.00865
8       0.00471        42.14376     0.91676     0.10491     0.51430     0.04569     0.25969     0.09825     0.01626      0.00827      0.13665
9       0.00036784    150.88095     0.02920     0.03783     0.01558     0.00266     0.01327     0.00000103  0.96838      0.97841      0.00234

Model 3

Number of Observations Read  80
Number of Observations Used  80

Analysis of Variance
Source            DF  Sum of Squares  Mean Square  F Value  Pr > F
Model              5       152598353     30519671     9.66  <.0001
Error             74       233777931      3159161
Corrected Total   79       386376284

Root MSE          1777.40294    R-Square  0.3949
Dependent Mean    7612.72500    Adj R-Sq  0.3541
Coeff Var           23.34779

Parameter Estimates
Variable    DF  Parameter Estimate  Standard Error  t Value  Pr > |t|
Intercept    1          3540.96295      1897.63263     1.87    0.0660
rbcost       1             0.83282         0.29091     2.86    0.0055
sfratio      1          -114.16588        79.60894    -1.43    0.1558
palumdon     1            20.61485        27.44613     0.75    0.4550
instexp      1            -0.13101         0.10510    -1.25    0.2165
gradrate     1            62.09706        14.65924     4.24    <.0001

Model 4

Response Profile
Ordered Value  high_ostuit  Total Frequency
1              0            68
2              1            41

Probability modeled is high_ostuit=1.
Convergence criterion (GCONV=1E-8) satisfied.

Analysis of Maximum Likelihood Estimates
Parameter  DF  Estimate  Standard Error  Wald Chi-Square  Pr > ChiSq
Intercept   1   -1.0986          0.3333          10.8625      0.0010
eastms      1    1.0002          0.4205           5.6567      0.0174

Odds Ratio Estimates
Effect   Point Estimate  95% Wald Confidence Limits
eastms            2.719  1.192   6.199
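A hypothetical sketch of a PROC LOGISTIC call that could produce output like Model 4's; the data set name and the plot request are assumptions (the handout does not identify the exact plot that follows the Model 4 output).

```sas
/* Sketch: logistic regression of high_ostuit on eastms.
   EVENT='1' makes high_ostuit = 1 the modeled probability, matching
   "Probability modeled is high_ostuit=1" in the output. */
proc logistic data=tuition plots(only)=effect;
  model high_ostuit(event='1') = eastms;
run;
```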

Viscosity ARIMA Output

The ARIMA Procedure

Name of Variable = daily
Mean of Working Series    35.20133
Standard Deviation         2.922008
Number of Observations   150

Autocorrelation Check for White Noise
To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6       46.31   6      <.0001  -0.415   0.319   0.049   0.004  -0.114   0.109
    12       50.46  12      <.0001  -0.110   0.000   0.037  -0.042  -0.083   0.059

AR(2) model fit to Viscosity Data

Unconditional Least Squares Estimation
Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU         35.18996         0.18534   189.87           <.0001    0
AR1,1      -0.34636         0.08094    -4.28           <.0001    1
AR1,2       0.18551         0.08148     2.28           0.0242    2

Constant Estimate      40.85019
Variance Estimate       6.944562
Std Error Estimate      2.635254
AIC                   719.6145
SBC                   728.6464
Number of Residuals   150

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6       21.80   4      0.0002  -0.046   0.099   0.310  -0.120  -0.110   0.077
    12       28.68  10      0.0014  -0.107  -0.043   0.080  -0.093  -0.107   0.050
    18       39.04  16      0.0011  -0.089   0.006   0.062  -0.119   0.173  -0.070
    24       49.34  22      0.0007   0.018   0.142  -0.097   0.112   0.033  -0.120
    30       50.51  28      0.0057   0.004  -0.014  -0.066  -0.000   0.041  -0.009

AR(2) model fit output continued on next page

AR(2) model fit to Viscosity Data (continued)

MA(2) model fit to Viscosity Data

Unconditional Least Squares Estimation
Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU         35.18224         0.22168   158.71           <.0001    0
MA1,1       0.52381         0.06306     8.31           <.0001    1
MA1,2      -0.65701         0.06366   -10.32           <.0001    2

Constant Estimate      35.18224
Variance Estimate       5.792187
Std Error Estimate      2.406696
AIC                   693.363
SBC                   702.3949
Number of Residuals   150

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6        2.36   4      0.6697   0.084  -0.049   0.014  -0.012  -0.074   0.001
    12       10.54  10      0.3943  -0.090   0.028   0.089  -0.090  -0.159  -0.002
    18       15.47  16      0.4907   0.015   0.031   0.021  -0.102   0.127  -0.024
    24       22.09  22      0.4545  -0.011   0.052  -0.008   0.141   0.017  -0.119
    30       23.72  28      0.6963  -0.031   0.005  -0.047  -0.026   0.044   0.054

MA(2) model fit output continued on next page

MA(2) model fit to Viscosity Data (continued)

ARIMA(2,0,1) model fit to Viscosity Data

Unconditional Least Squares Estimation
Parameter  Estimate  Standard Error  t Value  Approx Pr > |t|  Lag
MU         35.18556         0.24078   146.13           <.0001    0
MA1,1       0.46378         0.24570     1.89           0.0611    1
AR1,1       0.10739         0.22678     0.47           0.6365    1
AR1,2       0.42498         0.09920     4.28           <.0001    2

Constant Estimate      16.45382
Variance Estimate       6.714524
Std Error Estimate      2.59124
AIC                   715.6398
SBC                   727.6824
Number of Residuals   150

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr > ChiSq  Autocorrelations
     6       18.59   3      0.0003  -0.047   0.050   0.244  -0.159  -0.166   0.047
    12       25.62   9      0.0024  -0.111  -0.034   0.087  -0.089  -0.111   0.045
    18       34.85  15      0.0026  -0.077   0.010   0.060  -0.113   0.164  -0.070
    24       44.82  21      0.0018   0.016   0.136  -0.085   0.116   0.033  -0.125
    30       46.20  27      0.0121  -0.006  -0.014  -0.065   0.005   0.054   0.005

ARIMA(2,0,1) model fit continued on next page

ARIMA(2,0,1) model fit to Viscosity Data (continued)