FIN822 project 2 Project 2 contains part I and part II. (Due on November 10, 2008)


Part I: Logit Model in Bankruptcy Prediction

You do not believe in Altman, so you decide to estimate the bankruptcy prediction model using the following logit model:

y = Λ(a + b1*X1 + b2*X2 + b3*X3 + b4*X4 + b5*X5) + error term

where the variables are defined as:

X1: re_ta = retained earnings / total assets
X2: cacl_ta = (total current assets - total current liabilities) / total assets
X3: ebit_ta = earnings before interest and taxes / total assets (do not use ROA)
X4: total liabilities / total assets
X5: total revenue / total assets
y: banc_indicator, a dummy variable that takes the value 1 if the firm goes bankrupt in 2004 or 2005, and 0 otherwise.

Download the data I have constructed from my website: http://online.sfsu.edu/~donglin/altman.xls

Say a few words about your sample (how many firms in total, how many went bankrupt, etc.). Report your estimated coefficients and the respective t-statistics (or p-values). Which variable(s) are significant at the 0.05 level? Do the results make sense?

Based on your logit model output, what is the estimated bankruptcy probability for a firm with the following ratios: x1 = 0.2, x2 = 1, x3 = -20%, x4 = 0.7, x5 = 1? (Think of the in-class-work question on high school student graduation probability.)

Tips: For illustration I will use an abbreviated version of the dataset, which contains fewer observations. You should use the full sample.

Analyze > Regression > Binary Logistic

Coefficients are given in this table (you should expect different results): B are the estimated coefficients, Sig. are the p-values. Ignore the other statistics.

Variables in the Equation (Step 1a)
              B       S.E.     Wald    df   Sig.   Exp(B)
  x1          .077    .202     .144    1    .704    1.080
  x2         -.034   1.646     .000    1    .984     .967
  x3         -.792   1.236     .411    1    .521     .453
  x4         2.983   1.308    5.206    1    .023   19.754
  x5          .215    .127    2.859    1    .091    1.240
  Constant  -7.196   1.123   41.093    1    .000     .001
a. Variable(s) entered on step 1: x1, x2, x3, x4, x5.

The numbers of non-bankrupt and bankrupt firms can be seen from this table:

Classification Table(a,b)
                               Predicted banc_indicator   Percentage
  Observed                          0           1          Correct
  Step 1  banc_indicator  0       2122          0           100.0
                          1         14          0             0.0
          Overall Percentage                                 99.3
a. Constant is included in the model.
b. The cut value is .500
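For the probability question, plug the ratios into the fitted logistic function. A sketch using the illustrative B column above (your own estimates will differ, so treat the numbers as an example of the computation, not the answer):

```python
# Estimated bankruptcy probability Lambda(a + b1*x1 + ... + b5*x5), using the
# illustrative coefficients from the table above (yours will differ).
import math

b = {"const": -7.196, "x1": 0.077, "x2": -0.034,
     "x3": -0.792, "x4": 2.983, "x5": 0.215}
x = {"x1": 0.2, "x2": 1.0, "x3": -0.20, "x4": 0.7, "x5": 1.0}

z = b["const"] + sum(b[k] * x[k] for k in x)
prob = 1 / (1 + math.exp(-z))   # logistic CDF Lambda(z)
print(round(prob, 4))
```

With these example coefficients the probability comes out under one percent, consistent with how rare bankruptcy is in the sample.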

Part II: Conduct an analysis of a time series.

Download the data I have constructed from http://online.sfsu.edu/~donglin/project2part2.xls

Explain how you have chosen the number of lags (p = ?, q = ?) in your ARIMA model. Write down the dynamic structure of the original time series. A tutorial is attached below. You can estimate the time series using either of the two approaches illustrated in the example.

Time-Series Analysis Example

Suppose you are given a time series. The example data is available from my website at http://online.sfsu.edu/~donglin/project2_example.xls

First, launch SPSS. If you see the following window, click Cancel. Then open the Excel file. Note that you should choose All Files (*.*) in the following window; otherwise you may not see the Excel file.

Click OK if you see this.

Let's take a look at the time-series plot. Pull down the menu: Analyze > Time Series > Sequence Charts. Then choose the Y and X axis variables, and click OK. Copy the object and paste it below.

[Sequence plot of x: the series rises from near 0 to about 200 over the sample.]

If you want to see a better-looking picture with the horizontal axis shown, you have to export the graph as a PDF or PowerPoint file and then select and paste it. What a pain! It will look like this.

Clearly there is a trend in this series. So I can either (1) run a regression on time (t) and take the residuals, or (2) compute the first difference.

First approach: regress the series on t

Analyze > Regression > Linear

Choose X as the dependent variable and t as the independent variable. Also click Save.

Select Unstandardized Residuals in the above window. This will save the residuals from the regression. Click Continue, then OK. Here is the regression output:

Coefficients(a)
                 Unstandardized   Standardized
                 Coefficients     Coefficients
  Model          B      Std. Error   Beta          t      Sig.
  1 (Constant)   3.375    .466                    7.239   .000
    t             .795    .004       .997       197.647   .000
a. Dependent Variable: x

You will also see the estimated residuals in the 3rd column of the SPSS data window (below).

Now let's see a plot of the residual. It appears stationary.

[Sequence plot of the unstandardized residuals, fluctuating between about -10 and 10.]

Then let's draw the autocorrelation function (ACF) and partial autocorrelation function (PACF) graphs for this residual.

Analyze > Time Series > Autocorrelations

Choose the unstandardized residuals as the variable of interest. Click OK.

[ACF plot of the unstandardized residuals, with upper and lower confidence limits.]

[PACF plot of the unstandardized residuals, with upper and lower confidence limits.]

From the above graphs, the ACF decays gradually while the PACF stands out only at lag 1. So an ARIMA(1,0,0) = ARMA(1,0) = AR(1) model may be appropriate. You will also see some information like this:

Autocorrelations
Series: Unstandardized Residual
                                        Box-Ljung Statistic
  Lag   Autocorrelation   Std. Error(a)    Value    df   Sig.(b)
   1        .448            .070           40.781    1    .000
   2        .187            .070           47.903    2    .000
   3        .081            .070           49.240    3    .000
   4        .023            .070           49.352    4    .000
   5        .049            .069           49.848    5    .000
   6       -.015            .069           49.893    6    .000
   7       -.125            .069           53.161    7    .000
   8       -.090            .069           54.859    8    .000
   9       -.083            .069           56.317    9    .000
  10       -.074            .069           57.472   10    .000
  11       -.018            .068           57.540   11    .000
  12       -.120            .068           60.642   12    .000
  13       -.070            .068           61.689   13    .000
  14       -.093            .068           63.567   14    .000
  15       -.069            .068           64.594   15    .000
  16       -.085            .067           66.166   16    .000
a. The underlying process assumed is independence (white noise).
b. Based on the asymptotic chi-square approximation.

The p-value (Sig.) based on the Box-Ljung statistic is significant at every lag. This means that, judging from all autocorrelations up to each lag, the unstandardized residual series is significantly different from a white noise process. (We already know this because, from the two graphs above, the ACF and PACF at lag 1 are significant.)

Then run the ARIMA model: Analyze > Time Series > ARIMA

Choose Unstandardized Residuals as the variable, choose 1 for p, and click OK. Then you will see a window telling you that SPSS will add residuals, a fitted variable, and confidence intervals to our dataset. Click OK. Here is the ARIMA coefficient estimate:

Parameter Estimates
                           Estimates   Std Error      t     Approx Sig
  Non-Seasonal Lags  AR1     .451        .063       7.112      .000
  Constant                   .011        .376        .029      .977
Melard's algorithm was used for estimation.

(About the constant term: when d = 0, the constant equals the mean of the time series. When d = 1, the constant reflects the non-zero average trend of the original time series.)

In the data window we also see the residual from the ARIMA model, under the name ERR_1. Did our model do a good job? Let's see the ACF and PACF of the new residual ERR_1 (this is the residual from the ARIMA model, not the one from the linear regression).

Analyze > Time Series > Autocorrelations...

Then update the name of the variable of interest (as shown in the window below).

Click OK.

[ACF and PACF plots of ERR_1 (Error for RES_1 from ARIMA, MOD_2, CON), with confidence limits; no spikes are significant.]

No significant ACF or PACF, so the residual from our ARIMA model seems to be white noise. Our modeling job is finished.

What is the structure of the original time series? Let me call the residual from the linear regression (the first step) z. From the ARIMA output above (AR1 = 0.451, constant = 0.011), we have:

(z_t - 0.011) = 0.451*(z_{t-1} - 0.011) + ε_t        (1)

From the step 1 linear regression, we know that

x_t = 3.375 + 0.795*t + z_t        (2)

Combining equations (1) and (2), we have

(x_t - 3.375 - 0.795*t - 0.011) = 0.451*(x_{t-1} - 3.375 - 0.795*(t-1) - 0.011) + ε_t

which can be simplified to:

x_t = 2.217459 + 0.451*x_{t-1} + 0.43646*t + ε_t        (3)

Equation (3) is our description of the original time series.
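The simplification above can be verified with a few lines of arithmetic, reproducing the constant 2.217459 and the trend coefficient 0.43646 (= 0.436455 before rounding):

```python
# Verifying the algebra behind equation (3):
# (x_t - 3.375 - 0.795 t - 0.011) = 0.451 (x_{t-1} - 3.375 - 0.795 (t-1) - 0.011) + e_t
phi, a, b, mu = 0.451, 3.375, 0.795, 0.011

const = (a + mu) * (1 - phi) + phi * b   # constant term in (3)
slope = b * (1 - phi)                    # coefficient on t in (3)
print(round(const, 6), round(slope, 6))  # -> 2.217459 0.436455
```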

Second approach: using the first difference

Open the Excel data file, compute the first difference, and call the variable u. Save the file. In SPSS, open the Excel file just saved. For Files of Type, choose All Files (*.*). Click OK.

Plot the differenced series u: Analyze > Time Series > Sequence Charts. It appears stationary.

[Sequence plot of u, fluctuating between about -10 and 10.]

Then plot the ACF and PACF graphs for u (u is the first difference of the original series).

[ACF and PACF plots of u, with confidence limits; both decay gradually over the lags.]

The ACF and PACF both seem to decay over the lags. So I choose an ARMA(1,1) model. (Some people might think it is an MA(1), but you will find that the residual from an MA(1) does not really follow a white noise process. Sometimes the estimation does require trial and error.)

To estimate the parameters, go to Analyze > Time Series > ARIMA. Select the differenced series u as the dependent variable, then choose p = 1 and q = 1 in the window below. Alternatively, you can specify p = 1, d = 1, and q = 1 for the original series x; you will get the same output.

Click OK. Below are our model parameter estimates:

Parameter Estimates
                           Estimates   Std Error      t      Approx Sig
  Non-Seasonal Lags  AR1     .470        .068        6.888      .000
                     MA1     .986        .036       27.593      .000
  Constant                   .794        .009       89.118      .000
Melard's algorithm was used for estimation.

(About the constant term: when d = 0, the constant equals the mean of the time series. When d = 1, the constant reflects the non-zero average trend of the original time series.)

Did the model do a good job? We can check the ACF and PACF of the newly obtained residual and the Box-Ljung statistics.

[ACF and PACF plots of the residual (Error for u from ARIMA, MOD_1, CON), with confidence limits; no spikes are significant.]

Judging from the ACF and PACF, the residual from the ARIMA model seems to be white noise; also, the Box-Ljung statistics are not significant, so we can stop now.

Autocorrelations
Series: Error for u from ARIMA, MOD_1, CON
                                        Box-Ljung Statistic
  Lag   Autocorrelation   Std. Error(a)    Value    df   Sig.(b)
   1       -.011            .070             .023    1    .878
   2       -.021            .070             .111    2    .946
   3        .021            .070             .199    3    .978
   4       -.042            .070             .567    4    .967
   5        .072            .070            1.643    5    .896
   6        .041            .069            1.991    6    .920
   7       -.119            .069            4.922    7    .669
   8       -.003            .069            4.924    8    .766
   9       -.027            .069            5.077    9    .828
  10       -.054            .069            5.686   10    .841
  11        .083            .069            7.136   11    .788
  12       -.128            .068           10.622   12    .562
  13        .012            .068           10.652   13    .640
  14       -.072            .068           11.784   14    .624
  15        .002            .068           11.785   15    .695
  16       -.060            .068           12.566   16    .704
a. The underlying process assumed is independence (white noise).
b. Based on the asymptotic chi-square approximation.

What is the structure of the original time series? From the ARIMA output above (AR1 = 0.470, MA1 = 0.986, constant = 0.794), we have:

(u_t - 0.794) = 0.470*(u_{t-1} - 0.794) + ε_t - 0.986*ε_{t-1}        (4)

By the way, SPSS tries to be special and naughty here. Unlike most other software packages, which assume the general form

X_t = φ_1*X_{t-1} + φ_2*X_{t-2} + ... + φ_p*X_{t-p} + ε_t + θ_1*ε_{t-1} + θ_2*ε_{t-2} + ... + θ_q*ε_{t-q}

SPSS assumes the form

X_t = φ_1*X_{t-1} + φ_2*X_{t-2} + ... + φ_p*X_{t-p} + ε_t - θ_1*ε_{t-1} - θ_2*ε_{t-2} - ... - θ_q*ε_{t-q}

That is why you see a negative coefficient, -0.986, before ε_{t-1} in equation (4).

In step 1 we took a first difference. That means:

x_t = x_{t-1} + u_t        (5)

Combining (4) and (5), we get:

x_t = 0.421 + 1.47*x_{t-1} - 0.47*x_{t-2} + ε_t - 0.986*ε_{t-1}        (6)

Equation (6) is our description of the original time series.
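The constant in equation (6) can be checked directly: substituting u_t = x_t - x_{t-1} into (4) leaves a drift of 0.794·(1 - 0.470).

```python
# Checking the constant (drift) term in equation (6).
phi, mu = 0.470, 0.794
print(round(mu * (1 - phi), 5))   # -> 0.42082, reported as 0.421 in (6)
```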

Final comments

Using the two approaches we have derived structures (3) and (6), respectively, for the original time series. Are structures (3) and (6) similar? (Did we get similar results using the two different approaches?) Below I show that the two results are indeed very close.

x_t = 2.217459 + 0.451*x_{t-1} + 0.43646*t + ε_t        (3)

From (3) we can write x_{t-1} as

x_{t-1} = 2.217459 + 0.451*x_{t-2} + 0.43646*(t-1) + ε_{t-1}        (3')

Subtracting (3') from (3), we get

x_t - x_{t-1} = 0.451*x_{t-1} - 0.451*x_{t-2} + 0.43646 + ε_t - ε_{t-1}

which can be simplified to

x_t = 0.43646 + 1.451*x_{t-1} - 0.451*x_{t-2} + ε_t - ε_{t-1}

This is indeed very close to equation (6) derived using the second approach.
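The closeness of the two structures can be made concrete by comparing the coefficients side by side (the numbers come from equations (3), in differenced form, and (6) above):

```python
# Differencing equation (3) gives coefficients (0.43646, 1.451, -0.451);
# equation (6) gave (0.421, 1.47, -0.47). Compare them term by term.
eq3 = {"const": 0.43646, "x_lag1": 1.451, "x_lag2": -0.451}
eq6 = {"const": 0.421,   "x_lag1": 1.47,  "x_lag2": -0.47}

for k in eq3:
    print(k, eq3[k], eq6[k], abs(eq3[k] - eq6[k]))  # differences around 0.02
```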