OUTLINE: Analysis of Data and Model; Hypothesis Testing; Dummy Variables; Research in Finance
ANALYSIS: Types of Data
Time-series data: trend, seasonal variation, cyclical variation, irregular variation.
Cross-sectional data: a 1-dimensional data set; observing many subjects (size, company, counties, etc.) at the same time.
Panel data: a multi-dimensional data set; time-series + cross-sectional data.
MULTIPLE REGRESSION
Trend Component: persistent, overall upward or downward pattern. Due to population, technology, etc. Several years' duration. [Figure: Response vs. time (Mo., Qtr., Yr.)]
Trend Component: overall upward or downward movement. Data taken over a period of years. [Figure: Sales vs. Time]
Cyclical Component: repeating up and down movements. Due to interactions of factors influencing the economy. Usually 2-10 years' duration. [Figure: Response vs. time (Mo., Qtr., Yr.), one cycle marked]
Cyclical Component: upward or downward swings. May vary in length. Usually lasts 2-10 years. [Figure: Sales vs. Time]
Seasonal Component: regular pattern of up and down fluctuations. Due to weather, customs, etc. Occurs within one year. [Figure: Response vs. time (Mo., Qtr.), summer peak marked]
Seasonal Component: upward or downward swings. Regular patterns. Observed within one year. [Figure: Sales vs. Time (monthly or quarterly)]
Irregular Component: erratic, unsystematic, residual fluctuations. Due to random variation or unforeseen events (union strike, war). Short duration and nonrepeating.
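The components above can be illustrated with a small sketch. It builds a made-up monthly series from a linear trend plus a seasonal pattern that sums to zero over the year, then recovers the trend with a centered 12-month moving average. All numbers and variable names are hypothetical, and the cyclical and irregular components are left out to keep the example exact.

```python
import numpy as np

# Hypothetical monthly series: 6 years of data.
n_years = 6
t = np.arange(12 * n_years)

# Trend component: persistent upward movement.
trend = 100.0 + 0.5 * t

# Seasonal component: a pattern that repeats every 12 months and sums to zero.
seasonal_pattern = np.array([-3, -2, 0, 1, 2, 4, 5, 3, 0, -2, -4, -4], dtype=float)
seasonal = np.tile(seasonal_pattern, n_years)

series = trend + seasonal

# Centered 12-month moving average (half weight on the two end months).
weights = np.r_[0.5, np.ones(11), 0.5] / 12.0
trend_estimate = np.convolve(series, weights, mode="valid")

# The moving average loses 6 months at each end of the sample.
seasonal_estimate = series[6:-6] - trend_estimate

print(trend_estimate[:3])
print(seasonal_estimate[:12])
```

With real data the irregular component would remain in the seasonal estimate, so the seasonal pattern is usually averaged across years.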
Time Series Data: SET Index. [Figure: SET Index (0 to 2000) over time, Apr-75 to Jul-17]
Cross-Sectional Data
Pooled (Panel) Data
ANALYSIS: Type of Estimator. Least Squares Estimator; Maximum Likelihood Estimator. $Y_i = \beta_1 + \beta_2 X_{1i} + \beta_3 X_{2i} + u_i$
ANALYSIS: Type of Model. Linear Model; Non-Linear Model. $\text{DTAC}_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t$; $Y_t$ = AIS RETURN
ANALYSIS: Fitted Regression on Model
Y = a + b x
Y ~ regressand, response variable, dependent variable, observed variable
X ~ regressor, independent variable, explanatory variable, predictor
Variable: Multiple Regression
Time Series: ARMA/ARIMA
Time Series with Condition: ARCH/GARCH
Panel Model: Pooled or Panel Model, Fixed-Effect Model, Random-Effect Model
ANALYSIS: Fitted Regression on Model. Y = a + b x, Y is discrete: Logit Model, Probit Model
ANALYSIS: Fitted Regression on Model. Y = a + b x, Y and X are dynamic: Vector Autoregression (VAR), Error Correction Model (ECM)
ANALYSIS: Expansion from Simple Regression to Multiple Regression. FITTED REGRESSION MODEL: Y = a + b x
The regression model is simple linear regression, $y = \beta_0 + \beta_1 x + \varepsilon$, where x is the independent variable and y is the dependent variable. The model has two variables: the independent or explanatory variable, x, and the dependent variable, y, the variable whose variation is to be explained. The relationship between x and y is a linear or straight-line relationship. There are two parameters to estimate: the slope of the line, $\beta_1$, and the y-intercept, $\beta_0$ (where the line crosses the vertical axis). $\varepsilon$ is the unexplained, random, or error component. Much more on this later.
The regression model (regression line) is $y = \beta_0 + \beta_1 x + \varepsilon$. Data about x and y are obtained from a sample. From the sample of values of x and y, estimates $b_0$ of $\beta_0$ and $b_1$ of $\beta_1$ are obtained using the least squares or another method. The resulting estimate of the model is $\hat{y} = b_0 + b_1 x$. The symbol $\hat{y}$ is termed y hat and refers to the predicted values of the dependent variable y that are associated with values of x, given the linear model.
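For reference, when the least squares method is used, the estimates for this simple model have the usual closed form below. The notation $\bar{x}$, $\bar{y}$ (sample means) and n (sample size) is added here and does not appear on the slide.

```latex
b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b_0 = \bar{y} - b_1 \bar{x}
```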
Income   hrs/week     Income   hrs/week
8000     38           8000     35
6400     50           18000    37.5
2500     15           5400     37
3000     30           15000    35
6000     50           3500     30
5000     38           24000    45
8000     50           1000     4
4000     20           8000     37.5
11000    45           2100     25
25000    50           8000     46
4000     20           4000     30
8800     35           1000     200
5000     30           2000     200
7000     43           4800     30
[Figure: Summer Income as a Function of Hours Worked; Income (0 to 30000) vs. Hours per Week (0 to 60)]
$\hat{y} = -2461 + 297x$, $R^2 = 0.311$
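As a cross-check, the following minimal sketch fits the least-squares line to the income table with NumPy. It assumes the two rows listing 200 hrs/week are excluded, since a week has only 168 hours and the chart's horizontal axis stops at 60. That exclusion is an assumption made here, not something the slides state.

```python
import numpy as np

# (income, hours per week) pairs transcribed from the table above.
data = [
    (8000, 38), (8000, 35), (6400, 50), (18000, 37.5), (2500, 15), (5400, 37),
    (3000, 30), (15000, 35), (6000, 50), (3500, 30), (5000, 38), (24000, 45),
    (8000, 50), (1000, 4), (4000, 20), (8000, 37.5), (11000, 45), (2100, 25),
    (25000, 50), (8000, 46), (4000, 20), (4000, 30), (8800, 35), (1000, 200),
    (5000, 30), (2000, 200), (7000, 43), (4800, 30),
]
income = np.array([row[0] for row in data], dtype=float)
hours = np.array([row[1] for row in data], dtype=float)

# Assumption: drop rows with more than 168 hours (impossible in one week).
keep = hours <= 168
x, y = hours[keep], income[keep]

# Least-squares estimates for y-hat = b0 + b1 * x.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# R^2 = 1 - RSS / TSS.
residuals = y - (b0 + b1 * x)
r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

print(f"b0 = {b0:.0f}, b1 = {b1:.0f}, R^2 = {r_squared:.3f}")
```

Under that exclusion the estimates land close to the values reported on the slide, with a negative intercept: each extra hour per week is associated with roughly 297 more in summer income.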
Outliers: rare, extreme values may distort the outcome. An outlier could be an error. It could also be a very important observation. Outlier: more than 3 standard deviations from the mean.
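The 3-standard-deviation rule can be sketched as follows, applied to the hrs/week column of the income table. Using the sample standard deviation (`ddof=1`) is a choice made for this sketch; the slide does not say which version to use.

```python
import numpy as np

# hrs/week column transcribed from the income table.
hours = np.array([38, 35, 50, 37.5, 15, 37, 30, 35, 50, 30, 38, 45, 50, 4,
                  20, 37.5, 45, 25, 50, 46, 20, 30, 35, 200, 30, 200, 43, 30])

# z-score: distance from the mean in units of the sample standard deviation.
z_scores = (hours - hours.mean()) / hours.std(ddof=1)

# Flag observations more than 3 standard deviations from the mean.
is_outlier = np.abs(z_scores) > 3
outliers = hours[is_outlier]

print(outliers)
```

Here the rule flags only the two 200-hour entries. Whether they are data-entry errors or important observations still has to be judged case by case.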
[Figure: GPA vs. Time Online; Time Online (0 to 12) vs. GPA (50 to 100)]
[Figure: GPA vs. Time Online; Time Online (0 to 9) vs. GPA (50 to 100)]
U-Shaped Relationship: OMITTED VARIABLE. Correlation = +0.12. [Figure: Y (0 to 12) vs. X (0 to 12)]
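A small sketch shows why a U-shaped relationship can produce a correlation near zero even though Y depends entirely on X. The data are synthetic, since the slide's own data are not available, and reading the omitted variable as a squared term in X is an assumption made for illustration.

```python
import numpy as np

# Synthetic U-shaped data: Y is an exact quadratic function of X.
x = np.arange(0, 13, dtype=float)
y = (x - 6.0) ** 2 / 3.0

# The linear correlation is (numerically) zero despite the exact relationship.
corr = np.corrcoef(x, y)[0, 1]

def r_squared(design, target):
    """R^2 of an OLS fit of target on the columns of design."""
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    rss = np.sum((target - design @ coefs) ** 2)
    tss = np.sum((target - target.mean()) ** 2)
    return 1.0 - rss / tss

# Linear model omits x^2; the second model includes it.
r2_linear = r_squared(np.column_stack([np.ones_like(x), x]), y)
r2_quadratic = r_squared(np.column_stack([np.ones_like(x), x, x ** 2]), y)

print(corr, r2_linear, r2_quadratic)
```

Adding the omitted squared term takes the fit from essentially no explanatory power to an exact fit.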
TESTING MULTIPLE HYPOTHESES: F-test. The F-test is of interest for testing more than one coefficient simultaneously. Condition to reject $H_0$: significant if p-value < 0.05
TESTING MULTIPLE HYPOTHESES: t-test. The t-test is of interest for testing ONLY one coefficient. Condition to reject $H_0$: significant if p-value < 0.05. Oh my gosh! It fails to reject $H_0$. What does it mean? What should I do? Cut it or leave it?
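The F-test and t-test can be sketched on synthetic data with NumPy and SciPy. The data-generating process, sample size, and variable names are all hypothetical: y depends on x1 but, by construction, not on x2.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Hypothetical data: the true coefficient on x2 is zero.
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)

# OLS via the normal equations.
X = np.column_stack([np.ones(n), x1, x2])
k = X.shape[1]
beta = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta
rss = resid @ resid
sigma2 = rss / (n - k)

# t-test, one coefficient at a time. H0: beta_j = 0.
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = beta / se
t_pvalues = 2 * stats.t.sf(np.abs(t_stats), df=n - k)

# F-test, all slope coefficients jointly. H0: beta_1 = beta_2 = 0.
tss = np.sum((y - y.mean()) ** 2)
m = k - 1  # number of restrictions
f_stat = ((tss - rss) / m) / (rss / (n - k))
f_pvalue = stats.f.sf(f_stat, m, n - k)

for name, p in zip(["const", "x1", "x2"], t_pvalues):
    print(f"{name}: p-value = {p:.4f}, reject H0 at 5%: {p < 0.05}")
print(f"F-test: p-value = {f_pvalue:.4g}, reject H0 at 5%: {f_pvalue < 0.05}")
```

Failing to reject $H_0$ for a coefficient means the data do not show it to be different from zero. Whether to cut the variable or leave it is then a modelling decision, usually guided by the conceptual framework.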
Example I: Stock Asset Price Regression (TMB; 1990M01 to 2011M12; RP1, BBL, NPL, FRN, JAS, DJ, NIKKEI)
Example II: Hedonic Pricing Model. Dependent Variable: Y ~ Rental Values. Definitions
TESTING MULTIPLE HYPOTHESES: Goodness of Fit, Testing $R^2$. $R^2$ answers how well the regression model actually fits the data. In other words, $R^2$ answers how well the model containing the explanatory variables explains variation in the dependent variable. $0 \le R^2 \le 1$. [Figures: $R^2 = 1$; $0 < R^2 < 1$]
TESTING MULTIPLE HYPOTHESES: Problems with using $R^2$. (1) $R^2$ cannot be compared across two models with the same X but a different Y. (2) $R^2$ never falls if more regressors are added to the regression: $R_2^2 \ge R_1^2$ when model 2 adds regressors to model 1. (3) $R^2$ can take values of 0.9 or higher for time series regressions, and hence it is not good at discriminating between models.
TESTING MULTIPLE HYPOTHESES: Adjusted $R^2$. If an extra regressor is added to the model, k increases and, unless $R^2$ increases by a more than offsetting amount, $\bar{R}^2$ will actually fall. If the model contains a lot of variables, significant and insignificant, $\bar{R}^2$ can be negative.
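The behaviour of $R^2$ and adjusted $R^2$ can be sketched as follows. The code assumes the common definition $\bar{R}^2 = 1 - (1 - R^2)(n - 1)/(n - k)$, with n observations and k estimated parameters including the intercept. The data and the helper name `r2_and_adjusted` are made up for illustration.

```python
import numpy as np

def r2_and_adjusted(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on the columns of X."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - rss / tss
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    return r2, adj

rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
noise_regressor = rng.normal(size=n)  # unrelated to y by construction
y = 0.5 + 1.0 * x + rng.normal(size=n)

# Model 1: intercept and x. Model 2: adds the irrelevant regressor.
X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([np.ones(n), x, noise_regressor])

r2_small, adj_small = r2_and_adjusted(X_small, y)
r2_big, adj_big = r2_and_adjusted(X_big, y)

print(f"R^2:          {r2_small:.4f} -> {r2_big:.4f}")
print(f"adjusted R^2: {adj_small:.4f} -> {adj_big:.4f}")
```

$R^2$ cannot fall when the extra column is added, while adjusted $R^2$ rises only if the improvement in fit outweighs the penalty for the larger k.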
DUMMY VARIABLE: How to Create a Dummy. Dummy variables take the values 0 and 1. If a model contains M categories, then only M-1 dummy variables should be created; otherwise there is a multicollinearity problem. The category for which no dummy variable is assigned is known as the base or benchmark category. 2 types of dummy variables: intercept dummy vs. slope-change dummy.
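The M-1 rule can be checked numerically. This minimal sketch uses hypothetical quarterly data (M = 4 categories) and compares the rank of the design matrix with all four dummies against the rank with one category dropped as the base.

```python
import numpy as np

# Hypothetical sample: 5 years of quarterly observations, M = 4 categories.
quarters = np.array([1, 2, 3, 4] * 5)
categories = [1, 2, 3, 4]

# One 0/1 column per category.
all_dummies = np.column_stack([(quarters == q).astype(float) for q in categories])
intercept = np.ones(len(quarters))

# Intercept plus all M dummies: the dummies sum to the intercept column.
X_trap = np.column_stack([intercept, all_dummies])

# Intercept plus M-1 dummies: quarter 1 is the base (benchmark) category.
X_ok = np.column_stack([intercept, all_dummies[:, 1:]])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(f"M dummies:   {X_trap.shape[1]} columns, rank {rank_trap}")
print(f"M-1 dummies: {X_ok.shape[1]} columns, rank {rank_ok}")
```

With all M dummies the matrix has more columns than its rank, which is the perfect multicollinearity the slide warns about. Dropping one category restores full column rank.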
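DUMMY VARIABLE: 2 Types of Dummy Variables
I. Different Intercept: $R_t - R_f = \alpha + \beta_1 (R_M - R_f) + \beta_2 SMB + \beta_3 HML + \beta_4 JAN$, where JAN is a dummy = 1 if January, = 0 otherwise. [Figure: Y vs. X; the regression line for JAN has intercept $\alpha + \beta_4$, the regression line for other months has intercept $\alpha$; the gap between them is $\beta_4$]
II. Different Slope: $RENT_t = \alpha + \beta_1 LNAGE + \beta_2 NOROOM + \beta_3 DIST + \beta_4 D \cdot DIST$, where D is a dummy = 1 if Safe Area, = 0 otherwise. Slope = $\beta_3 + \beta_4 D$. [Figure: RENT vs. DISTANT, intercept $\alpha$; regression line for Safe Area and regression line for Criminal Area]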
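Both dummy types can be sketched on noise-free, made-up data so that the fitted coefficients can be read off exactly. The numbers are hypothetical, and the other regressors from the slide's equations (the factor terms, LNAGE, NOROOM) are left out to keep the example short.

```python
import numpy as np

# Type I: intercept dummy. JAN = 1 in January, 0 otherwise.
months = np.arange(24) % 12 + 1           # two years of monthly data
jan = (months == 1).astype(float)
excess_return = 0.5 + 2.0 * jan           # hypothetical alpha = 0.5, beta_4 = 2.0
X1 = np.column_stack([np.ones(len(jan)), jan])
coef1, *_ = np.linalg.lstsq(X1, excess_return, rcond=None)
intercept_other_months = coef1[0]         # alpha
intercept_january = coef1[0] + coef1[1]   # alpha + beta_4

# Type II: slope dummy. D = 1 in a safe area, 0 otherwise.
dist = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 1.0, 2.0, 3.0, 4.0, 5.0])
safe = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0], dtype=float)
rent = 100.0 - 2.0 * dist + 1.5 * safe * dist   # hypothetical beta_3 = -2, beta_4 = 1.5
X2 = np.column_stack([np.ones(len(dist)), dist, safe * dist])
coef2, *_ = np.linalg.lstsq(X2, rent, rcond=None)
slope_other = coef2[1]                    # beta_3
slope_safe = coef2[1] + coef2[2]          # beta_3 + beta_4

print(intercept_other_months, intercept_january)
print(slope_other, slope_safe)
```

The intercept dummy shifts the whole line up or down for the flagged group, while the slope dummy (the interaction D times DIST) changes how steeply rent responds to distance.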
STEP BY STEP: Quantitative Analysis (Multiple Regression)
1. Conceptual Framework
2. Choose Type of Regression (Linear vs. Non-Linear)
3. Group Variables
4. Analyze Data (take logarithm or not)
5. Look at the sign of the estimated parameters
6. Test Hypotheses
7. Take a look at Adjusted $R^2$
RESEARCH PAPER: THREE FACTOR MODEL. Three Factor Model (Fama and French (1992)). Eugene Fama, Kenneth R. French
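A three-factor regression of the form $R_t - R_f = \alpha + \beta_1 (R_M - R_f) + \beta_2 SMB + \beta_3 HML$ can be sketched on simulated data. The factor returns, the true loadings, and every variable name below are hypothetical. Real work would use actual stock and factor return series.

```python
import numpy as np

rng = np.random.default_rng(42)
n_months = 240

# Simulated monthly factor returns (hypothetical means and volatilities).
mkt_excess = rng.normal(0.005, 0.04, n_months)   # R_M - R_f
smb = rng.normal(0.002, 0.03, n_months)          # size factor
hml = rng.normal(0.003, 0.03, n_months)          # value factor

# Hypothetical true parameters: alpha, beta_1, beta_2, beta_3.
true_coefs = np.array([0.001, 1.1, 0.4, -0.2])
noise = rng.normal(0.0, 0.005, n_months)

X = np.column_stack([np.ones(n_months), mkt_excess, smb, hml])
stock_excess = X @ true_coefs + noise            # R_t - R_f

# Multiple regression by least squares.
estimates, *_ = np.linalg.lstsq(X, stock_excess, rcond=None)

for name, est in zip(["alpha", "beta_MKT", "beta_SMB", "beta_HML"], estimates):
    print(f"{name}: {est:.4f}")
```

The estimated loadings should sit close to the simulated true values. With real data they are interpreted as the stock's exposure to the market, size, and value factors.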
WORKSHOP #1
WORK ORDERS: Multiple Regression
(1) Use the Three Factor Model to run a multiple regression on your group assignment.
(2) Interpret the F-test and t-test.
(3) Explain Adjusted $R^2$.
(4) Create dummy variables:
o Monthly data: (1) Window Dressing in June and (2) End-Year Effect.
o Annual data: (1) Asian Crisis during 1997-1999, (2) Subprime Crisis during 2008-2010, (3) Europe Debt Crisis during 2008-2012.
(5) Redo Work Orders (1)-(4) with the new model.
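The dummy variables in work order (4) could be built along these lines. The sample period (1995 to 2012) is hypothetical, and coding the End-Year Effect as December is an assumption; adjust both to the actual group data set.

```python
# Hypothetical sample period; replace with the years in the group data set.
years = list(range(1995, 2013))
monthly = [(year, month) for year in years for month in range(1, 13)]

# Monthly dummies.
june_dummy = [1 if month == 6 else 0 for (_, month) in monthly]       # window dressing in June
year_end_dummy = [1 if month == 12 else 0 for (_, month) in monthly]  # assumed: December

# Annual dummies for the crisis periods listed in the work order.
asian_crisis = [1 if 1997 <= year <= 1999 else 0 for year in years]
subprime_crisis = [1 if 2008 <= year <= 2010 else 0 for year in years]
europe_debt_crisis = [1 if 2008 <= year <= 2012 else 0 for year in years]

print(sum(june_dummy), sum(year_end_dummy))
print(sum(asian_crisis), sum(subprime_crisis), sum(europe_debt_crisis))
```

Each list can then be added as an extra column in the regression. Note that the subprime and Europe debt crisis dummies overlap in 2008-2010, so their coefficients may be hard to separate in a short annual sample.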