Circling the Square: Experiments in Regression


R. D. Coleman [unaffiliated]

This document is excerpted from the research paper entitled "Critique of Asset Pricing Circularity" by Robert D. Coleman, dated 15 February 2002. Copyright 2002. All rights reserved.

Econometrics is inferential statistics applied to economics. Econometrics, an inductive method, is not mathematics, a deductive method. Time series analysis is not econometrics. Time series analysis of a sequence of variables generated by human behavior is not science; it is history. For example, time series analysis of a sequence of radioactive decay intervals is scientific, whereas time series analysis of a sequence of stock market prices is not scientific, but rather history. Likewise, the issue of equations in circular form is not econometrics, but rather logic and mathematics. Just as in grammar a definition that includes the term to be defined is a circular definition and a fallacy, in mathematics an equation with the term to be solved on both sides is a circular equation and a fallacy.

In a theoretical algebraic equation, the relationship among the variables is known, but the values of the variables are unknown. In an applied algebraic equation, the value of one variable is unknown, and the values of the other variables are known. In contrast, in a statistical regression equation, the values of all variables are known in a sample of observations, but the relationships among the variables are unknown.

Conjoining variables renders an equation or the relationships among its variables intractable due to statistical interaction and similar effects, and thereby confounds interpretation. To avoid such inextricable intertwining, it is necessary in an algebraic equation to isolate the unknown variable on the left-hand side and the known variables on the right-hand side. For the same reason, it is necessary in a regression equation to isolate the explained or dependent variable on the left-hand side and the explanatory or independent variables on the right-hand side. This is accomplished by various operations. The basic operations are to simplify, factor, reduce to lowest common denominator, substitute, transpose, and rearrange. Other operations are to multiply, divide, add, or subtract the same term on both sides, or otherwise to transform both sides in the same way. Additional operations include multiplying or dividing one side by a combination of terms that is equivalent to one, and adding or subtracting on one side a combination of terms that is equivalent to zero.

If the equation contains practical variables rather than abstract variables, it is also necessary to ensure that the values of the variables have consistent units of measurement. This is done by either converting variables or adjusting data values. Isolation and units analysis result in the proper reduced form of the equation. Only then is the equation ready for solution.

An algebraic equation is solved in a specific case by replacing the variables on the right-hand side with their respective values. In a statistical regression, the equation is solved by estimating the coefficients of the right-hand side variables based on a sample of observations.

Any variable included on both sides of a least-squares regression equation may result in greater significance as measured by Student's t and in a better fit as measured by R-square. For such improvement in these and other test statistics, there is a price to pay in bias in the estimated coefficients. This bias is the result of the induced correlation due to logical circularity. The dependent or explained variable may be isolated on the left-hand side and yet the equation may still be logically circular if there is any adjustment factor that appears on both sides. Such circular adjustment factors are sometimes required by the economic reasoning of the theory that the equation represents. But any econometric technique that requires a circular adjustment factor is elective and not logically necessary.
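The induced correlation described above can be illustrated with a minimal sketch in Python, assuming NumPy is available. The variables x, y and z, the uniform range, and the sample size are hypothetical choices and are not taken from the paper. Because x and y are generated independently, any correlation between x/z and y/z comes only from the common deflator z appearing on both sides.

```python
import numpy as np

def deflator_correlation(n=10_000, seed=0):
    """Return (corr(x, y), corr(x/z, y/z)) for mutually independent x, y, z."""
    rng = np.random.default_rng(seed)
    # Hypothetical data: three independent variables, uniform on [1, 10].
    x = rng.uniform(1.0, 10.0, n)
    y = rng.uniform(1.0, 10.0, n)
    z = rng.uniform(1.0, 10.0, n)
    raw = np.corrcoef(x, y)[0, 1]            # no common term
    deflated = np.corrcoef(x / z, y / z)[0, 1]  # z appears on both sides
    return raw, deflated

raw, deflated = deflator_correlation()
print(f"corr(x, y)     = {raw:.3f}")
print(f"corr(x/z, y/z) = {deflated:.3f}")
```

Under these assumptions the first correlation should be close to zero and the second clearly positive, although the exact values depend on the seed and on the distributions chosen.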

An equation is logically circular in any variable that appears on both sides. Thus, an equation with the explained variable on both sides is logically circular in the explained variable. Such an equation lacks face validity and thus scientific interest, and is an unwarranted assertion at best. Moreover, a mere equation is not a theory. A regression equation represents the operationalization of a theory. Any author of a logically circular equation bears the burden of proof to justify the theory behind it. The specified equation must be motivated by a plausible story that explains the inclusion of each variable. This justification is of greater importance if application of the theory can result in greater damage.

A. Example Formula

The Table summarizes selected regressions of logically circular equations for unit rectangles. The terms of the equations are related by formulas, i.e., they are tautological. These ordinary least squares (OLS) regressions illustrate the kinds of bias in the estimated coefficients that result when the equation is not in proper reduced form. These estimated coefficients have greater precision because the observations are calculated from the formulas rather than measured on actual rectangles. The sample includes all rectangles with lengths and widths from 2 to 9 integer units, for a total of 64 observations.

First, we compare six regressions of length and width on perimeter, on area, and on compactness, that is, with length and width as the explanatory variables. Length (L) is defined as the vertical dimension, and width (W) as the horizontal dimension. Perimeter (P) is defined as length added to width and then multiplied by two. Area (A) is defined as length multiplied by width. Compactness (C) is defined as area divided by perimeter. Not surprisingly, length and width have high significance as measured by Student's t and high explanatory power as measured by R-square. Also not surprisingly, significance and power decrease as we go from perimeter as the explained variable, which is additive in length and width, to area, which is simple multiplicative, to compactness, which is compound multiplicative.

Next, we compare five regressions of area, perimeter, and inverse perimeter on compactness. The inverse of a variable is defined as one unit divided by that variable. It is not surprising that perimeter and area, each of which entails both length and width, are highly significant and that the specified variables have high explanatory power. It also is not surprising that in the equation with two explanatory variables, inverse perimeter is more significant than perimeter, because that is the form in which perimeter is entailed in compactness. A natural logarithm transform of both sides should result in a zero intercept, coefficients equal to 1.00, Student's t equivalent to zero probability, and 100% R-square.

Last, we compare 8 regressions of oddness, sides ratio, diagonal, and radius on compactness, and 4 regressions of these four variables on compactness with deflation and the intercept constrained to equal zero. Deflating the variables by oddness transforms the intercept to inverse oddness. Oddness (O) is defined as a nominally measured variable with four numerical categories: 1 if both length and width are even, 2 if length is even and width is odd, 3 if length is odd and width is even, and 4 if both length and width are odd. Sides ratio (S) is defined as the long side divided by the short side. Diagonal (D) is defined as the hypotenuse of the right triangle with length and width as sides. Radius (R) is defined as that of a circle with area equal to that of a rectangle with length and width as sides.

It is not surprising that all four of these variables contribute individually to explanatory power, because they each entail length and width. It is not surprising that the explanatory power of these variables increases from oddness to sides ratio to diagonal to radius, due to increasing information and closer conformance to the form of the explained variable. For the opposite reasons, it is not surprising that oddness is not statistically significant at the 5% probability level in such a small sample.

It also is not surprising that the increase in explanatory power due to the step-wise addition of inverse oddness as a deflator to the regression equation is greater when the explanatory power at the prior step begins at a lower level. The increase for oddness alone is 54 percentage points from 4%. The increase is greater for oddness alone than in combination with sides ratio (up 39 percentage points from 29%), in combination with sides ratio than in combination with diagonal (up 12 percentage points from 78%), and in combination with diagonal than in combination with radius (up one percentage point from 98%).

Oddness entails length and width quite indirectly, in only four categories. In this sample, neither oddness nor its inverse is statistically significant when regressed on compactness, and each explains only 4% of the total variation in compactness. Yet dividing all terms on both sides by oddness as a deflator in the regression of either radius, diagonal, or sides ratio on compactness results in an increase of one to 39 percentage points in explanatory power.

The regression of oddness on compactness illustrates the phenomenon of false negatives, i.e., logically circular variables do not necessarily result in high statistical significance or notable explanatory power. Values of test statistics depend on the form of the variable, the presence of other variables competing in the same equation, sample size, sample composition, and methodology.
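The rectangle experiment can be sketched in a few lines of Python, assuming NumPy. The helper `ols` is a hypothetical name introduced here for illustration and is not the author's software. The sketch rebuilds the 64 rectangles and fits three of the regressions described above: perimeter explained by length alone, perimeter explained by length and width, and area explained by length alone.

```python
import numpy as np

def ols(y, *xs):
    """OLS with an intercept; returns (coefficients, R-square)."""
    X = np.column_stack([np.ones(len(y)), *xs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2

# All rectangles with integer length and width from 2 to 9 (8 x 8 = 64).
sides = np.arange(2, 10)
L = np.repeat(sides, len(sides)).astype(float)
W = np.tile(sides, len(sides)).astype(float)
P = 2 * (L + W)   # perimeter
A = L * W         # area

b_pl, r2_pl = ols(P, L)        # perimeter explained by length only
b_plw, r2_plw = ols(P, L, W)   # perimeter explained by length and width
b_al, r2_al = ols(A, L)        # area explained by length only

print(f"P on L:    slope {b_pl[1]:.2f}, R-square {r2_pl:.0%}")
print(f"P on L, W: slopes {b_plw[1]:.2f}, {b_plw[2]:.2f}, R-square {r2_plw:.0%}")
print(f"A on L:    slope {b_al[1]:.2f}, R-square {r2_al:.0%}")
```

Because the data are computed from formulas, the fit of perimeter on both length and width is exact, while dropping width leaves half of the variation unexplained.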

This illustration of counter-intuitive false negatives is reinforced by comparison of the regressions of oddness with the regressions of X, defined as a random variable equal to an integer ranging from one through 100 for each rectangle. One run each of two regressions of X is reported in the Table. Ten runs resulted in a Student's t ranging from 0.45 to 1.85 for X when regressed on compactness, but it was always equal to zero for inverse X when X was a deflator with the intercept constrained to equal zero. These ten runs resulted in an R-square ranging from near zero to 7% for each regression, with different combinations of Student's t below or above the 5% probability level and R-square increasing or decreasing from the regression on compactness to the regression on compactness deflated by X.

Non-explanatory variables are sometimes used to modify operational variable definitions or to adjust sample data. In addition to size deflators and unit conversion factors, this includes heteroskedasticity correction and other data weightings. Such data adjustments are ad hoc and arbitrary. They are avoidable and not necessary. The specification of such adjustment factors on both sides of the equation may improve the goodness of fit at the price of bias in the estimated coefficients due to the spurious correlation that results from logical circularity. This bias in turn affects both the levels of significance and the explanatory power. If a variable used as an adjustment factor is isolated on the left-hand side of an equation, then its role changes from adjustment factor to explained variable.

All regressions run on formulas of rectangles are reported in the Table. The most unexpected result is the high significance and explanatory power of oddness (t=10 and R-square=58%) when specified as a deflator in the explanation of sample variation in compactness. Yet it can be concluded that there is no evidence that oddness belongs in the equation in any form. It appears in a significant form in the deflated equation only because the other variables were divided by it. Also unexpected is the high significance and explanatory power of diagonal (t=15 and R-square=78%) and of radius (t=56 and R-square=98%), even with each directly entailing both length and width.

B. Return Formula

To identify potential logical circularity in capital asset pricing models, we analyze the formula for return on investment. The definition of return on a common stock is:

$$R_t = \frac{D_t}{P_{t-1} \big/ \left(S_t / S_{t-1}\right)} + \frac{P_t - P_{t-1} \big/ \left(S_t / S_{t-1}\right)}{P_{t-1} \big/ \left(S_t / S_{t-1}\right)} = \frac{D_t}{(P_{t-1})(S_{t-1}/S_t)} + \frac{P_t - (P_{t-1})(S_{t-1}/S_t)}{(P_{t-1})(S_{t-1}/S_t)}$$

where $R$ is return, $D$ is dividend per share, $P$ is share price at the end of a holding period, $S$ is number of shares, and $t$ indexes holding periods. It can be seen from this equation that $R_t$, $D_t$, $P_t$, $P_{t-1}$, $S_t$ and $S_{t-1}$ should not appear on the right-hand side of any least-squares regression equation to explain $R_t$ on the left-hand side. None of the six variables should appear on the right-hand side in any form, whether directly, entailed, forming portfolios, or any other method.

The capital asset pricing model (CAPM) of modern portfolio theory was independently developed by Sharpe (1964) and Lintner (1965). Reconciliation of these two versions of the CAPM involves excluding the priced security from the market portfolio. To avoid logical circularity, the stock with beta risk factor, $\beta_i$, and return, $R_i$, on the left-hand side is not included in the market portfolio, represented by $R_m$, or its proxy, $R_M$, on the right-hand side of the equation. Thus,

$$\beta_{i,m-i} = \frac{\operatorname{cov}(\tilde{R}_i, \tilde{R}_{m-i})}{\sigma^2(\tilde{R}_{m-i})} \quad \text{and} \quad \tilde{R}_{it} = a_i + b_i \tilde{R}_{m-i,t}$$

where beta of the stock relative to the market is constant and return on the stock varies by holding period. For example, it would be logically circular to use the DJIA as a proxy for the stock market to estimate beta for one of the 30 component stocks in the DJIA.

Analysis of units of measurement may reveal changes in shares. Time-indexed shares represent different units when they are claims on different fractions of total equity and voting rights. For example, stock splits based on the exchange ratio, $S_t : S_{t-1}$, result in $S_t$ new shares for each $S_{t-1}$ old shares. More generally, the shares adjustment factor equal to $S_t / S_{t-1}$ is used for adjusting the old price, $P_{t-1}$. Before solving, old shares must be converted to new shares and the price of old shares adjusted. If there are no stock splits, stock dividends, spin-offs or recapitalizations that change the number of shares outstanding during the period from time $t-1$ to time $t$, then $S_t$ and $S_{t-1}$ are effectively eliminated from the equation because both would be equal to one unit.

Net return can be calculated with the inclusion of variables for transaction costs and taxes. Real return can be calculated with the addition of a price-level inflation variable. Other-currency return can be calculated with the addition of a foreign exchange ratio variable. Adjustment factors for net, real, and foreign-currency return may be necessary in some cases, yet they introduce a bias when present on both sides of an equation.

Autoregressions of an economic time sequence are not necessarily logically circular. Each term in the sequence is a different variable due to time indexing. The regression equation, $P_t = \beta P_{t-1} + e_t$, is a first-order autoregression with two price variables. This autoregression is in proper reduced form and thus is not logically circular.
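As a worked illustration of the return definition and the shares adjustment factor discussed earlier in this section, the following Python sketch computes a holding-period return with the old price restated in new-share units. The function name `stock_return` and all numbers are hypothetical, and the sketch assumes that the dividend is quoted per new share.

```python
def stock_return(d_t, p_t, p_prev, s_t=1.0, s_prev=1.0):
    """Holding-period return with the old price restated in new-share units."""
    # Old price adjusted by the shares adjustment factor S_t / S_{t-1}.
    adj_prev = p_prev / (s_t / s_prev)
    # Dividend yield plus capital gain, both relative to the adjusted old price.
    return d_t / adj_prev + (p_t - adj_prev) / adj_prev

# No change in shares: S_t = S_{t-1} = 1, so the adjustment drops out.
no_split = stock_return(d_t=1.0, p_t=105.0, p_prev=100.0)

# Two-for-one split: the old price of 100 becomes 50 per new share.
split = stock_return(d_t=1.0, p_t=55.0, p_prev=100.0, s_t=2.0, s_prev=1.0)

print(f"no split: {no_split:.2%}")
print(f"2:1 split: {split:.2%}")
```

Without the adjustment, the split case would show a large negative return (a price of 55 against 100) even though the holder's position has gained value.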

The regression equation, $R_t = \beta R_{t-1} + e_t$, is also first-order autoregressive, but it is logically circular because $P_{t-1}$ is entailed in $R$ on both sides.

This analysis of the formula for return shows that high statistical significance and explanatory power are neither certain nor surprising if $R_t$, $D_t$, $P_t$, $P_{t-1}$, $S_t$ or $S_{t-1}$ is specified as an explanatory variable in a regression equation to explain either return or a variable entailing return.

All five variables embedded in return are controlled by human behavior. The share prices are determined by stock market participants, and the shares outstanding and dividends per share are set by company directors. Thus, autoregressions of return or any of its entailed variables are not scientifically valid.
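The contrast between the price autoregression and the return autoregression can be sketched numerically in Python, assuming NumPy. The setup is deliberately contrived and is not from the paper: prices are drawn independently of one another, with no dividends and no change in shares, so that any lag-one relationship among returns can only come from the price $P_{t-1}$ that is shared by $R_t$ and $R_{t-1}$. The helper `lag1_slope` is a hypothetical name.

```python
import numpy as np

def lag1_slope(series):
    """OLS slope of series[t] on series[t-1], with an intercept."""
    x, y = series[:-1], series[1:]
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(0)
# Assumption: independent prices, so there are no true dynamics to find.
prices = rng.uniform(90.0, 110.0, 5_000)
# R_t = P_t / P_{t-1} - 1 (no dividends, no splits).
returns = prices[1:] / prices[:-1] - 1.0

slope_p = lag1_slope(prices)    # P_t on P_{t-1}: no shared term
slope_r = lag1_slope(returns)   # R_t on R_{t-1}: both entail P_{t-1}

print(f"price autoregression slope:  {slope_p:.3f}")
print(f"return autoregression slope: {slope_r:.3f}")
```

Under this assumption the price slope should be near zero while the return slope is clearly negative, which is one way to see the induced correlation. Real price series are not independent draws, so the magnitudes here should not be read as estimates for actual markets.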

Table. Estimated OLS Regression of Logically Circular Equations for Unit Rectangles.

Explained Variable (DV), Explanatory Variables (IV), estimated coefficients (Β), Student's t, and R-square coefficient of multiple determination. Sample Size = 64.

DV     IV     Β         t       IV     Β         t       R-SQ
P      L      2.00      8                                50%
P      L      2.00      2e15    W      2.00      2e15    100%
A      L      5.50      7                                46%
A      L      5.50      19      W      5.50      19      92%
C      L      0.13      7                                45%
C      L      0.13      16      W      0.13      16      90%
C      A      0.02      44                               97%
C      P      0.06      23                               90%
C      P      0.002     0.46    A      0.02      12      97%
C      1/P    -19.24    11                               67%
C      1/P    -2.29     3       A      0.02      26      97%
C      O      0.08      1.66                             4%
C      1/O    -0.30     1.55                             4%
C/O*   1/O    1.16      10                               58%
C      S      -0.26     5                                27%
C      S      -0.25     5       O      0.06      1.39    29%
C/O*   S/O    -0.23     5       1/O    1.62      15      68%
C      D      0.17      15                               78%
C      D      0.17      14      O      0.01      0.51    78%
C/O*   D/O    0.17      14      1/O    -0.14     1.46    90%
C      R      0.46      56                               98%
C      R      0.46      55      O      -0.0001   0.11    98%
C/O*   R/O    0.46      55      1/O    -0.12     5       99%
C      X      -0.004    1.85                             5%
C/X*   1/X    0.00      0.00                             1%

* Intercept constrained to equal zero.
t (60 df, 2-tailed test): 2.00 = 5% probability and 2.66 = 1% probability.

LEGEND
A = area = L*W
P = perimeter = (L+W)*2
C = compactness = A/P
R = radius of circle = sqrt(A/π)
D = diagonal = sqrt(L**2+W**2)
S = sides ratio = long side/short side
L = length (2, ..., 9)
W = width (2, ..., 9)
O = oddness (1, 2, 3 or 4)
X = random integer (1, ..., 100)
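The legend definitions can be written out as a short Python function, shown here as a sketch for a single rectangle. The function name `rectangle_variables` and the example rectangle are hypothetical, and the oddness coding follows the four categories given in the text.

```python
import math

def rectangle_variables(length, width):
    """Derived variables from the Table legend for one integer-sided rectangle."""
    area = length * width
    perimeter = 2 * (length + width)
    # Oddness: 1 = both even, 2 = even length and odd width,
    # 3 = odd length and even width, 4 = both odd.
    oddness = 1 + 2 * (length % 2) + (width % 2)
    return {
        "A": area,
        "P": perimeter,
        "C": area / perimeter,                       # compactness
        "R": math.sqrt(area / math.pi),              # circle of equal area
        "D": math.hypot(length, width),              # diagonal
        "S": max(length, width) / min(length, width),  # sides ratio
        "O": oddness,
    }

# Example: length 3 (odd), width 4 (even).
v = rectangle_variables(3, 4)
for name, value in v.items():
    print(name, round(value, 4))
```

Every derived variable is a function of length and width alone, which is why each of them entails length and width when used as an explanatory variable for compactness.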

REFERENCES

Lintner, J., 1965, The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, Review of Economics and Statistics 47 (February): 13-37.

Sharpe, W. F., 1964, Capital asset prices: A theory of market equilibrium under conditions of risk, Journal of Finance 19 (September): 425-442.