Proofs and derivations

Similar documents
12 - Nonparametric Density Estimation

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.

Final Overview. Introduction to ML. Marek Petrik 4/25/2017

Statistical Methods for SVM

Jeff Howbert Introduction to Machine Learning Winter

Support Vector Machines

Web Appendix for Hierarchical Adaptive Regression Kernels for Regression with Functional Predictors by D. B. Woodard, C. Crainiceanu, and D.

Statistics: Learning models from data

Support Vector Machines

Online Appendix to Career Prospects and Effort Incentives: Evidence from Professional Soccer

Linear & nonlinear classifiers

Non-parametric Methods

Model Specification Testing in Nonparametric and Semiparametric Time Series Econometrics. Jiti Gao

Chapter 9. Non-Parametric Density Function Estimation

Numerical Analysis: Solving Nonlinear Equations

Linear & nonlinear classifiers

BAYESIAN DECISION THEORY

Nonparametric Methods

Machine Learning. Support Vector Machines. Fabio Vandin November 20, 2017

Supplemental Material for KERNEL-BASED INFERENCE IN TIME-VARYING COEFFICIENT COINTEGRATING REGRESSION. September 2017

ECS171: Machine Learning

Bayesian Inference on Joint Mixture Models for Survival-Longitudinal Data with Multiple Features. Yangxin Huang

Mechanism Design: Bayesian Incentive Compatibility

Figure A1: Treatment and control locations. West. Mid. 14 th Street. 18 th Street. 16 th Street. 17 th Street. 20 th Street

ESTIMATING AVERAGE TREATMENT EFFECTS: REGRESSION DISCONTINUITY DESIGNS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics

Estimation of Dynamic Regression Models

Chapter 9. Non-Parametric Density Function Estimation

Midterm exam CS 189/289, Fall 2015

Functions: Polynomial, Rational, Exponential

Support Vector Machines. Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar

Controlling versus enabling Online appendix

Robust Mechanism Design with Correlated Distributions. Michael Albert and Vincent Conitzer and

Mathematical Foundations -1- Constrained Optimization. Constrained Optimization. An intuitive approach 2. First Order Conditions (FOC) 7

Solution of Nonlinear Equations

Descriptive Statistics

Linear Models in Machine Learning

One-Sample Numerical Data

The Limits to Partial Banking Unions: A Political Economy Approach Dana Foarta. Online Appendix. Appendix A Proofs (For Online Publication)

Exam C Solutions Spring 2005

ECE276A: Sensing & Estimation in Robotics Lecture 10: Gaussian Mixture and Particle Filtering

Optimality, Duality, Complementarity for Constrained Optimization

Lecture 4: Optimization. Maximizing a function of a single variable

Statistical Data Analysis

Statistics 3858 : Maximum Likelihood Estimators

Maximum Likelihood Estimation. only training data is available to design a classifier

Theory of Value Fall 2017

Spring 2012 Math 541B Exam 1

SOLUTIONS TO ADDITIONAL EXERCISES FOR II.1 AND II.2

Homework 4. Convex Optimization /36-725

A New Class of Non Existence Examples for the Moral Hazard Problem

Machine Learning. Support Vector Machines. Manfred Huber

STAT 6350 Analysis of Lifetime Data. Probability Plotting

Math Scope & Sequence Grades 3-8

AP Calculus Curriculum Guide Dunmore School District Dunmore, PA

What happens when there are many agents? Threre are two problems:

Introduction to Machine Learning

Review. December 4 th, Review

Lecture 35. Summarizing Data - II

Nonparametric Econometrics

Probability Distribution And Density For Functional Random Variables

CSE 417T: Introduction to Machine Learning. Final Review. Henry Chai 12/4/18

Stat 710: Mathematical Statistics Lecture 12

SMOOTHED BLOCK EMPIRICAL LIKELIHOOD FOR QUANTILES OF WEAKLY DEPENDENT PROCESSES

9. Least squares data fitting

Machine Learning And Applications: Supervised Learning-SVM

Recitation 7: Uncertainty. Xincheng Qiu

IEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm

Density estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas

STAT100 Elementary Statistics and Probability

Curve Fitting Re-visited, Bishop1.2.5

1. Basic Model of Labor Supply

Discriminative Models

Minimum Wages and Excessive E ort Supply

Non-parametric Methods

Estimating the Tradeoff Between Risk Protection and Moral Hazard

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning

2 (Statistics) Random variables

Appendix F. Computational Statistics Toolbox. The Computational Statistics Toolbox can be downloaded from:

minimize x subject to (x 2)(x 4) u,

Math Numerical Analysis Mid-Term Test Solutions

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Optimization. Yuh-Jye Lee. March 21, Data Science and Machine Intelligence Lab National Chiao Tung University 1 / 29

Complexity and Progressivity in Income Tax Design: Deductions for Work-Related Expenses

Continuous r.v. s: cdf s, Expected Values

Notes, March 4, 2013, R. Dudley Maximum likelihood estimation: actual or supposed

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Minimum Hellinger Distance Estimation in a. Semiparametric Mixture Model

Clustering K-means. Clustering images. Machine Learning CSE546 Carlos Guestrin University of Washington. November 4, 2014.

Measuring the Sensitivity of Parameter Estimates to Estimation Moments

CS 147: Computer Systems Performance Analysis

Differentiable Welfare Theorems Existence of a Competitive Equilibrium: Preliminaries

Linear Discrimination Functions

9.5. Polynomial and Rational Inequalities. Objectives. Solve quadratic inequalities. Solve polynomial inequalities of degree 3 or greater.

Game Theory and Economics of Contracts Lecture 5 Static Single-agent Moral Hazard Model

Smooth simultaneous confidence bands for cumulative distribution functions

Mathematical Foundations -1- Convexity and quasi-convexity. Convex set Convex function Concave function Quasi-concave function Supporting hyperplane

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Stat 516, Homework 1

Algebra 2. Curriculum (524 topics additional topics)

Transcription:

A Proofs and derivations Proposition 1. In the sheltering decision problem of section 1.1, if m( b P M + s) = u(w b P M + s), where u( ) is weakly concave and twice continuously differentiable, then f b continuous. is Proof. Let s (b P M w) denote the optimal sheltering solution. The assumptions that c (0) < u (w b P M ) and lim s u (w b P M + s) c (s) < 0 imply an interior solution, determined by the first order condition u (w b P M + s (b P M w)) = c (s (b P M w)). The implicit function theorem guarantees that s (b P M w) is continuously differentiable. As a result, we can express final balance due as a continuously differentiable function of pre-manipulation balance due: b(b P M ) = b P M s (b P M w). The convexity of c( ) guarantees that b(b P M ) is strictly increasing, and thus invertible. Denote the inverse function as ψ(b). The CDF of b may be expressed in terms of the CDF for b P M by the relationship F b (x) = Fb P M (ψ(b)). Differentiating yeilds f b (x) = fb P M (ψ(b))ψ (b), which expresses the PDF of b as a product of continuous functions. QED Bunching-based bounds on excess sheltering These calculations establish the bunching-based bounds on excess sheltering, which are reported in table 1. To simplify notation, denote the high and low sheltering amounts as c 1 (1 + ηλ) s H and c 1 (1 + η) s L. To begin, notice that the mass present in the zero-dollar bin of the balance due histogram corresponds to 0.5 0.5 f b(x) dx, the probability of balance due falling within 50 cents of zero. Expressing this value with respect to the pre-manipulation balance-due distribution, this mass is s H +0.5 f P M s L 0.5 b (x) dx, the probability of pre-manipulation balance due falling within 50 cents of the shelter to zero region. 46

Due to the assumption that f P M b is decreasing on the relevant range, it must hold that s H +0.5 fb P M (s H + 0.5) dx s L 0.5 fb P M (s H + 0.5) (s H s L + 1 ) s H +0.5 s L 0.5 s H +0.5 s L 0.5 s H +0.5 fb P M (x) dx fb P M (s L 0.5) dx s L 0.5 fb P M (x) dx fb P M (s L 0.5) (s H s L + 1 ) (14) (15) We wish to use these inequalities to bound s H s L, using terms from the estimated final balance-due distribution. These inequalities imply that s H +0.5 s L 0.5 f b P M (x) dx f P M b (s L 0.5) 1 s H s L s H +0.5 s L 0.5 f b P M (x) dx f P M b (s H + 0.5) 1 (16) Let ˆf(x) denote the estimated distribution of observed balance due, as was generated from the regression described in equation 9 and reported in table 1. Substituting the values in equation 16 with their estimated values, we arrive at ˆf(0) ˆf( 0.5) 1 sh s L }{{} additional sheltering in loss domain ˆf(0) ˆf(0.5) 1 (17) These calculations generate the upper and lower bounds reported in table 1. 47

Derivation of the likelihood function when excess mass is diffusely distributed Here I present a details of the maximum likelihood estimates discussed in section 3.3 and presented in figure 5. Let g(b θ) denote a three-component mixture of normal PDFs, assumed to have a common mean to preserve symmetry. G(b θ) denotes the CDF. θ is a parameter vector containing the relevant model components: the common mean, the standard deviations, and the mixing probabilities. The shifting parameter s is not included in this vector, and will be chosen endogenously to rationalize the excess mass near zero. All such excess mass is assumed to fall within range [ w 2, + w 2 ]. Using this mixture distribution to fit f P M b, and applying the results of proposition 3, the likelihood of a given observation conditional on being outside of the range [ w 2, + w 2 ], where the likelihood is unknown is: g(b i θ) if b i < w 2 L(b i θ) = g(b i + s θ) if b i > w 2 (18) To rationalize the empirical mass found in [ w 2, + w 2 ], s must satisfy: N i=1 s = G 1 ( N I(b i [ w 2, + w 2 ] ) N i=1 I(b i [ w 2, + w 2 ] ) N ( w ) ( ) θ w = G 2 + s G 2 θ ( w ) ) + G 2 θ θ w 2 (19) (20) While an analytic equation for G 1 is not available, the solution to equation 20 can be solved numerically. The numerical solution is generated with Newton s method, with tolerance set to 0.001. With the resulting endogenous specification of s, the log-likelihood function implied by equation 18 is maximized to produce the estimated model. 48

B Supplemental tables and figures Figure A.1: Bounds on integral of shelter to zero region Notes: This figure presents the hypothetical distribution previously presented in figure 1, zoomed on the shelter to zero region. Under the assumption that the distribution is decreasing over these values, the integral of the shelter to zero region can be no larger than the width of the region times the density on the left, and no smaller than the width of the region times the density on the right. 49

Figure A.2: Fit of predicted skew-normal distributions Notes: Plots of distributions fitted to the balance due frequency histogram. Balance due expressed in 1990 dollars, and rounded to $1 bins. Grey lines indicate the estimated models from table A.5, fitting a skew-normal distribution with a shift in the loss domain. For comparison, black lines indicate local-average kernel regressions (bandwidth: 10, kernel: Epanechnikov). Range of plot restricted to [ 2000, 2000], with zero excluded. 50

Figure A.3: Distribution with fixed cost at zero balance due Notes: This figure presents a hypothetical distribution that could be generated if fixed costs were incurred in the loss domain. A fixed cost associated with positive balance due generates a region, [0, b max ], where the taxpayer shelters to zero to avoid the fixed cost. Outside of this region, decisions are made according to marginal incentives, as they would be in the absence of the fixed cost. Similar to the model with loss aversion (illustrated in figure 1), fixed costs generate excess mass at zero. In contrast to the model with loss aversion, fixed costs do not shift the entire loss domain, and furthemore generate a region of missing mass. 51

Table A.1: Sample size across SSN codes and model years SSN Code A B C, D, E Total 1979 8852 9013 26935 44800 1980 9107 9205 27709 46021 1981 9131 9282 27825 46238 1982 9129 0 0 9129 1983 9389 9514 0 18903 1984 9636 0 0 9636 1985 9948 10013 0 19961 1986 9990 0 0 9990 1987 10362 10543 0 20905 1988 10627 10707 0 21334 1989 10952 11054 0 22006 1990 11122 11230 0 22352 Total 118245 90561 82469 291275 Unique Taxpayers 15950 15919 32158 64027 Notes: This table presents the number of responses over time by different SSN groups. Five randomly determined four-digit SSN endings were chosen to form the sample, labeled A-E. Group A was sampled from 1979-1990. Group B was not sampled in 1982, 1984, or 1986. Groups C, D, and E were sampled only for the first three years of the data collection. 52

Table A.2: Summary statistics Mean Standard Deviation p5 p25 p50 p75 p95 Balance Due -523 3287-3202 -1170-521 -56 1988 Taxes (Before Credits) 4595 6868 166 1084 2687 5633 14409 Payments 5147 6738 445 1508 3318 6556 14784 Adjusted Gross Income 32631 26112 6865 15100 26124 42702 77473 Filed 1040 0.66 Filed 1040A 0.26 Filed 1040EZ 0.08 Observations 229116 Unique Taxpayers 53177 Notes: Mean, standard deviation, and quantiles of variables relevant for the balance due calculation. Monetary amounts expressed in 1990 dollars. 53

Table A.3: Estimates of excess mass at zero balance due, alternative standard errors (1) (2) (3) (4) (5) All AGI groups 1st AGI quartile 2nd AGI quartile 3rd AGI quartile 4th AGI quartile γ: I(balance due = 0) 136.43 46.57 26.79 21.06 42.01 (15.64) (9.12) (7.46) (6.77) (7.73) δ : I(balance due > 0) -16.26-3.50-4.20-3.42-5.14 (5.13) (3.03) (2.74) (2.43) (2.06) α : Constant 99.57 33.43 27.21 21.94 16.99 (2.90) (1.74) (1.56) (1.38) (1.20) Balance due polynomial X X X X X N: Bins in histogram 201 201 201 201 201 Observations 16348 5725 4553 3602 2468 R 2 0.490 0.479 0.259 0.209 0.489 Estimates of excess sheltering Bunching-based upper bound 1.83 1.68 1.34 1.32 3.97 (0.21) (0.34) (0.35) (0.40) (0.81) Bunching-based lower bound 1.37 1.39 0.99 0.96 2.48 (0.17) (0.30) (0.29) (0.33) (0.52) Notes: Table 1, recreated with bootstrapped standard errors. Bootstrap iterations: 10,000. * p < 0.10, ** p < 0.05, *** p < 0.01. 54

Table A.4: NLLS estimates of shift in loss domain, alternative standard errors (1) (2) (3) (4) (5) Full sample 1st AGI quartile 2nd AGI quartile 3rd AGI quartile 4th AGI quartile s: s H s L 389.37 35.64 70.05 184.22 585.93 (5.49) (6.79) (8.85) (13.93) (101.77) p1 0.80 0.45 0.58 0.52 0.64 (0.00) (0.01) (0.02) (0.03) (0.12) µ -419.08-392.41-487.87-621.93-412.11 (2.21) (2.07) (3.99) (7.78) (45.39) σ1 1058.15 702.44 937.59 1585.22 2610.20 (5.85) (13.60) (21.12) (39.41) (246.70) σ2 252.97 246.84 375.22 741.72 1218.48 (4.31) (3.49) (10.43) (25.66) (387.55) N: Bins in histogram 8000 8000 8000 8000 8000 Observations 216227 57163 56799 55272 46993 Notes: Table 2, recreated with bootstrapped standard errors. Bootstrap iterations: 1000. * p < 0.10, ** p < 0.05, *** p < 0.01. 55

Table A.5: NLLS estimates of shift in loss domain: skew-normal density (1) (2) (3) (4) (5) Full sample 1st AGI quartile 2nd AGI quartile 3rd AGI quartile 4th AGI quartile s: s H s L 405.92 57.49 158.70 291.51 917.65 (8.92) (6.06) (8.48) (12.91) (23.72) ξ: location 299.15-85.19-16.48 292.43-1418.40 (11.20) (6.34) (16.39) (18.69) (106.86) θω: scale index 6.97 6.21 6.66 7.28 7.72 (0.01) (0.01) (0.02) (0.01) (0.03) α: shape -1.63-1.55-1.19-1.55 0.91 (0.06) (0.07) (0.08) (0.07) (0.11) N: Bins in histogram 8000 8000 8000 8000 8000 Observations 216227 57163 56799 55272 46993 Notes: Standard errors in parentheses. Balance due expressed in 1990 dollars. Reported are nonlinear least squares estimates of the model Cj = Obs f skew (bj + s I(b > 0) ξ, ω, α) + ɛj, where f skew (bj ξ, ω, α) is the skew-normal distribution ( 2 φ b j ξ ω ω ) Φ ( α b j ξ ω ). In this equation, j indexes each dollar bin of balance due b from -4000 to 4000, with zero excluded. Cj indicates frequency counts. To constrain the scale parameter away from zero, it is estimated as ω = 10 + exp(θω). * p < 0.10, ** p < 0.05, *** p < 0.01. 56

Table A.6: Estimates of AGI shocks at zero balance due interacted with income source (1) (2) (3) (4) (5) (6) Dependent Variable : AGI Balance due = 0 946 451 1703 482 441 942 (743) (747) (979) (1016) (1047) (862) Income from Schedule C-F 0-578 -2344-699 482-807 26 (76) (100) (91) (147) (155) (153) Balance due = 0 8786 10581 13106 9735 13263 11007 Income from Schedule C-F 0 (4092) (4094) (4124) (4770) (4745) (4677) Balance due > 0-453 1025 389 1398 (126) (123) (156) (135) Balance due > 0 1087 221 915 332 Income from Schedule C-F 0 (182) (176) (244) (210) Filing-year fixed effects X X X X X X Balance due polynomial X X X X Lagged AGI polynomial X X Taxpayer fixed effects X X X N 148325 148325 148325 148325 148325 148325 Notes: Standard errors, clustered by taxpayer, in parentheses. Monetary quantities expressed in 1990 dollars. Xs indicate the presence of filing-year or taxpayer fixed effects, a third-order polynomial in lagged AGI, or a third-order polynomial in balance due interacted with I(balance due > 0) to allow for discontinuity at zero. * p < 0.10, ** p < 0.05, *** p < 0.01. 57