
University of Illinois, Department of Economics
Econ 574, Spring 2017                                        Roger Koenker

Lecture 2: One too many inequalities

In Lecture 1 we introduced some of the basic conceptual building materials of the course. In this lecture we introduce some basic tools for handling these materials; the tools consist of a sequence of inequalities that play a fundamental role in subsequent developments. Most of these results may be found in any introductory text; a very comprehensive appendix on probabilistic inequalities may be found in Shorack and Wellner (1986, Empirical Processes).

1. Classical Inequalities

We begin with the Markov inequality, which may be regarded as a basic inequality-generating theorem.

Theorem 1. Let Z be a real r.v. and g a nonnegative, nondecreasing function on the support of Z, i.e., on a set B such that P(Z ∈ B) = 1. Then

    P(Z ≥ a) ≤ Eg(Z)/g(a).

Proof: The proof is very simple. By hypothesis,

    g(a) I(Z ≥ a) ≤ g(Z) I(Z ≥ a) ≤ g(Z),

interpreted as holding for each ω ∈ Ω, with Z = Z(ω). Taking expectations yields the result.

Examples: We will use the common convention that t_+ = max{0, t} denotes the positive part of t.

1. Markov: Z = |X|, g(t) = t_+, giving P(|X| ≥ a) ≤ E|X|/a.
2. Chebychev: Z = |X|, g(t) = t², giving P(|X| ≥ a) ≤ EX²/a².
3. Bernstein: Z = X, g(t) = e^{st}, giving P(X ≥ a) ≤ Ee^{sX}/e^{sa}.

Generally speaking, these inequalities are rather crude, but one can construct examples for which they are actually sharp. For example, in the case of the Markov inequality, suppose

    X = a with probability μ/a, and X = 0 with probability 1 − μ/a.

Then EX = μ, and obviously P(X ≥ a) = μ/a = EX/a.
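Since all three bounds flow from Theorem 1, it is easy to compare them numerically. Here is a minimal Python sketch (an illustration of mine, not part of the original notes) tabulating the exact tail probability of a standard exponential r.v. against its Markov, Chebychev, and optimized Bernstein bounds.

    import numpy as np

    # For X ~ Exponential(1): P(X >= a) = exp(-a) exactly, with
    # E|X| = 1, EX^2 = 2, and Ee^{sX} = 1/(1 - s) for s < 1.
    a = 10.0
    exact = np.exp(-a)
    markov = 1.0 / a                     # E|X| / a
    chebychev = 2.0 / a**2               # EX^2 / a^2
    s = np.linspace(0.01, 0.99, 99)      # optimize the Bernstein bound over s
    bernstein = np.min(np.exp(-s * a) / (1.0 - s))

    print(f"exact     = {exact:.3e}")      # 4.540e-05
    print(f"Markov    = {markov:.3e}")     # 1.000e-01
    print(f"Chebychev = {chebychev:.3e}")  # 2.000e-02
    print(f"Bernstein = {bernstein:.3e}")  # 1.234e-03, attained at s = 0.9

Far out in the tail the polynomial bounds are very crude, while the exponential bound eventually dominates; this is the theme of Section 2.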

The next important inequality is the Cauchy-Schwarz (or correlation) inequality.

Theorem 2. Let X = (X_1, ..., X_p)' be a p-vector of real r.v.'s and U = EXX'. The matrix U is symmetric and nonnegative definite, and it is singular (|U| = 0) iff there exists a p-vector α ≠ 0 such that

(∗)    E(α'X)² = 0.

Proof: Since E is applied componentwise (and multiplication commutes), symmetry is immediate. Nonnegative definiteness follows from α'Uα = E(α'X)² ≥ 0. If equality holds for some α ≠ 0, then clearly (∗) holds and U is singular, since Uα = 0. On the other hand, if U is singular, there must exist α ≠ 0 such that Uα = 0, so equality holds.

Corollary 1 (Cauchy-Schwarz). For r.v.'s X_1, X_2,

    (EX_1X_2)² ≤ EX_1² EX_2²,

and centering X_1, X_2 at their respective means yields

    (Cov(X_1, X_2))² ≤ V(X_1)V(X_2).

Proof: Specializing the previous result to p = 2, and recalling that |U| may be expressed as the product of its eigenvalues, which are nonnegative, we have

    0 ≤ |U| = EX_1² EX_2² − (EX_1X_2)²,

and we obtain the second form by centering.

Remark: Note that the latter form implies the well-loved correlation inequality,

    0 ≤ ρ² = (Cov(X_1, X_2))² / (V(X_1)V(X_2)) ≤ 1.

The simple result V(X) ≥ 0, which implies EX² ≥ (EX)², can be thought of as a special case of the following general result. It plays a crucial role in establishing consistency of the maximum likelihood estimator, among many other applications.

Theorem 3 (Jensen). If X and g(X) are integrable r.v.'s and g(·) is convex, then

    g(EX) ≤ Eg(X).

Proof: Convexity of g implies that for any ξ there exists a line L through the point (ξ, g(ξ)) such that the graph of g lies above the line, i.e.,

    g(x) ≥ g(ξ) + λ(x − ξ).

Let ξ = EX; then for all x,

    g(x) ≥ g(EX) + λ(x − EX).

Note that λ depends on ξ, but not on x. Now let x = X, so

    g(X) ≥ g(EX) + λ(X − EX),

and taking expectations yields the result. This argument can be interpreted geometrically in R^p in terms of supporting hyperplanes.
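Jensen's inequality is equally easy to see by simulation. In this minimal sketch (again my own illustration, with an arbitrary choice of X ~ N(1, 1)) we estimate g(EX) and Eg(X) for several convex g.

    import numpy as np

    # Monte Carlo illustration of Jensen: g(EX) <= Eg(X) for convex g.
    # For g = exp and X ~ N(1, 1) the exact values are
    # g(EX) = e = 2.7183 and Eg(X) = e^{1.5} = 4.4817.
    rng = np.random.default_rng(574)
    x = rng.normal(loc=1.0, scale=1.0, size=10**6)

    for name, g in [("exp", np.exp), ("square", np.square), ("abs", np.abs)]:
        lhs = g(x.mean())   # estimate of g(EX)
        rhs = g(x).mean()   # estimate of Eg(X)
        print(f"{name:6s}: g(EX) = {lhs:8.4f} <= Eg(X) = {rhs:8.4f}")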

Corollary 2 (Liapounov). (E|X|^r)^{1/r} is nondecreasing in r for r > 0.

Proof: By Jensen's inequality, since |x|^r is convex for r ≥ 1, we have (E|X|)^r ≤ E|X|^r, so E|X| ≤ (E|X|^r)^{1/r}. Now replace X by |X|^q to get (E|X|^q)^{1/q} ≤ (E|X|^{rq})^{1/(rq)} = (E|X|^s)^{1/s}, where s = rq; since r ≥ 1 implies q ≤ rq = s, this covers all pairs 0 < q ≤ s < ∞.

Remark: This is usually called the moment inequality. It implies that if X has an r-th moment it also has a q-th moment for q < r, in the sense that if the r-th moment is finite then so is the q-th.

2. Exponential Inequalities

An important class of exponential inequalities plays a critical role in the asymptotic theory of empirical processes. These inequalities approximate the tail probabilities of sums of independent r.v.'s. See Pollard (Appendix B) for further details.

Lemma 1 (Feller, Vol. I, 3rd ed., p. 175). As x → ∞,

    1 − Φ(x) ∼ x^{−1} φ(x),

or, more precisely, for any x > 0,

    (x^{−1} − x^{−3}) φ(x) < 1 − Φ(x) < x^{−1} φ(x).

Proof: Clearly for x > 0,

    (1 − 3x^{−4}) φ(x) < φ(x) < (1 + x^{−2}) φ(x),

and the result follows by integrating each term from x to ∞. Hint: it is easier to just differentiate the expressions in the statement of the lemma and then use φ'(x) = −xφ(x).

Now we would like to extend this result to sums of independent r.v.'s. By the Bernstein form of the Markov inequality we have, for S = Σ X_i,

(∗∗)    P(S ≥ γ) ≤ e^{−tγ} Ee^{tS} = e^{−tγ} ∏_{i=1}^n Ee^{tX_i}.

For X_i iid N(0, 1) the situation is simple: we have, for all i, Ee^{tX_i} = e^{t²/2}. We can choose t to minimize the bound, i.e.,

    min_t { (n/2) t² − tγ }  ⟹  t = γ/n,

so we have the bound

    P(S ≥ γ) ≤ e^{−γ²/(2n)}.
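Figure 1 below compares these approximations to the normal tail. A minimal Python sketch along the following lines (my reconstruction; the original figure was presumably drawn in R) tabulates the same comparison.

    import numpy as np
    from scipy.stats import norm

    # Exact normal tail P(Z >= z) versus the Feller bounds of Lemma 1
    # and the Chebychev bound P(|Z| >= z) <= 1/z^2.
    z = np.arange(1.0, 5.5, 0.5)
    exact = norm.sf(z)                          # 1 - Phi(z)
    feller_lo = (1/z - 1/z**3) * norm.pdf(z)    # lower bound
    feller_hi = norm.pdf(z) / z                 # upper bound
    chebychev = 1 / z**2

    for row in zip(z, exact, feller_lo, feller_hi, chebychev):
        print("z={:.1f}  exact={:.2e}  Feller=[{:.2e}, {:.2e}]  "
              "Chebychev={:.2e}".format(*row))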

[Figure 1. Approximations for the normal tail probability P(Z > z): the exact tail, the Feller bounds, and the Chebychev bound, on a logarithmic scale for 0 ≤ z ≤ 5.]

For non-normal X_i we need to work a bit harder. Here is one of the first results along these lines.

Theorem 4 (Hoeffding). Let X_1, ..., X_n be independent r.v.'s with EX_i = 0 and P(X_i ∈ [a_i, b_i]) = 1 for each i. Then for each γ > 0 and S = Σ X_i,

    P(S ≥ γ) ≤ exp{−2γ² / Σ (b_i − a_i)²}.

Proof: By convexity of the exponential function, for X ∈ [a, b],

    e^{tX} ≤ ((b − X)/(b − a)) e^{ta} + ((X − a)/(b − a)) e^{tb}.

Taking expectations and using EX = 0,

    Ee^{tX} ≤ (b/(b − a)) e^{ta} − (a/(b − a)) e^{tb}.

Now set α = 1 − β = −a/(b − a) and u = t(b − a), and rewrite this as

    log Ee^{tX} ≤ log(β e^{−αu} + α e^{βu}),

or, factoring out e^{−αu} and using α + β = 1, as log Ee^{tX} ≤ ϕ(u), where (leaving the dependence of u on t implicit)

    ϕ(u) = −αu + log(β + α e^{u}).

Expanding,

    ϕ(u) = ϕ(0) + u ϕ'(0) + (1/2) u² ϕ''(u*)

for some u* between 0 and u, where

    ϕ'(u) = −α + α/(α + β e^{−u}),

so that ϕ(0) = ϕ'(0) = 0, and

    ϕ''(u) = αβ e^{−u} / (α + β e^{−u})²
           = (α/(α + β e^{−u})) (β e^{−u}/(α + β e^{−u})) ≤ 1/4,

since the last two factors are nonnegative and sum to one. Thus

    ϕ(u) = (1/2) u² ϕ''(u*) ≤ (1/8) u² = (1/8) t² (b − a)².

Applying this inequality to each X_i and then using the Bernstein inequality (∗∗) as in the normal case above, we have

    P(S ≥ γ) ≤ e^{−tγ} exp{(1/8) t² Σ (b_i − a_i)²}.

Minimizing with respect to t as before gives t = 4γ / Σ (b_i − a_i)², which yields the result.

Corollary 3. Under the same conditions,

    P(|S| ≥ γ) ≤ 2 exp{−2γ² / Σ (b_i − a_i)²}.

Proof: Apply the same argument to −X_i and combine.

Finally, we state without proof the following generalization; see Pollard for details.

Theorem 5 (Bernstein). Let X_1, ..., X_n be independent r.v.'s with EX_i = 0, EX_i² = σ_i², and P(|X_i| ≤ M) = 1. Let v = Σ σ_i². Then for each γ > 0,

    P(|S| > γ) ≤ 2 exp{−(1/2) γ² / (v + (1/3) Mγ)}.

Remark: Note that for standard normal r.v.'s we do not need the (1/3)Mγ term.
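To see how conservative the Hoeffding bound is in practice, here is a minimal Monte Carlo sketch (an illustrative setup of my own, not from the notes) for Rademacher variables, where [a_i, b_i] = [−1, 1] and the bound reduces to exp(−γ²/(2n)).

    import numpy as np

    # Monte Carlo check of Hoeffding (Theorem 4) for Rademacher variables:
    # X_i = +/-1 with probability 1/2, so EX_i = 0, [a_i, b_i] = [-1, 1],
    # and the bound is exp(-2 gamma^2 / (4n)) = exp(-gamma^2 / (2n)).
    rng = np.random.default_rng(574)
    n, reps = 100, 100_000
    S = rng.choice([-1, 1], size=(reps, n)).sum(axis=1)

    for gamma in (10, 20, 30):
        empirical = (S >= gamma).mean()
        bound = np.exp(-gamma**2 / (2 * n))
        print(f"gamma={gamma:2d}: P(S >= gamma) ~ {empirical:.5f} <= {bound:.5f}")

With n = 100 and γ = 20, for example, the bound is about 0.135 while the empirical frequency is near 0.023: the exponent is of the right order, but the constant is generous.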

3. Copulae and Fréchet Bounds

Given a pair (X, Y) of random variables with joint distribution function G(x, y) and marginal distribution functions F(x) and H(y), there is an associated copula function,

    C(u, v) = G(F^{−1}(u), H^{−1}(v)).

Copulas are supposed to bind two things together: in linguistics, for example, the subject and predicate of a sentence. In our setting, F and H are bound together by G. The copula function C : [0, 1]² → [0, 1] provides a convenient way to describe dependence between X and Y that avoids the usual reliance on normal theory and global notions of correlation.

A first, basic question about copulas is: what kinds of functions C can be valid copulae? Evidently, not just any function C : [0, 1]² → [0, 1] will do. The required properties of copulae are quite self-evident. A copula function C : [0, 1]² → [0, 1] satisfies:

i. C(u, 0) = C(0, v) = 0 for all u and v in [0, 1];
ii. C(u, 1) = u and C(1, v) = v for all u and v in [0, 1];
iii. for all u_1 < u_2 and v_1 < v_2 in [0, 1],

    C(u_2, v_2) − C(u_2, v_1) − C(u_1, v_2) + C(u_1, v_1) ≥ 0.

Note that C(u, v) can be viewed as a distribution function for a pair of random variables (U, V) that have uniform marginal distributions. The expression in (iii.) is simply the probability associated with the rectangle defined by the (u, v)'s and therefore must be nonnegative. (Draw the picture!)

Theorem 6. Let C be a copula. Then for any (u, v) ∈ [0, 1]²,

    max{u + v − 1, 0} ≤ C(u, v) ≤ min{u, v}.

Proof: Property (ii.) implies that C(u, v) ≤ C(u, 1) = u and C(u, v) ≤ C(1, v) = v, yielding the upper bound. Taking u_1 = u, u_2 = 1, v_1 = v, v_2 = 1 in (iii.) implies that C(u, v) − u − v + 1 ≥ 0, and since C(u, v) ≥ 0 we obtain the lower bound.

Given a copula function and the associated marginals, Sklar's theorem asserts that we can recover the joint distribution,

    G(x, y) = C(F(x), H(y)),

and thus we obtain the celebrated Fréchet bounds:

    max{F(x) + H(y) − 1, 0} ≤ G(x, y) ≤ min{F(x), H(y)}.

These bounds play an important role in recent developments for partially identified models. Both bounds are sharp in the sense that they can be achieved. The upper bound is achieved when X and Y are comonotonic, that is, when Y can be expressed as a deterministic, nondecreasing function of X. The lower bound is achieved when X and Y are countermonotonic, so that Y is a deterministic, nonincreasing function of X. These two very special cases correspond to situations in which all of the mass of the copula function is concentrated on a curve connecting opposite corners of the unit square; they correspond to rank correlations of +1 and −1, respectively. The other important special case is independent X and Y, which obviously corresponds to C(u, v) = uv.

An interesting, at least to me, result about copulae is that one can construct conditional versions by the following differentiation trick.

Lemma 2 (Bouyé and Salmon (2002)). ∂C(u, v)/∂u = F_{Y|X}(F_Y^{−1}(v) | F_X^{−1}(u)).

Proof: From

    P(Y < y, X = x) = ∂F_{X,Y}(x, y)/∂x

it follows that

    P(Y < y | X = x) = (f_X(x))^{−1} ∂F_{X,Y}(x, y)/∂x
                     = (f_X(x))^{−1} ∂C(F_X(x), F_Y(y))/∂x
                     = (f_X(x))^{−1} [∂C(u, v)/∂u] ∂F_X(x)/∂x,

where the middle factor is evaluated at u = F_X(x) and v = F_Y(y), and the last factor, f_X(x), cancels the first. Setting x = F_X^{−1}(u) and y = F_Y^{−1}(v) gives the result.
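Theorem 6 is also easy to check numerically. The following minimal sketch (my own illustration; it uses the Gaussian copula C_ρ(u, v) = Φ_ρ(Φ^{−1}(u), Φ^{−1}(v)) as a test case) verifies the Fréchet bounds on a grid; as ρ → ±1 the copula approaches the upper and lower bounds, respectively.

    import numpy as np
    from scipy.stats import multivariate_normal, norm

    # Check the Frechet bounds of Theorem 6 for a Gaussian copula,
    # C_rho(u, v) = Phi_rho(Phi^{-1}(u), Phi^{-1}(v)).
    def gaussian_copula(u, v, rho):
        mvn = multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]])
        return mvn.cdf([norm.ppf(u), norm.ppf(v)])

    grid = np.linspace(0.05, 0.95, 10)
    tol = 1e-4  # slack for the numerical bivariate normal cdf
    for rho in (-0.95, 0.0, 0.95):
        ok = all(
            max(u + v - 1, 0) - tol
            <= gaussian_copula(u, v, rho)
            <= min(u, v) + tol
            for u in grid for v in grid
        )
        print(f"rho = {rho:+.2f}: bounds hold on grid: {ok}")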

Examples: Roy's (1951) model of occupational choice is the Ur-text for the emerging literature on sample selection in micro-econometrics. In its simplest form, individuals can be either fishers or hunters. Productivities in the two occupations have joint distribution function G(x, y) and marginal distributions F(x) and H(y). Individuals, in accordance with their self-interest, choose their most productive occupation, so observed productivity is Z = max{X, Y} and thus has distribution function G(z) = G(z, z). [Why?] Consequently, by Fréchet, the observed productivity probabilities obey

    max{F(z) + H(z) − 1, 0} ≤ G(z) ≤ min{F(z), H(z)}.

As noted above, the upper bound corresponds to comonotonic productivities (good hunters make good fishers), while the lower bound holds when productivities are countermonotonic, so that the best hunters are the worst fishers and vice versa. In the absence of further evidence in which individuals are required to demonstrate their prowess in their other occupation, these bounds are all we are able to infer about the joint distribution. Very similar stories can be constructed for binary treatment models in which the occupations correspond to the control and treatment responses in settings in which participants are free to choose between the two options.

There is a rapidly expanding literature on copulae, much of it motivated by finance, where more general forms of dependence than the familiar normal, linear-correlation models are needed. A standard reference is Nelsen (1999). In survival models with competing risks, an important extension of the Fréchet bounds by Peterson (1976) has also proven useful. Jim Heckman has some nice notes on the Peterson results posted on the web.

4. References

Nelsen, R. B. (1999), An Introduction to Copulas, Springer.
Peterson, A. V. (1976), "Bounds for a Joint Distribution Function with Fixed Sub-Distribution Functions: Application to Competing Risks," PNAS, 73, 11-13.
Roy, A. D. (1951), "Some Thoughts on the Distribution of Earnings," Oxford Economic Papers, 3, 135-146.