Homework Assignment #2 for Prob-Stats, Fall 2018
Due date: Monday, October 22, 2018

Topics: consistent estimators; sub-σ-fields and partial observations; Doob's theorem about sub-σ-field measurability; conditional expectations as mean-squared-loss optimal estimators; the Bridge-Crossing Lemma and conditional distributions; Jensen's inequality and expected utility maximization.

Readings: Bierens: Ch. 3.1-2, 3.4-5. CB: Ch. 2; Ch. 8.1-3; and Ch. 10.1. CSZ: Ch. 6.6; Ch. 7.6; Ch. 8.4.

A. Let $X$ be a measurable function taking values in $\mathbb{R}$. Show that $E\,X 1_{\{X > 0\}} = 0$ if and only if $P(X > 0) = 0$. [Despite its simplicity, this will be a valuable factoid for working with conditional distributions.]

B. This problem will take you through several aspects and applications of the probability integral transform. The end goal of this material, in a later problem set, is Skorohod's representation theorem. A note: there are two results having that name; one involves stopping times and Brownian motions, while the one we are looking at shows that you can replace a weaker kind of convergence of random variables with almost-everywhere convergence.

1. CB, Exercise 2.10.

2. We say that the random variable $X$ first-order stochastically dominates (FOSDs) the random variable $Y$ if for all $r \in \mathbb{R}$, $P(X > r) \ge P(Y > r)$. Suppose that $X$ FOSDs $Y$, and let $X^{-} : (0,1) \to \mathbb{R}$ and $Y^{-} : (0,1) \to \mathbb{R}$ denote the probability integral transforms of $X$ and $Y$. Show that $X^{-}(r) \ge Y^{-}(r)$ for all $r \in (0,1)$. (A numerical sketch of this appears after this problem.)

3. Show that $X$ FOSDs $Y$ if and only if $\int u(X(\omega))\,dP(\omega) \ge \int u(Y(\omega))\,dP(\omega)$ for all bounded, non-decreasing $u : \mathbb{R} \to \mathbb{R}$.

4. Show that if $X_n \to X$ almost everywhere, then for all bounded continuous $f : \mathbb{R} \to \mathbb{R}$, $\int f(X_n(\omega))\,dP(\omega) \to \int f(X(\omega))\,dP(\omega)$.

5. Show that if for all bounded continuous $f : \mathbb{R} \to \mathbb{R}$, $\int f(X_n(\omega))\,dP(\omega) \to \int f(X(\omega))\,dP(\omega)$, then for all closed $F \subset \mathbb{R}$, $\limsup_n P(X_n \in F) \le P(X \in F)$. [Hint: the functions $f_n(r) = \max\{0, 1 - n\,d(r, F)\}$ are continuous and bounded; what do you learn from integrating them?]

6. Show that if for all closed intervals $F_r := (-\infty, r]$, $\limsup_n P(X_n \in F_r) \le P(X \in F_r)$, then the probability integral transforms of the $X_n$ converge almost everywhere to the probability integral transform of $X$.
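
Not part of the assignment, but possibly useful for intuition on B.2: the sketch below builds empirical probability integral transforms (quantile functions) for two samples whose laws are ordered by FOSD and checks the pointwise ordering on a grid. The normal distributions, sample size, and grid are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# X has the law of Y shifted up by 1, so X FOSDs Y:
# P(X > r) >= P(Y > r) for every r.
y_sample = rng.normal(loc=0.0, scale=1.0, size=100_000)
x_sample = rng.normal(loc=1.0, scale=1.0, size=100_000)

# Probability integral transform (quantile function) of a law:
# X^-(r) = inf{t : F_X(t) >= r}, approximated by empirical quantiles.
r_grid = np.linspace(0.01, 0.99, 99)
x_quantiles = np.quantile(x_sample, r_grid)
y_quantiles = np.quantile(y_sample, r_grid)

# B.2 predicts X^-(r) >= Y^-(r) on all of (0, 1); here the gap is ~1.
print("min over the grid of X^-(r) - Y^-(r):",
      (x_quantiles - y_quantiles).min())
```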

C. The reading by Breiman et al. (1964) shows that a set of probabilities $\Pi$ on an observation space is strongly 0-1 if and only if there exists a consistent sequence of estimators for the $p \in \Pi$. This problem takes you through the most basic of such situations. The space of observations for this problem is $X := \{0,1\}^{\mathbb{N}}$. For each $\theta \in \Theta = [0,1]$, let $p_\theta$ be the distribution on $X$ under which $\{\mathrm{proj}_n(\cdot) : n \in \mathbb{N}\}$ is an iid collection of Bernoulli($\theta$) random variables. In case it is needed for later notational purposes, let $Y_n(x) = \mathrm{proj}_n(x)$. Let $\mathcal{X}_n$ be the smallest σ-field on $\{0,1\}^{\mathbb{N}}$ making each $\mathrm{proj}_m$ measurable, $m \le n$, and let $\mathcal{X}$ be the smallest σ-field containing all of the $\mathcal{X}_n$.

1. Show that the functions $x \mapsto \liminf_n \frac{1}{n} \sum_{i \le n} \mathrm{proj}_i(x)$ and $x \mapsto \limsup_n \frac{1}{n} \sum_{i \le n} \mathrm{proj}_i(x)$ are $\mathcal{X}$-measurable.

2. Show that for each $\theta \in [0,1]$, the set $E_\theta := \{x \in X : \lim_n \frac{1}{n} \sum_{i \le n} \mathrm{proj}_i(x) = \theta\}$ is measurable.

3. Show that the set $E = \{x \in \{0,1\}^{\mathbb{N}} : \lim_n \frac{1}{n} \sum_{i \le n} \mathrm{proj}_i(x) \text{ exists}\}$ is measurable.

4. Show that for each $\theta \in \Theta$, $p_\theta(E_\theta) = 1$.

5. Show that $\widehat{\theta}_n(Y_1, \ldots, Y_n) = \frac{1}{n} \sum_{i \le n} Y_i$ is a consistent sequence of estimators for $\theta$.

6. A set of probabilities $\Pi$ on a measure space of observations, $(X, \mathcal{X})$, is strongly zero-one if there exist a measurable $E \in \mathcal{X}$ and an onto $\Phi : E \to \Pi$ such that for all $p \in \Pi$, $p(\Phi^{-1}(p)) = 1$. Show that $\Pi = \{p_\theta : \theta \in \Theta\}$ is strongly zero-one. [A consistent sequence of estimators for the elements of $\Pi$ exists if and only if it is strongly zero-one.]

D. [The space of observations as a metric space] On $X = \{0,1\}^{\mathbb{N}}$, consider the metric $d(x, y) = \sum_n |\mathrm{proj}_n(x) - \mathrm{proj}_n(y)|/2^n$. We use the σ-fields $\mathcal{X}_n$ and $\mathcal{X}$ from the previous problem.

1. Let $x_\alpha$ be a sequence in $X$. Show that $d(x_\alpha, x) \to 0$ if and only if $(\forall M \in \mathbb{N})(\exists A)(\forall \alpha \ge A)(\forall m \le M)[\mathrm{proj}_m(x_\alpha) = \mathrm{proj}_m(x)]$. (A small computation illustrating this appears after this problem.)

2. Show that $(X, d)$ is a compact metric space, hence both complete and separable. [A metric space is said to be separable if it has a countable dense subset.]

3. Characterize the compact subsets of $X$.

4. Show that $f : X \to \mathbb{R}$ is continuous if and only if for each $\epsilon > 0$ there exist an $M \in \mathbb{N}$ and an $\mathcal{X}_M$-measurable function $g$ such that $\max_{x \in X} |f(x) - g(x)| < \epsilon$. [The function $g$ is called finitely determined, so this is saying that all continuous functions on $X$ are nearly finitely determined.]

5. Show that $\mathcal{X}$ is the Borel σ-field on $(X, d)$.

6. Let $\mathcal{X}^\circ = \bigcup_n \mathcal{X}_n$. Show that this is the smallest field containing all of the $\mathcal{X}_n$, and that it is not a σ-field.

7. Show that every $p_\theta$ is countably additive on the field $\mathcal{X}^\circ$; that is, show that for every sequence $E_n \downarrow \emptyset$ in $\mathcal{X}^\circ$, $p_\theta(E_n) \downarrow 0$.

Comments: Carathéodory's extension theorem tells us that if $p$ is a countably additive probability on a field $\mathcal{F}^\circ$ and $\mathcal{F} = \sigma(\mathcal{F}^\circ)$, then $p$ has a unique countably additive extension to $\mathcal{F}$.
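
As a sanity check on D.1 (again, not part of the assignment), here is a small exact computation with the metric $d$ on finite truncations of points of $\{0,1\}^{\mathbb{N}}$: a distance below $2^{-M}$ forces agreement in the first $M$ coordinates, which is the heart of the equivalence. The example sequences and the truncation length are arbitrary.

```python
from fractions import Fraction

def d(x, y):
    """The metric on {0,1}^N, computed exactly on finite truncations
    (coordinates beyond the truncation are assumed to agree)."""
    return sum(Fraction(abs(a - b), 2 ** n)
               for n, (a, b) in enumerate(zip(x, y), start=1))

x = [1, 0, 1, 1, 0, 0, 1, 0]
y = [1, 0, 1, 1, 0, 1, 1, 0]  # first disagreement at coordinate 6

print(d(x, y))  # 1/64 = 2**-6
# If d(x, y) < 2**-M, the points agree in coordinates 1..M: a single
# disagreement at some m <= M would already contribute 2**-m >= 2**-M.
assert d(x, y) < Fraction(1, 2 ** 5) and x[:5] == y[:5]
```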

E. Let $(M, d)$ be a separable metric space with Borel σ-field $\mathcal{M}$, and let $(\Omega, \mathcal{F}, P)$ be our usual probability space, i.e. $\Omega$ a non-empty set, $\mathcal{F}$ a σ-field of subsets of $\Omega$, and $P : \mathcal{F} \to [0,1]$ a countably additive probability.

1. Suppose that $X : \Omega \to M$ is measurable and $p$ is the image law of $X$, i.e. $p(A) = P(X \in A)$ for all $A \in \mathcal{M}$. Show that $p$ is countably additive.

2. Show that if $p$ is a countably additive probability on the Borel σ-field $\mathcal{M}$, then for all $E \in \mathcal{M}$, $p(E) = \sup\{p(F) : F \subset E,\ F \text{ closed}\}$. [This is a good-sets argument.]

3. Show that if $p$ and $q$ are countably additive probabilities on the Borel σ-field, then $p = q$ if and only if $p(F) = q(F)$ for all closed $F$.

4. We say that a probability $p$ is tight if for every $\epsilon > 0$ there exists a compact $K$ with $p(K) > 1 - \epsilon$. Show that if $p$ is tight and $p$ and $q$ are countably additive probabilities on the Borel σ-field, then $p = q$ if and only if $p(K) = q(K)$ for all compact $K$.

5. Show that if $X : \Omega \to \mathbb{R}$ is a measurable random variable and $p(A) = P(X \in A)$ is the distribution of $X$, then $p$ is tight. (A numerical illustration follows this problem.)

6. Show that if $(M, d)$ is a separable metric space and $p$ and $q$ are countably additive probabilities on the Borel σ-field, then $p = q$ if and only if $\int_M f(x)\,dp(x) = \int_M f(x)\,dq(x)$ for all bounded, Lipschitz continuous $f : M \to \mathbb{R}$.
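
For intuition on E.5 (illustrative only): on $\mathbb{R}$, the compact sets $[-M, M]$ increase to the whole line, so countable additivity gives $P(|X| \le M) \uparrow 1$, and any $\epsilon$-level of tightness is achieved at some finite $M$. The sketch below locates such an $M$ empirically for a heavy-tailed sample; the Cauchy draw and the $\epsilon$ values are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(100_000)  # heavy-tailed, yet its law is tight

# K = [-M, M] with M the (1 - eps)-quantile of |X| is a compact set
# carrying mass of roughly 1 - eps (exactly so at the population level).
for eps in (0.1, 0.01, 0.001):
    M = np.quantile(np.abs(x), 1 - eps)
    mass = np.mean(np.abs(x) <= M)
    print(f"eps = {eps:>5}: K = [-{M:8.1f}, {M:8.1f}], mass {mass:.4f}")
```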

F. [The Neyman-Pearson Lemma] Suppose that $(X_1, \ldots, X_n) \in \{0,1\}^n$ has the Bernoulli density $f(x \mid \theta)$ for some $\theta \in \Theta = \{\theta_0, \theta_1\}$. Ahead of time, we assign probability $\beta \in (0,1)$ to $\theta_0$ and probability $1 - \beta$ to $\theta_1$. After observing the data, we are going to take an action $a \in \{0,1\}$ to solve $\max_{a \in \{0,1\}} E(u(a, \theta) \mid X_1, \ldots, X_n)$, where if $\theta = \theta_0$ then $a = 0$ is the better choice, and if $\theta = \theta_1$ then $a = 1$ is the better choice. Specifically, we are going to suppose that the utilities are

             θ = θ_0    θ = θ_1
    a = 0    u(0,0)     u(0,1)
    a = 1    u(1,0)     u(1,1)

where $u(0,0) > u(1,0)$ and $u(1,1) > u(0,1)$.

1. Show that for any function $f(\theta)$ and any probability $q$ of $\theta_0$, the sets of solutions to the following two problems are the same: $\max_{a \in \{0,1\}} E\,u(a, \theta)$ and $\max_{a \in \{0,1\}} E\,[u(a, \theta) + f(\theta)]$.

2. Let $f(\theta_0) = -u(1,0)$ and $f(\theta_1) = -u(0,1)$, and rewrite the payoff matrix above in the form

             θ = θ_0    θ = θ_1
    a = 0    r          0
    a = 1    0          s

where $r, s > 0$.

3. Give the set of data realizations, $x \in \{0,1\}^n$, for which the optimal decision is $a = 0$, and the set for which it is $a = 1$.

4. Express your regions from the previous part in terms of likelihood ratios.

5. Show that for large $n$, the effect of $\beta$ washes out.

6. Compare the optimal regions with $\beta = \frac{1}{2}$ to the Neyman-Pearson lemma's probabilities of Type I and Type II error.

G. [Bernoulli estimation with missing data] Suppose that $(Y_i, Z_i) \in \{0,1\} \times \{0,1\}$, $i = 1, \ldots, n$, are iid. Let $\theta_Y = E\,Y_i$, $\theta_1 = E(Y_i \mid Z_i = 1)$, $\theta_0 = E(Y_i \mid Z_i = 0)$, and $\theta_Z = E\,Z_i$. Assume that all four $\theta$'s belong to the open interval $(0,1)$.

1. Assuming that the entire vector $(Y_i, Z_i)$ is observable, give estimators of the $\theta$'s and show that they are consistent and unbiased.

2. Now suppose that when $Z_i = 0$, the value of $Y_i$ cannot be observed. That is, suppose that what is observed is $X_i$ defined, for some constant $g$, as $X_i = (Y_i, 1)$ if $Z_i = 1$, and $X_i = (g, 0)$ if $Z_i = 0$. Your previous estimators of $\theta_1$ and $\theta_Z$ are still consistent and unbiased. Supposing that there is enough data to pin down $\theta_1$ and $\theta_Z$, give the range of possible values of $\theta_Y$ as a function of the other three $\theta$'s, and explain the patterns of dependence of the range.

3. Suppose now that we know that $|\theta_1 - \theta_0| \le b$ for some $0 \le b < 2$. Give the range of possible values of $\theta_Y$ as a function of $b$ and the other three $\theta$'s.

4. Suppose now that the $Y_i$ and $Z_i$ are independent. How does this change the previous answer?

H. [$L^2$-best estimators] Let $L^2 = \{X \in L^0 : \int X^2(\omega)\,dP(\omega) < \infty\}$. For any $X, Y \in L^2$, define $\langle X, Y \rangle = \int X(\omega) Y(\omega)\,dP(\omega)$. For any $X \in L^2$, define $\|X\|_2 = \sqrt{\langle X, X \rangle}$, and for any $X, Y \in L^2$, define $d(X, Y) = \|X - Y\|_2$.

1. Solve $\min_{r \in \mathbb{R}} \|Y - r\|_2$.

2. If $Y = \sum_{n \le N} \gamma_n 1_{A_n}$ and $X = \sum_{m \le M} \beta_m 1_{B_m}$, solve $\min_g \|Y - g(X)\|_2$, where $g : \mathbb{R} \to \mathbb{R}$. The random variable $g(X)$ is called the expectation of $Y$ conditional on $X$. It is denoted $E(Y \mid X)$ or $E(Y \mid \sigma(X))$. (A numerical version of this minimization appears after this problem.)

3. If $Y \in L^2$ and $X = \sum_{m \le M} \beta_m 1_{B_m}$, solve $\min_g \|Y - g(X)\|_2$, where $g : \mathbb{R} \to \mathbb{R}$. Again, $g(X)$ is called the expectation of $Y$ conditional on $X$, denoted $E(Y \mid X)$ or $E(Y \mid \sigma(X))$.

4. Show that if $\|Y_n - Y\|_2 \to 0$ and $X = \sum_{m \le M} \beta_m 1_{B_m}$, then $\|E(Y_n \mid X) - E(Y \mid X)\|_2 \to 0$.

5. For $X, Y \in L^2$ and $X_n$ a sequence of simple functions in $L^2$ with $\|X_n - X\|_2 \to 0$, show that $E(Y \mid X_n)$ is a Cauchy sequence in $L^2$. Its limit is denoted $E(Y \mid X)$.

6. Bierens 3.6.10 (p. 83).
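
Problem H.2 identifies the $\|\cdot\|_2$-best predictor $g(X)$ of $Y$, for simple $X$, as the mean of $Y$ on each event $\{X = \beta_m\}$. Here is a minimal numerical version of that minimization; the three-level joint distribution is an arbitrary made-up example, not part of the assignment.

```python
import numpy as np

rng = np.random.default_rng(2)

# A simple X (three values) and a square-integrable Y depending on X.
x_levels = np.array([0.0, 1.0, 2.0])
x = rng.choice(x_levels, size=200_000)
y = np.sin(x) + rng.normal(0.0, 0.5, size=x.size)

# H.2: the minimizer g of ||Y - g(X)||_2 takes the value
# E(Y | X = b) = mean of Y on the event {X = b}, for each level b.
g_of_x = np.empty_like(y)
for b in x_levels:
    g_of_x[x == b] = y[x == b].mean()

mse_cond = np.mean((y - g_of_x) ** 2)     # ~ 0.25 = Var(noise)
mse_const = np.mean((y - y.mean()) ** 2)  # the best constant does worse
print(f"MSE of E(Y|X): {mse_cond:.4f}  <=  best constant: {mse_const:.4f}")
```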

I. Random variables $X, Y$ are independent, denoted $X \perp\!\!\!\perp Y$, if for all measurable $A, B$, $P((X \in A) \cap (Y \in B)) = P(X \in A) \cdot P(Y \in B)$. If $X, Y$ are in $L^2$, they are orthogonal, denoted $X \perp Y$, if $\langle X, Y \rangle = 0$. Independence can be understood as a very strong form of orthogonality.

1. Suppose that $X, Y \in L^2$ and $X \perp\!\!\!\perp Y$. Solve the problem $\min_{\alpha, \beta \in \mathbb{R}} \|Y - (\alpha + \beta X)\|_2$.

2. Show that for $X, Y \in L^2$, $\langle X - E\,X, Y - E\,Y \rangle = 0$ if and only if $E\,XY = (E\,X)(E\,Y)$.

3. If $X$ and $Y$ are simple, show that $X \perp\!\!\!\perp Y$ if and only if for all bounded (measurable) $f, g : \mathbb{R} \to \mathbb{R}$, $\langle f(X) - E\,f(X), g(Y) - E\,g(Y) \rangle = 0$.

4. If $X$ and $Y$ belong to $L^2$, show that $X \perp\!\!\!\perp Y$ if and only if for all bounded measurable $f, g : \mathbb{R} \to \mathbb{R}$, $\langle f(X) - E\,f(X), g(Y) - E\,g(Y) \rangle = 0$.

5. If $X, Y \in L^2$ and $X \perp\!\!\!\perp Y$, solve the problem $\min_g \|Y - g(X)\|_2$, where $g$ is a measurable function satisfying $g(X) \in L^2$.

J. Suppose that for some $(\alpha^*, \beta^*) \in \Theta = \mathbb{R}^2$ and some $X \in L^2$, $Y = \alpha^* + \beta^* X + \varepsilon$, where $E\,\varepsilon = 0$ and $\varepsilon \perp\!\!\!\perp X$.

1. Let $X_n^L$ be the Lebesgue sequence of simple approximations to $X$. Solve $\min_{\alpha, \beta} \|Y - (\alpha + \beta X_n^L)\|_2$.

2. Show that your solutions, $(\alpha_n, \beta_n)$, to the previous problem converge to $(\alpha^*, \beta^*)$.

K. Yield per acre is a random variable $Y$ and fertilizer per acre is a random variable $X$. Let $h(x) = E(Y \mid X = x)$. Suppose that $h'(x) > 0$ and that $p h'(x) > c$, where $p$ is the price of the crop and $c$ is the cost of the fertilizer. This looks like an argument that the farmers are under-utilizing fertilizer. This problem is about omitted variable bias. Suppose that yield per acre satisfies $Y = f(X, Q) + \varepsilon$, $\varepsilon \perp\!\!\!\perp (X, Q)$, where $Q$ is the quality of the acre. Suppose that $\partial f/\partial x > 0$, $\partial f/\partial q > 0$, and $\partial^2 f/\partial x\,\partial q > 0$, i.e. suppose that $f(\cdot, \cdot)$ is increasing and supermodular. Suppose also that farmers know their $Q$, but that it is random to the observers/econometricians who are forming their conditional expectations. The risk-neutral farmers pick $x(q)$ to solve $x^*(q) = \arg\max_x \left[ E(p f(x, q) + \varepsilon) - c x \right]$. Compare the true marginal product of fertilizer in a field that uses an amount $X = x$ to $h'(x)$. (A small simulation of this mechanism follows.)
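
A small simulation in the spirit of problem K (illustrative only: the technology $f(x, q) = q\sqrt{x}$, the lognormal quality distribution, and the price and cost values are all assumptions made for the example). Because higher-$q$ farmers choose more fertilizer, the slope of $h(x) = E(Y \mid X = x)$ mixes the quality effect with the true marginal product and overstates it, even though every farmer's first-order condition holds exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
p, c = 2.0, 1.0  # assumed crop price and fertilizer cost

# Quality Q: known to each farmer, unobserved by the econometrician.
q = rng.lognormal(mean=0.0, sigma=0.3, size=100_000)

# Assumed technology f(x, q) = q * sqrt(x): increasing, supermodular.
# The risk-neutral FOC  p * q / (2 sqrt(x)) = c  gives the chosen input:
x = (p * q / (2 * c)) ** 2
y = q * np.sqrt(x) + rng.normal(0.0, 0.1, size=q.size)

# Along the observed (X, Y) cloud, y = (2c/p) x + noise, so the
# econometrician's h(x) = E(Y | X = x) has slope 2c/p.
slope = np.cov(x, y)[0, 1] / np.var(x)
print(f"p * h'(x)  ~ {p * slope:.3f}")          # ~ 2c: looks like p h' > c
print(f"p * (df/dx) at each farmer's x = {c}")  # FOC: exactly c, no under-use
```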

L. [Doob's Theorem] Let $(\Omega, \mathcal{F})$ and $(Y, \mathcal{Y})$ be non-empty sets with σ-fields of subsets. The most frequent class of sub-σ-fields $\mathcal{G} \subset \mathcal{F}$ that we will encounter arises from a measurable $g : \Omega \to Y$; they are of the form $\mathcal{G} = g^{-1}(\mathcal{Y})$. Let $\mathcal{B}$ denote the Borel σ-field on $\mathbb{R}$, that is, the smallest σ-field of subsets of $\mathbb{R}$ containing the open subsets of $\mathbb{R}$. This problem shows that if $f : \Omega \to [0,1]$ has the property that $f^{-1}(\mathcal{B}) \subset \mathcal{G}$, then there exists a measurable $h : Y \to \mathbb{R}$ such that $f(\omega) = h(g(\omega))$. Thus the only $\mathcal{G}$-measurable functions are in fact functions of $g$; the function $g$ contains everything (measurable) that one could ever get from $\mathcal{G}$.

1. Give an elementary proof of the assertion when $g(\Omega)$ is a finite set $G = \{y_1, \ldots, y_N\}$ and $\{y_n\} \in \mathcal{Y}$ for each $n$.

2. Show that $\mathcal{G}$ is a σ-field.

3. Show that $A_{i,n} := \{\omega : f(\omega) \in [i/2^n, (i+1)/2^n)\}$ is of the form $g^{-1}(B_{i,n})$ for some $B_{i,n} \in \mathcal{Y}$.

4. Define the functions $f_n : \Omega \to [0,1]$ and $h_n : Y \to [0,1]$ by $f_n = \sum_{i=1}^{2^n} \frac{i}{2^n} 1_{A_{i,n}}$ and $h_n = \sum_{i=1}^{2^n} \frac{i}{2^n} 1_{B_{i,n}}$. Show that for all $\omega$, $f_n(\omega) = h_n(g(\omega))$.

5. Define $h(y) = \limsup_n h_n(y)$. Show that for all $\omega$, $f(\omega) = h(g(\omega))$.

6. In the previous step, why couldn't we define $h(y) = \lim_n h_n(y)$?

M. [I'll cross that bridge when I come to it] Suppose that $X$ and $S$ are simple random variables taking values in $\{x_1, \ldots, x_N\}$ and $\{s_1, \ldots, s_M\}$. The time line for the problem is that the value of $S$ is observed before $X$ is known, and an action $a \in A$ must be picked. We suppose that $A$ is a compact metric space, and that for each $x_n$, $u(\cdot, x_n)$ is a continuous function. Amongst the elements of $A^S$, let the function $a^*(\cdot)$ solve the problem $\max_{a(\cdot)} E\,(u(a(S), X))$. The functions $a(\cdot)$ are called, for hopefully obvious reasons, complete contingent plans. Define $P(\cdot \mid S = s_m) \in \Delta(\{x_1, \ldots, x_N\})$ as the usual conditional probability.

1. Show that $a^*(\cdot)$ is a solution if and only if for all $s_m$ with $P(S = s_m) > 0$, $a^*(s_m)$ solves the problem $\max_{a \in A} \sum_{x_n} u(a, x_n) P(x_n \mid s_m)$.

2. Check that the average conditional probability of any event $\{X \in A\}$ is the prior probability, $P(X \in A)$. That is, $P(X \in A) = \sum_m P(X \in A \mid S = s_m) \cdot P(S = s_m)$. (A numerical check appears below.)

Comments: Conditional probabilities map $s_m$'s to $\Delta(\{x_1, \ldots, x_N\})$. When $S$ and $X$ are more general random variables, we will want a measurable function from the range of $S$ to $\Delta(\mathbb{R})$ that also solves these kinds of optimization problems. For example, we are often interested in seeing how educational procedures affect the performance of the median and of the bottom and top quartiles. Knowing $P(\cdot \mid S = s)$ answers this, and many other, questions.
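
A closing numerical check on problem M (illustrative; the two-by-two joint distribution and the utilities are arbitrary choices): the optimal complete contingent plan can be computed signal by signal, as in M.1, and the conditional probabilities average back to the prior, as in M.2.

```python
import numpy as np

# Joint distribution of (X, S) on {x1, x2} x {s1, s2} (made-up example).
joint = np.array([[0.30, 0.10],   # rows: x1, x2
                  [0.15, 0.45]])  # cols: s1, s2
p_s = joint.sum(axis=0)           # marginal of S
p_x = joint.sum(axis=1)           # prior on X
post = joint / p_s                # post[n, m] = P(x_n | s_m)

# M.2: averaging the conditionals against P(S = s_m) recovers the prior.
assert np.allclose(post @ p_s, p_x)

# M.1: with finitely many actions, the best complete contingent plan
# a*(s_m) maximizes sum_n u(a, x_n) P(x_n | s_m) separately at each s_m.
u = np.array([[1.0, -1.0],        # u(a, x): rows a in {a1, a2}
              [-2.0, 3.0]])       # cols x in {x1, x2}
plan = (u @ post).argmax(axis=0)  # entry m: optimal action index at s_m
print("optimal plan (action index per signal):", plan)  # [0, 1] here
```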