7 Influence Functions


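The influence function is used to approximate the standard error of a plug-in estimator. The formal definition is as follows.

7.1 Definition. The Gâteaux derivative of $T$ at $F$ in the direction $G$ is defined by

$$L_F(G) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)F + \epsilon G) - T(F)}{\epsilon}. \tag{37}$$

If $G = \delta_x$ is a point mass at $x$ then we write $L_F(x) \equiv L_F(\delta_x)$ and we call $L_F(x)$ the influence function. Thus,

$$L_F(x) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)F + \epsilon \delta_x) - T(F)}{\epsilon}. \tag{38}$$

The empirical influence function is defined by $\hat{L}(x) = L_{\hat{F}_n}(x)$. Thus,

$$\hat{L}(x) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)\hat{F}_n + \epsilon \delta_x) - T(\hat{F}_n)}{\epsilon}. \tag{39}$$

Often we drop the subscript $F$ and write $L(x)$ instead of $L_F(x)$.

7.2 Theorem. Let $T(F) = \int a(x)\,dF(x)$ be a linear functional. Then:

1. $L_F(x) = a(x) - T(F)$ and $\hat{L}(x) = a(x) - T(\hat{F}_n)$.

2. For any $G$,
$$T(G) = T(F) + \int L_F(x)\,dG(x). \tag{40}$$

3. $\int L_F(x)\,dF(x) = 0$.

4. Let $\tau^2 = \int L_F^2(x)\,dF(x)$. Then $\tau^2 = \int (a(x) - T(F))^2\,dF(x)$ and, if $\tau^2 < \infty$,
$$\sqrt{n}\,(T(F) - T(\hat{F}_n)) \rightsquigarrow N(0, \tau^2). \tag{41}$$

5. Let
$$\hat{\tau}^2 = \frac{1}{n}\sum_{i=1}^n \hat{L}^2(X_i) = \frac{1}{n}\sum_{i=1}^n (a(X_i) - T(\hat{F}_n))^2. \tag{42}$$
Then $\hat{\tau}^2 \xrightarrow{P} \tau^2$ and $\widehat{\mathrm{se}}/\mathrm{se} \xrightarrow{P} 1$, where $\widehat{\mathrm{se}} = \hat{\tau}/\sqrt{n}$ and $\mathrm{se} = \sqrt{\mathbb{V}(T(\hat{F}_n))}$.

6. We have that
$$\frac{\sqrt{n}\,(T(F) - T(\hat{F}_n))}{\hat{\tau}} \rightsquigarrow N(0, 1). \tag{43}$$

Proof. The first three claims follow easily from the definition of the influence function. To prove the fourth claim, write

$$T(\hat{F}_n) = T(F) + \int L_F(x)\,d\hat{F}_n(x) = T(F) + \frac{1}{n}\sum_{i=1}^n L_F(X_i).$$

From the central limit theorem and the fact that $\int L_F(x)\,dF(x) = 0$, it follows that $\sqrt{n}\,(T(F) - T(\hat{F}_n)) \rightsquigarrow N(0, \tau^2)$ where $\tau^2 = \int L_F^2(x)\,dF(x)$. The fifth claim follows from the law of large numbers. The final statement follows from the fourth and fifth claims and Slutsky's theorem.

The theorem above tells us that the influence function $L_F(x)$ behaves like the score function in parametric estimation. To see this, recall that if $f(x;\theta)$ is a parametric model, $\mathcal{L}_n(\theta) = \prod_{i=1}^n f(X_i;\theta)$ is the likelihood function and the maximum likelihood estimator $\hat{\theta}_n$ is the value of $\theta$ that maximizes $\mathcal{L}_n(\theta)$. The score function is $s_\theta(x) = \partial \log f(x;\theta)/\partial\theta$ which, under appropriate regularity conditions, satisfies $\int s_\theta(x) f(x;\theta)\,dx = 0$ and $\mathbb{V}(\hat{\theta}_n) \approx 1/(n I(\theta))$, where $I(\theta) = \int (s_\theta(x))^2 f(x;\theta)\,dx$ is the Fisher information. Equivalently, the rescaled score $s_\theta(x)/I(\theta)$ has mean zero and $\mathbb{V}(\hat{\theta}_n) \approx \int (s_\theta(x)/I(\theta))^2 f(x;\theta)\,dx / n$. Similarly, for the influence function we have that $\int L_F(x)\,dF(x) = 0$ and $\mathbb{V}(T(\hat{F}_n)) \approx \int L_F^2(x)\,dF(x)/n$.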
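
As a minimal sketch of how Theorem 7.2 might be used in practice, the Python code below computes the plug-in estimate $T(\hat{F}_n)$, the empirical influence function values $\hat{L}(X_i) = a(X_i) - T(\hat{F}_n)$, and the standard error $\widehat{\mathrm{se}} = \hat{\tau}/\sqrt{n}$ from (42). The function name `linear_functional_se`, the choice $a(x) = x^2$, and the simulated normal sample are illustrative assumptions, not part of the notes.

```python
import math
import random


def linear_functional_se(xs, a):
    """Plug-in estimate of T(F) = integral of a(x) dF(x) with its
    influence-function standard error (hypothetical helper)."""
    n = len(xs)
    values = [a(x) for x in xs]
    t_hat = sum(values) / n                  # T(F_n) = (1/n) * sum of a(X_i)
    infl = [v - t_hat for v in values]       # empirical influence function at each X_i
    tau2_hat = sum(l * l for l in infl) / n  # equation (42)
    se_hat = math.sqrt(tau2_hat / n)         # se_hat = tau_hat / sqrt(n)
    return t_hat, infl, se_hat


# Illustrative data: 500 draws from N(0, 1); the functional is the second moment.
rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(500)]
t_hat, infl, se_hat = linear_functional_se(sample, lambda x: x * x)
print(f"T(F_n) = {t_hat:.4f}, se_hat = {se_hat:.4f}")
```

The empirical influence values sum to zero up to rounding, which is the sample analogue of claim 3 of the theorem.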

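If the functional $T(F)$ is not linear, then (40) will not hold exactly, but it may hold approximately.

7.3 Theorem. If $T$ is Hadamard differentiable (see the appendix for the definition) with respect to $d(F, G) = \sup_x |F(x) - G(x)|$ then

$$\sqrt{n}\,(T(\hat{F}_n) - T(F)) \rightsquigarrow N(0, \tau^2) \tag{44}$$

where $\tau^2 = \int L_F(x)^2\,dF(x)$. Also,

$$\frac{T(\hat{F}_n) - T(F)}{\widehat{\mathrm{se}}} \rightsquigarrow N(0, 1) \tag{45}$$

where $\widehat{\mathrm{se}} = \hat{\tau}/\sqrt{n}$ and

$$\hat{\tau} = \sqrt{\frac{1}{n}\sum_{i=1}^n \hat{L}^2(X_i)}. \tag{46}$$

We call the approximation $(T(\hat{F}_n) - T(F))/\widehat{\mathrm{se}} \approx N(0, 1)$ the nonparametric delta method. From the normal approximation, a large sample confidence interval is $T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}}$. This is only a pointwise asymptotic confidence interval. In summary:

The Nonparametric Delta Method. A $1 - \alpha$ pointwise asymptotic confidence interval for $T(F)$ is

$$T(\hat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}} \tag{47}$$

where $\widehat{\mathrm{se}} = \hat{\tau}/\sqrt{n}$ and $\hat{\tau}^2 = \frac{1}{n}\sum_{i=1}^n \hat{L}^2(X_i)$.

7.4 Example (The mean). Let $\theta = T(F) = \int x\,dF(x)$. The plug-in estimator is $\hat{\theta} = \int x\,d\hat{F}_n(x) = \bar{X}_n$. Also, $T((1-\epsilon)F + \epsilon\delta_x) = (1-\epsilon)\theta + \epsilon x$. Thus, $L(x) = x - \theta$, $\hat{L}(x) = x - \bar{X}_n$ and $\widehat{\mathrm{se}}^2 = \hat{\sigma}^2/n$ where $\hat{\sigma}^2 = n^{-1}\sum_{i=1}^n (X_i - \bar{X}_n)^2$. A pointwise asymptotic nonparametric 95 percent confidence interval for $\theta$ is $\bar{X}_n \pm 2\,\widehat{\mathrm{se}}$.

Sometimes statistical functionals take the form $T(F) = a(T_1(F), \ldots, T_m(F))$ for some function $a(t_1, \ldots, t_m)$. By the chain rule, the influence function is

$$L(x) = \sum_{i=1}^m \frac{\partial a}{\partial t_i} L_i(x)$$

where

$$L_i(x) = \lim_{\epsilon \to 0} \frac{T_i((1-\epsilon)F + \epsilon\delta_x) - T_i(F)}{\epsilon}. \tag{48}$$

7.5 Example (Correlation). Let $Z = (X, Y)$ and let $T(F) = \mathbb{E}(X - \mu_X)(Y - \mu_Y)/(\sigma_x \sigma_y)$ denote the correlation, where $F(x, y)$ is bivariate. Recall that $T(F) = a(T_1(F), T_2(F), T_3(F), T_4(F), T_5(F))$, where

$$T_1(F) = \int x\,dF(z), \quad T_2(F) = \int y\,dF(z), \quad T_3(F) = \int xy\,dF(z), \quad T_4(F) = \int x^2\,dF(z), \quad T_5(F) = \int y^2\,dF(z)$$

and

$$a(t_1, \ldots, t_5) = \frac{t_3 - t_1 t_2}{\sqrt{(t_4 - t_1^2)(t_5 - t_2^2)}}.$$

It follows from (48) that

$$L(x, y) = \tilde{x}\tilde{y} - \frac{1}{2}\,T(F)\,(\tilde{x}^2 + \tilde{y}^2)$$

where

$$\tilde{x} = \frac{x - \int x\,dF}{\sqrt{\int x^2\,dF - \left(\int x\,dF\right)^2}}, \qquad \tilde{y} = \frac{y - \int y\,dF}{\sqrt{\int y^2\,dF - \left(\int y\,dF\right)^2}}.$$

7.6 Example (Quantiles). Let $F$ be strictly increasing with positive density $f$. Let $\theta = T(F) = F^{-1}(p)$ be the $p$th quantile. The influence function is (see Exercise 10)

$$L(x) = \begin{cases} \dfrac{p - 1}{f(\theta)}, & x \le \theta \\[2ex] \dfrac{p}{f(\theta)}, & x > \theta. \end{cases}$$

The asymptotic variance of $T(\hat{F}_n)$ is

$$\frac{\tau^2}{n} = \frac{1}{n}\int L^2(x)\,dF(x) = \frac{p(1-p)}{n f^2(\theta)}. \tag{49}$$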
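
Formula (49) can be checked numerically when the density is known. The sketch below is only an illustration under stated assumptions: the data are standard normal, $p = 1/2$ (so $\theta = 0$ and $f(\theta) = 1/\sqrt{2\pi}$), and the helper names `quantile_asymptotic_variance` and `simulated_median_variance` are hypothetical. It compares (49) with the Monte Carlo variance of the sample median.

```python
import math
import random
import statistics


def quantile_asymptotic_variance(p, density_at_quantile, n):
    """Equation (49): p(1 - p) / (n * f(theta)^2)."""
    return p * (1.0 - p) / (n * density_at_quantile ** 2)


def simulated_median_variance(n, reps, seed=0):
    """Monte Carlo variance of the sample median of n standard normal draws."""
    rng = random.Random(seed)
    medians = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        medians.append(statistics.median(sample))
    return statistics.pvariance(medians)


n = 201                                      # odd n, so the median is an order statistic
f_theta = 1.0 / math.sqrt(2.0 * math.pi)     # N(0, 1) density at its median
theory = quantile_asymptotic_variance(0.5, f_theta, n)
simulated = simulated_median_variance(n, reps=2000)
print(f"asymptotic variance (49): {theory:.5f}")
print(f"simulated variance:       {simulated:.5f}")
```

Here the true density is available only because the data are simulated from a known distribution.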

To estimate this variance we need to estimate the density $f$. Later we shall see that the bootstrap provides a simpler estimate of the variance.

8 Empirical Probability Distributions

This section discusses a generalization of the DKW inequality. The reader may skip this section if desired. Using the empirical cdf to estimate the true cdf is a special case of a more general idea. Let $X_1, \ldots, X_n \sim P$ be an iid sample from a probability measure $P$. Define the empirical probability distribution $\hat{P}_n$ by

$$\hat{P}_n(A) = \frac{\text{number of } X_i \in A}{n}. \tag{50}$$

We would like to be able to say that $\hat{P}_n$ is close to $P$ in some sense. For a fixed $A$ we know that $n\hat{P}_n(A) \sim \text{Binomial}(n, p)$ where $p = P(A)$. By Hoeffding's inequality, it follows that

$$\mathbb{P}\left(|\hat{P}_n(A) - P(A)| > \epsilon\right) \le 2e^{-2n\epsilon^2}. \tag{51}$$

We would like to extend this to a statement of the form

$$\mathbb{P}\left(\sup_{A \in \mathcal{A}} |\hat{P}_n(A) - P(A)| > \epsilon\right) \le \text{something small}$$

for some class of sets $\mathcal{A}$. This is exactly what the DKW inequality does by taking $\mathcal{A} = \{A = (-\infty, t] : t \in \mathbb{R}\}$. But DKW is only useful for one-dimensional random variables. We can get a more general inequality by using Vapnik-Chervonenkis (VC) theory.

Let $\mathcal{A}$ be a class of sets. Given a finite set $R = \{x_1, \ldots, x_n\}$ let

$$N_{\mathcal{A}}(R) = \#\{R \cap A : A \in \mathcal{A}\} \tag{52}$$

be the number of subsets of $R$ picked out as $A$ varies over $\mathcal{A}$. We say that $R$ is shattered by $\mathcal{A}$ if $N_{\mathcal{A}}(R) = 2^n$. The shatter coefficient is defined by

$$s(\mathcal{A}, n) = \max_{R \in \mathcal{F}_n} N_{\mathcal{A}}(R) \tag{53}$$

where $\mathcal{F}_n$ consists of all finite sets of size $n$.

8.1 Theorem (Vapnik and Chervonenkis, 1971). For any $P$, $n$ and $\epsilon > 0$,

$$\mathbb{P}\left(\sup_{A \in \mathcal{A}} |\hat{P}_n(A) - P(A)| > \epsilon\right) \le 8\,s(\mathcal{A}, n)\,e^{-n\epsilon^2/32}. \tag{54}$$

Theorem 8.1 is only useful if the shatter coefficients do not grow too quickly with $n$. This is where VC dimension enters. If $s(\mathcal{A}, n) = 2^n$ for all $n$, set $\mathrm{VC}(\mathcal{A}) = \infty$. Otherwise, define $\mathrm{VC}(\mathcal{A})$ to be the largest $k$ for which $s(\mathcal{A}, k) = 2^k$. We call $\mathrm{VC}(\mathcal{A})$ the Vapnik-Chervonenkis dimension of $\mathcal{A}$. Thus, the VC-dimension is the size of the largest finite set $F$ that is shattered by $\mathcal{A}$. The following theorem shows that if $\mathcal{A}$ has finite VC-dimension then the shatter coefficients grow as a polynomial in $n$.

8.2 Theorem. If $\mathcal{A}$ has finite VC-dimension $v$, then

$$s(\mathcal{A}, n) \le n^v + 1.$$

In this case,

$$\mathbb{P}\left(\sup_{A \in \mathcal{A}} |\hat{P}_n(A) - P(A)| > \epsilon\right) \le 8(n^v + 1)\,e^{-n\epsilon^2/32}. \tag{55}$$

8.3 Example. Let $\mathcal{A} = \{(-\infty, x] : x \in \mathbb{R}\}$. Then $\mathcal{A}$ shatters every one-point set $\{x\}$ but it shatters no set of the form $\{x, y\}$. Therefore, $\mathrm{VC}(\mathcal{A}) = 1$. Since $P((-\infty, x]) = F(x)$ is the cdf and $\hat{P}_n((-\infty, x]) = \hat{F}_n(x)$ is the empirical cdf, we conclude that

$$\mathbb{P}\left(\sup_x |\hat{F}_n(x) - F(x)| > \epsilon\right) \le 8(n + 1)\,e^{-n\epsilon^2/32}$$

which is looser than the DKW bound. This shows that the bound (54) is not the tightest possible.

8.4 Example. Let $\mathcal{A}$ be the set of closed intervals on the real line. Then $\mathcal{A}$ shatters $S = \{x, y\}$ but it cannot shatter sets with three points. Consider $S = \{x, y, z\}$ where $x < y < z$. One cannot find an interval $A$ such that $A \cap S = \{x, z\}$. So, $\mathrm{VC}(\mathcal{A}) = 2$.
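
The counting in Examples 8.3 and 8.4 can be reproduced by brute force on small point sets. The sketch below is illustrative only: the function names are hypothetical, the points are assumed distinct, and `dkw_bound` uses the standard form $2e^{-2n\epsilon^2}$ of the DKW inequality, which is recalled here rather than stated in this section.

```python
import math


def picked_out_by_rays(points):
    """All subsets R ∩ (-inf, t] as t ranges over the real line."""
    pts = sorted(points)
    return {frozenset(pts[:k]) for k in range(len(pts) + 1)}


def picked_out_by_intervals(points):
    """All subsets R ∩ [a, b] as [a, b] ranges over closed intervals."""
    pts = sorted(points)
    subsets = {frozenset()}                       # an interval that misses every point
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            subsets.add(frozenset(pts[i:j + 1]))  # contiguous runs of the sorted points
    return subsets


def is_shattered(points, picked_out):
    """R is shattered when all 2^n subsets are picked out, as in (52)."""
    return len(picked_out(points)) == 2 ** len(points)


def vc_bound(n, eps, v):
    """Right-hand side of (55)."""
    return 8.0 * (n ** v + 1) * math.exp(-n * eps ** 2 / 32.0)


def dkw_bound(n, eps):
    """Standard DKW bound 2 exp(-2 n eps^2) (assumed form, see lead-in)."""
    return 2.0 * math.exp(-2.0 * n * eps ** 2)


print(is_shattered([0.0, 1.0], picked_out_by_intervals))       # two points
print(is_shattered([0.0, 1.0, 2.0], picked_out_by_intervals))  # three points
print(vc_bound(1000, 0.1, 1), dkw_bound(1000, 0.1))
```

For three points $x < y < z$ the subset $\{x, z\}$ is missing from the output of `picked_out_by_intervals`, which is exactly the obstruction described in Example 8.4.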

8.5 Example. Let $\mathcal{A}$ be all linear half-spaces on the plane. Any three-point set (not all on a line) can be shattered. No four-point set can be shattered. Consider, for example, four points forming a diamond. Let $T$ be the leftmost and rightmost points. This set cannot be picked out. Other configurations can also be seen to be unshatterable. So $\mathrm{VC}(\mathcal{A}) = 3$. In general, half-spaces in $\mathbb{R}^d$ have VC dimension $d + 1$.

8.6 Example. Let $\mathcal{A}$ be all rectangles on the plane with sides parallel to the axes. Some four-point sets can be shattered, for example four points forming a diamond. Let $S$ be a five-point set. Choose a leftmost, a rightmost, an uppermost and a lowermost point of $S$; at least one point of $S$ is not among those chosen. Let $T$ be all points in $S$ except this point. Then $T$ can't be picked out. So, we have that $\mathrm{VC}(\mathcal{A}) = 4$.

9 Appendix

Here are some details about Theorem 7.3. Let $\mathcal{F}$ denote all distribution functions and let $\mathcal{D}$ denote the linear space generated by $\mathcal{F}$. Write $T((1-\epsilon)F + \epsilon G) = T(F + \epsilon D)$ where $D = G - F \in \mathcal{D}$. The Gâteaux derivative, which we now write as $L_F(D)$, is defined by

$$\left| \frac{T(F + \epsilon D) - T(F)}{\epsilon} - L_F(D) \right| \to 0 \quad \text{as } \epsilon \downarrow 0.$$

Thus $T(F + \epsilon D) = T(F) + \epsilon L_F(D) + o(\epsilon)$, where the error term $o(\epsilon)$ goes to 0 faster than $\epsilon$ as $\epsilon \to 0$. Hadamard differentiability requires that this error term be small uniformly over compact sets. Equip $\mathcal{D}$ with a metric $d$. $T$ is Hadamard differentiable at $F$ if there exists a linear functional $L_F$ on $\mathcal{D}$ such that for any $\epsilon_n \to 0$ and $\{D, D_1, D_2, \ldots\} \subset \mathcal{D}$ such that $d(D_n, D) \to 0$ and $F + \epsilon_n D_n \in \mathcal{F}$,

$$\lim_{n \to \infty} \left( \frac{T(F + \epsilon_n D_n) - T(F)}{\epsilon_n} - L_F(D_n) \right) = 0.$$

10 Exercises

1. Fill in the details of the proof of Theorem [number missing].

2. Prove Theorem [number missing].

3. (Computer experiment.) Generate 100 observations from a N(0,1) distribution. Compute a 95 percent confidence band for the cdf $F$. Repeat this 1000 times and see how often the confidence band contains the true distribution function. Repeat using data from a Cauchy distribution.

4. Let $X_1, \ldots, X_n \sim F$ and let $\hat{F}_n(x)$ be the empirical distribution function. For a fixed $x$, find the limiting distribution of $\hat{F}_n(x)$.


More information

Parametric Models: from data to models

Parametric Models: from data to models Parametric Models: from data to models Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 Jan 22, 2018 Recall: Model-based ML DATA MODEL LEARNING MODEL MODEL INFERENCE KNOWLEDGE Learning:

More information

Simulation. Alberto Ceselli MSc in Computer Science Univ. of Milan. Part 4 - Statistical Analysis of Simulated Data

Simulation. Alberto Ceselli MSc in Computer Science Univ. of Milan. Part 4 - Statistical Analysis of Simulated Data Simulation Alberto Ceselli MSc in Computer Science Univ. of Milan Part 4 - Statistical Analysis of Simulated Data A. Ceselli Simulation P.4 Analysis of Sim. data 1 / 15 Statistical analysis of simulated

More information

Nonparametric Inference via Bootstrapping the Debiased Estimator

Nonparametric Inference via Bootstrapping the Debiased Estimator Nonparametric Inference via Bootstrapping the Debiased Estimator Yen-Chi Chen Department of Statistics, University of Washington ICSA-Canada Chapter Symposium 2017 1 / 21 Problem Setup Let X 1,, X n be

More information

λ(x + 1)f g (x) > θ 0

λ(x + 1)f g (x) > θ 0 Stat 8111 Final Exam December 16 Eleven students took the exam, the scores were 92, 78, 4 in the 5 s, 1 in the 4 s, 1 in the 3 s and 3 in the 2 s. 1. i) Let X 1, X 2,..., X n be iid each Bernoulli(θ) where

More information

Theoretical Statistics. Lecture 14.

Theoretical Statistics. Lecture 14. Theoretical Statistics. Lecture 14. Peter Bartlett Metric entropy. 1. Chaining: Dudley s entropy integral 1 Recall: Sub-Gaussian processes Definition: A stochastic process θ X θ with indexing set T is

More information

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.

Unbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it

More information

Functional Analysis HW #3

Functional Analysis HW #3 Functional Analysis HW #3 Sangchul Lee October 26, 2015 1 Solutions Exercise 2.1. Let D = { f C([0, 1]) : f C([0, 1])} and define f d = f + f. Show that D is a Banach algebra and that the Gelfand transform

More information

be the set of complex valued 2π-periodic functions f on R such that

be the set of complex valued 2π-periodic functions f on R such that . Fourier series. Definition.. Given a real number P, we say a complex valued function f on R is P -periodic if f(x + P ) f(x) for all x R. We let be the set of complex valued -periodic functions f on

More information

SDS : Theoretical Statistics

SDS : Theoretical Statistics SDS 384 11: Theoretical Statistics Lecture 1: Introduction Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin https://psarkar.github.io/teaching Manegerial Stuff

More information

Metric Spaces Lecture 17

Metric Spaces Lecture 17 Metric Spaces Lecture 17 Homeomorphisms At the end of last lecture an example was given of a bijective continuous function f such that f 1 is not continuous. For another example, consider the sets T =

More information

Exercises and Answers to Chapter 1

Exercises and Answers to Chapter 1 Exercises and Answers to Chapter The continuous type of random variable X has the following density function: a x, if < x < a, f (x), otherwise. Answer the following questions. () Find a. () Obtain mean

More information

Exercises from other sources REAL NUMBERS 2,...,

Exercises from other sources REAL NUMBERS 2,..., Exercises from other sources REAL NUMBERS 1. Find the supremum and infimum of the following sets: a) {1, b) c) 12, 13, 14, }, { 1 3, 4 9, 13 27, 40 } 81,, { 2, 2 + 2, 2 + 2 + } 2,..., d) {n N : n 2 < 10},

More information

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă

STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă STAT 535 Lecture 5 November, 2018 Brief overview of Model Selection and Regularization c Marina Meilă mmp@stat.washington.edu Reading: Murphy: BIC, AIC 8.4.2 (pp 255), SRM 6.5 (pp 204) Hastie, Tibshirani

More information

1. Is the set {f a,b (x) = ax + b a Q and b Q} of all linear functions with rational coefficients countable or uncountable?

1. Is the set {f a,b (x) = ax + b a Q and b Q} of all linear functions with rational coefficients countable or uncountable? Name: Instructions. Show all work in the space provided. Indicate clearly if you continue on the back side, and write your name at the top of the scratch sheet if you will turn it in for grading. No books

More information

Continuity. Chapter 4

Continuity. Chapter 4 Chapter 4 Continuity Throughout this chapter D is a nonempty subset of the real numbers. We recall the definition of a function. Definition 4.1. A function from D into R, denoted f : D R, is a subset of

More information

e x = 1 + x + x2 2! + x3 If the function f(x) can be written as a power series on an interval I, then the power series is of the form

e x = 1 + x + x2 2! + x3 If the function f(x) can be written as a power series on an interval I, then the power series is of the form Taylor Series Given a function f(x), we would like to be able to find a power series that represents the function. For example, in the last section we noted that we can represent e x by the power series

More information

Lecture 4: Completion of a Metric Space

Lecture 4: Completion of a Metric Space 15 Lecture 4: Completion of a Metric Space Closure vs. Completeness. Recall the statement of Lemma??(b): A subspace M of a metric space X is closed if and only if every convergent sequence {x n } X satisfying

More information

McGill University Math 354: Honors Analysis 3

McGill University Math 354: Honors Analysis 3 Practice problems McGill University Math 354: Honors Analysis 3 not for credit Problem 1. Determine whether the family of F = {f n } functions f n (x) = x n is uniformly equicontinuous. 1st Solution: The

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

Spring 2012 Math 541B Exam 1

Spring 2012 Math 541B Exam 1 Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote

More information

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3

Hypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3 Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest

More information

Solutions Final Exam May. 14, 2014

Solutions Final Exam May. 14, 2014 Solutions Final Exam May. 14, 2014 1. Determine whether the following statements are true or false. Justify your answer (i.e., prove the claim, derive a contradiction or give a counter-example). (a) (10

More information

The Delta Method and Applications

The Delta Method and Applications Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order

More information

converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n

converges as well if x < 1. 1 x n x n 1 1 = 2 a nx n Solve the following 6 problems. 1. Prove that if series n=1 a nx n converges for all x such that x < 1, then the series n=1 a n xn 1 x converges as well if x < 1. n For x < 1, x n 0 as n, so there exists

More information

The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models

The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models Herman J. Bierens Pennsylvania State University September 16, 2005 1. The uniform weak

More information

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it

Statistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students

More information