7 Influence Functions
The influence function is used to approximate the standard error of a plug-in estimator. The formal definition is as follows.

7.1 Definition. The Gâteaux derivative of $T$ at $F$ in the direction $G$ is defined by

$$L_F(G) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)F + \epsilon G) - T(F)}{\epsilon}. \qquad (37)$$

If $G = \delta_x$ is a point mass at $x$ then we write $L_F(x) \equiv L_F(\delta_x)$ and we call $L_F(x)$ the influence function. Thus,

$$L_F(x) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)F + \epsilon \delta_x) - T(F)}{\epsilon}. \qquad (38)$$

The empirical influence function is defined by $\widehat{L}(x) = L_{\widehat{F}_n}(x)$. Thus,

$$\widehat{L}(x) = \lim_{\epsilon \to 0} \frac{T((1-\epsilon)\widehat{F}_n + \epsilon \delta_x) - T(\widehat{F}_n)}{\epsilon}. \qquad (39)$$

Often we drop the subscript $F$ and write $L(x)$ instead of $L_F(x)$.

7.2 Theorem. Let $T(F) = \int a(x)\,dF(x)$ be a linear functional. Then:
1. $L_F(x) = a(x) - T(F)$ and $\widehat{L}(x) = a(x) - T(\widehat{F}_n)$.

2. For any $G$,
$$T(G) = T(F) + \int L_F(x)\,dG(x). \qquad (40)$$

3. $\int L_F(x)\,dF(x) = 0$.

4. Let $\tau^2 = \int L_F^2(x)\,dF(x)$. Then $\tau^2 = \int (a(x) - T(F))^2\,dF(x)$ and, if $\tau^2 < \infty$,
$$\sqrt{n}\,\bigl(T(\widehat{F}_n) - T(F)\bigr) \rightsquigarrow N(0, \tau^2). \qquad (41)$$

5. Let
$$\widehat{\tau}^2 = \frac{1}{n}\sum_{i=1}^n \widehat{L}^2(X_i) = \frac{1}{n}\sum_{i=1}^n \bigl(a(X_i) - T(\widehat{F}_n)\bigr)^2. \qquad (42)$$
Then $\widehat{\tau}^2 \xrightarrow{P} \tau^2$ and $\widehat{\mathrm{se}}/\mathrm{se} \xrightarrow{P} 1$, where $\widehat{\mathrm{se}} = \widehat{\tau}/\sqrt{n}$ and $\mathrm{se} = \sqrt{V(T(\widehat{F}_n))}$.

6. We have that
$$\frac{\sqrt{n}\,\bigl(T(\widehat{F}_n) - T(F)\bigr)}{\widehat{\tau}} \rightsquigarrow N(0, 1). \qquad (43)$$

Proof. The first three claims follow easily from the definition of the influence function. To prove the fourth
claim, write

$$T(\widehat{F}_n) = T(F) + \int L_F(x)\,d\widehat{F}_n(x) = T(F) + \frac{1}{n}\sum_{i=1}^n L_F(X_i).$$

From the central limit theorem and the fact that $\int L_F(x)\,dF(x) = 0$, it follows that $\sqrt{n}\,(T(\widehat{F}_n) - T(F)) \rightsquigarrow N(0, \tau^2)$ where $\tau^2 = \int L_F^2(x)\,dF(x)$. The fifth claim follows from the law of large numbers. The final statement follows from the fourth and fifth claims and Slutsky's theorem.

The theorem above tells us that the influence function $L_F(x)$ behaves like the score function in parametric estimation. To see this, recall that if $f(x; \theta)$ is a parametric model, $\mathcal{L}_n(\theta) = \prod_{i=1}^n f(X_i; \theta)$ is the likelihood function and the maximum likelihood estimator $\widehat{\theta}_n$ is the value of $\theta$ that maximizes $\mathcal{L}_n(\theta)$. The score function is $s_\theta(x) = \partial \log f(x; \theta)/\partial \theta$, which, under appropriate regularity conditions, satisfies $\int s_\theta(x) f(x; \theta)\,dx = 0$ and $V(\widehat{\theta}_n) \approx 1/\bigl(n \int s_\theta^2(x) f(x; \theta)\,dx\bigr)$. Similarly, for the influence function we have that $\int L_F(x)\,dF(x) = 0$ and $V(T(\widehat{F}_n)) \approx \int L_F^2(x)\,dF(x)/n$.
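Theorem 7.2 can be checked numerically. The sketch below is illustrative rather than part of the text: it assumes the linear functional with $a(x) = x^2$ and Uniform(0,1) data, for which $T(F) = \mathbb{E}(X^2) = 1/3$ and $\tau^2 = V(X^2) = 4/45$, and computes the plug-in estimate, the empirical influence values, and $\widehat{\mathrm{se}} = \widehat{\tau}/\sqrt{n}$.

```python
import math
import random

random.seed(0)

def plugin_and_se(xs, a):
    """Plug-in estimate and influence-based standard error for the
    linear functional T(F) = integral of a(x) dF(x) (Theorem 7.2)."""
    n = len(xs)
    t_hat = sum(a(x) for x in xs) / n            # T(F_n)
    infl = [a(x) - t_hat for x in xs]            # empirical influence a(X_i) - T(F_n)
    tau2_hat = sum(u * u for u in infl) / n      # estimate of tau^2 (claim 5)
    return t_hat, math.sqrt(tau2_hat / n)        # (T(F_n), se-hat)

xs = [random.random() for _ in range(100_000)]   # X ~ Uniform(0, 1)
t_hat, se_hat = plugin_and_se(xs, lambda x: x * x)
# For this choice, T(F) = 1/3 and tau^2 = 4/45, so se-hat should be
# close to sqrt((4/45)/n).
```

With $n = 100{,}000$ the estimate lands close to $1/3$ and $\widehat{\mathrm{se}}$ close to $\sqrt{(4/45)/n}$, as the theorem predicts.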
If the functional $T(F)$ is not linear, then (40) will not hold exactly, but it may hold approximately.

7.3 Theorem. If $T$ is Hadamard differentiable² with respect to $d(F, G) = \sup_x |F(x) - G(x)|$, then

$$\sqrt{n}\,\bigl(T(\widehat{F}_n) - T(F)\bigr) \rightsquigarrow N(0, \tau^2) \qquad (44)$$

where $\tau^2 = \int L_F(x)^2\,dF(x)$. Also,

$$\frac{T(\widehat{F}_n) - T(F)}{\widehat{\mathrm{se}}} \rightsquigarrow N(0, 1) \qquad (45)$$

where $\widehat{\mathrm{se}} = \widehat{\tau}/\sqrt{n}$ and

$$\widehat{\tau}^2 = \frac{1}{n}\sum_{i=1}^n \widehat{L}^2(X_i). \qquad (46)$$

We call the approximation $(T(\widehat{F}_n) - T(F))/\widehat{\mathrm{se}} \approx N(0, 1)$ the nonparametric delta method. From the normal approximation, a large-sample confidence interval is $T(\widehat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}}$. This is only a pointwise asymptotic confidence interval. In summary:

The Nonparametric Delta Method. A $1 - \alpha$, pointwise asymptotic confidence interval for $T(F)$ is

$$T(\widehat{F}_n) \pm z_{\alpha/2}\,\widehat{\mathrm{se}} \qquad (47)$$

² Hadamard differentiability is defined in the appendix.
where $\widehat{\mathrm{se}} = \widehat{\tau}/\sqrt{n}$ and $\widehat{\tau}^2 = \frac{1}{n}\sum_{i=1}^n \widehat{L}^2(X_i)$.

7.4 Example (The mean). Let $\theta = T(F) = \int x\,dF(x)$. The plug-in estimator is $\widehat{\theta} = \int x\,d\widehat{F}_n(x) = \overline{X}_n$. Also, $T((1-\epsilon)F + \epsilon\delta_x) = (1-\epsilon)\theta + \epsilon x$. Thus, $L(x) = x - \theta$, $\widehat{L}(x) = x - \overline{X}_n$, and $\widehat{\mathrm{se}}^2 = \widehat{\sigma}^2/n$ where $\widehat{\sigma}^2 = n^{-1}\sum_{i=1}^n (X_i - \overline{X}_n)^2$. A pointwise asymptotic nonparametric 95 percent confidence interval for $\theta$ is $\overline{X}_n \pm 2\,\widehat{\mathrm{se}}$.

Sometimes statistical functionals take the form $T(F) = a(T_1(F), \ldots, T_m(F))$ for some function $a(t_1, \ldots, t_m)$. By the chain rule, the influence function is

$$L(x) = \sum_{i=1}^m \frac{\partial a}{\partial t_i}\, L_i(x)$$

where

$$L_i(x) = \lim_{\epsilon \to 0} \frac{T_i((1-\epsilon)F + \epsilon\delta_x) - T_i(F)}{\epsilon}. \qquad (48)$$

7.5 Example (Correlation). Let $Z = (X, Y)$ and let $T(F) = \mathbb{E}\bigl[(X - \mu_X)(Y - \mu_Y)\bigr]/(\sigma_X \sigma_Y)$ denote the correlation, where
$F(x, y)$ is bivariate. Recall that $T(F) = a(T_1(F), T_2(F), T_3(F), T_4(F), T_5(F))$, where

$$T_1(F) = \int x\,dF(z), \quad T_2(F) = \int y\,dF(z), \quad T_3(F) = \int xy\,dF(z), \quad T_4(F) = \int x^2\,dF(z), \quad T_5(F) = \int y^2\,dF(z)$$

and

$$a(t_1, \ldots, t_5) = \frac{t_3 - t_1 t_2}{\sqrt{(t_4 - t_1^2)(t_5 - t_2^2)}}.$$

It follows from (48) that

$$L(x, y) = \tilde{x}\tilde{y} - \frac{1}{2}\,T(F)\,(\tilde{x}^2 + \tilde{y}^2)$$

where

$$\tilde{x} = \frac{x - \int x\,dF}{\sqrt{\int x^2\,dF - \left(\int x\,dF\right)^2}}, \qquad \tilde{y} = \frac{y - \int y\,dF}{\sqrt{\int y^2\,dF - \left(\int y\,dF\right)^2}}.$$

7.6 Example (Quantiles). Let $F$ be strictly increasing with positive density $f$. Let $T(F) = F^{-1}(p)$ be the $p$th quantile. The influence function is (see Exercise 10)

$$L(x) = \begin{cases} \dfrac{p-1}{f(\theta)}, & x \le \theta, \\[6pt] \dfrac{p}{f(\theta)}, & x > \theta, \end{cases}$$

where $\theta = T(F)$. The asymptotic variance of $T(\widehat{F}_n)$ is

$$\frac{\tau^2}{n} = \frac{1}{n}\int L^2(x)\,dF(x) = \frac{p(1-p)}{n f^2(\theta)}. \qquad (49)$$
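Formula (49) can be checked by simulation. The sketch below is an illustration, not part of the text: it assumes the median ($p = 1/2$) of a $N(0,1)$ distribution, $n = 101$, and 2000 replications, and compares the Monte Carlo variance of the sample median with $p(1-p)/(nf^2(\theta))$, using the true density (known here because we are simulating).

```python
import math
import random
import statistics

random.seed(1)

n, reps, p = 101, 2000, 0.5
f_theta = 1.0 / math.sqrt(2.0 * math.pi)      # N(0,1) density at its median theta = 0
asym_var = p * (1 - p) / (n * f_theta ** 2)   # formula (49): p(1-p) / (n f^2(theta))

# Monte Carlo variance of the sample median over many replications
medians = [statistics.median(random.gauss(0.0, 1.0) for _ in range(n))
           for _ in range(reps)]
mc_var = statistics.variance(medians)
# mc_var should be close to asym_var, which is roughly pi/(2n) here
```

The two variances agree to within Monte Carlo error, which is the content of (49) for this functional.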
To estimate this variance we need to estimate the density $f$. Later we shall see that the bootstrap provides a simpler estimate of the variance.

8 Empirical Probability Distributions

This section discusses a generalization of the DKW inequality. The reader may skip this section if desired.

Using the empirical cdf to estimate the true cdf is a special case of a more general idea. Let $X_1, \ldots, X_n \sim P$ be an iid sample from a probability measure $P$. Define the empirical probability distribution $\widehat{P}_n$ by

$$\widehat{P}_n(A) = \frac{\#\{i : X_i \in A\}}{n}. \qquad (50)$$

We would like to be able to say that $\widehat{P}_n$ is close to $P$ in some sense. For a fixed $A$ we know that $n\widehat{P}_n(A) \sim \mathrm{Binomial}(n, p)$ where $p = P(A)$. By Hoeffding's inequality, it follows that

$$P\left(|\widehat{P}_n(A) - P(A)| > \epsilon\right) \le 2 e^{-2n\epsilon^2}. \qquad (51)$$

We would like to extend this to a statement of the form

$$P\left(\sup_{A \in \mathcal{A}} |\widehat{P}_n(A) - P(A)| > \epsilon\right) \le \text{something small}$$
for some class of sets $\mathcal{A}$. This is exactly what the DKW inequality does by taking $\mathcal{A} = \{A = (-\infty, t] : t \in \mathbb{R}\}$. But DKW is only useful for one-dimensional random variables. We can get a more general inequality by using Vapnik–Chervonenkis (VC) theory.

Let $\mathcal{A}$ be a class of sets. Given a finite set $R = \{x_1, \ldots, x_n\}$, let

$$N_{\mathcal{A}}(R) = \#\bigl\{R \cap A : A \in \mathcal{A}\bigr\} \qquad (52)$$

be the number of subsets of $R$ picked out as $A$ varies over $\mathcal{A}$. We say that $R$ is shattered by $\mathcal{A}$ if $N_{\mathcal{A}}(R) = 2^n$. The shatter coefficient is defined by

$$s(\mathcal{A}, n) = \max_{R \in F_n} N_{\mathcal{A}}(R) \qquad (53)$$

where $F_n$ consists of all finite sets of size $n$.

8.1 Theorem (Vapnik and Chervonenkis, 1971). For any $P$, $n$ and $\epsilon > 0$,

$$P\left(\sup_{A \in \mathcal{A}} |\widehat{P}_n(A) - P(A)| > \epsilon\right) \le 8\, s(\mathcal{A}, n)\, e^{-n\epsilon^2/32}. \qquad (54)$$

Theorem 8.1 is only useful if the shatter coefficients do not grow too quickly with $n$. This is where VC dimension enters. If $s(\mathcal{A}, n) = 2^n$ for all $n$, set $\mathrm{VC}(\mathcal{A}) = \infty$. Otherwise, define $\mathrm{VC}(\mathcal{A})$ to be the largest $k$ for which $s(\mathcal{A}, k) = 2^k$. We call $\mathrm{VC}(\mathcal{A})$ the Vapnik–Chervonenkis dimension of $\mathcal{A}$. Thus, the VC-dimension
is the size of the largest finite set $F$ that is shattered by $\mathcal{A}$. The following theorem shows that if $\mathcal{A}$ has finite VC-dimension then the shatter coefficients grow as a polynomial in $n$.

8.2 Theorem. If $\mathcal{A}$ has finite VC-dimension $v$, then $s(\mathcal{A}, n) \le n^v + 1$. In this case,

$$P\left(\sup_{A \in \mathcal{A}} |\widehat{P}_n(A) - P(A)| > \epsilon\right) \le 8\,(n^v + 1)\,e^{-n\epsilon^2/32}. \qquad (55)$$

8.3 Example. Let $\mathcal{A} = \{(-\infty, x] : x \in \mathbb{R}\}$. Then $\mathcal{A}$ shatters every one-point set $\{x\}$ but it shatters no set of the form $\{x, y\}$. Therefore, $\mathrm{VC}(\mathcal{A}) = 1$. Since $P((-\infty, x]) = F(x)$ is the cdf and $\widehat{P}_n((-\infty, x]) = \widehat{F}_n(x)$ is the empirical cdf, we conclude that

$$P\left(\sup_x |\widehat{F}_n(x) - F(x)| > \epsilon\right) \le 8(n + 1)e^{-n\epsilon^2/32},$$

which is looser than the DKW bound. This shows that the bound (54) is not the tightest possible.

8.4 Example. Let $\mathcal{A}$ be the set of closed intervals on the real line. Then $\mathcal{A}$ shatters $S = \{x, y\}$ but it cannot shatter sets with three points. Consider $S = \{x, y, z\}$ where $x < y < z$. One cannot find an interval $A$ such that $A \cap S = \{x, z\}$. So $\mathrm{VC}(\mathcal{A}) = 2$.
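The interval example can be verified by brute force. The sketch below (the helper names are mine) uses the fact that a closed interval picks out exactly a contiguous run of the sorted points, enumerates all such subsets, and checks that two-point sets are shattered while three-point sets are not; it also checks $s(\mathcal{A}, 3) = 7 \le 3^2 + 1$, consistent with Theorem 8.2 for $v = 2$.

```python
def picked_out_by_intervals(points):
    """All subsets of a finite point set of the form points ∩ [a, b].
    A closed interval picks out exactly a contiguous run of the sorted points."""
    pts = sorted(points)
    subsets = {frozenset()}                       # an interval missing all points
    for i in range(len(pts)):
        for j in range(i, len(pts)):
            subsets.add(frozenset(pts[i:j + 1]))  # points falling in [pts[i], pts[j]]
    return subsets

def is_shattered(points):
    """R is shattered when N_A(R) = 2^|R|."""
    return len(picked_out_by_intervals(points)) == 2 ** len(points)

s_3 = len(picked_out_by_intervals([1, 3, 7]))     # shatter count for a 3-point set
```

For three points the achievable subsets are the empty set, three singletons, two adjacent pairs, and the full set: seven in total, so the pair of outer points is the one subset an interval can never pick out.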
8.5 Example. Let $\mathcal{A}$ be all linear half-spaces on the plane. Any three-point set (not all on a line) can be shattered. No four-point set can be shattered. Consider, for example, four points forming a diamond. Let $T$ be the leftmost and rightmost points. This set cannot be picked out. Other configurations can also be seen to be unshatterable. So $\mathrm{VC}(\mathcal{A}) = 3$. In general, half-spaces in $\mathbb{R}^d$ have VC dimension $d + 1$.

8.6 Example. Let $\mathcal{A}$ be all rectangles on the plane with sides parallel to the axes. Any four-point set can be shattered. Let $S$ be a five-point set. There is one point that is not leftmost, rightmost, uppermost or lowermost. Let $T$ be all points in $S$ except this point. Then $T$ can't be picked out. So we have that $\mathrm{VC}(\mathcal{A}) = 4$.

9 Appendix

Here are some details about Theorem 7.3. Let $\mathcal{F}$ denote all distribution functions and let $\mathcal{D}$ denote the linear space generated by $\mathcal{F}$. Write $T((1-\epsilon)F + \epsilon G) = T(F + \epsilon D)$ where $D = G - F \in \mathcal{D}$. The Gâteaux derivative, which we now write as $L_F(D)$, is defined by

$$\lim_{\epsilon \to 0} \left[ \frac{T(F + \epsilon D) - T(F)}{\epsilon} - L_F(D) \right] = 0.$$
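This defining limit can be evaluated numerically by fixing a small $\epsilon$. The sketch below is an illustration not taken from the text: it assumes the (nonlinear) variance functional $T(G) = \int x^2\,dG - \bigl(\int x\,dG\bigr)^2$, whose influence function $L(x) = (x - \mu)^2 - \sigma^2$ is a standard fact, and checks that the difference quotient at $D = \delta_x - \widehat{F}_n$ recovers it.

```python
def variance_functional(dist):
    """T(G) = ∫x^2 dG - (∫x dG)^2 for a discrete G given as (weight, point) pairs."""
    m1 = sum(w * x for w, x in dist)
    m2 = sum(w * x * x for w, x in dist)
    return m2 - m1 * m1

def gateaux_influence(T, sample, x, eps=1e-6):
    """Difference quotient [T((1-eps) F_n + eps delta_x) - T(F_n)] / eps."""
    n = len(sample)
    f_n = [(1.0 / n, xi) for xi in sample]                       # empirical distribution
    mixed = [((1.0 - eps) / n, xi) for xi in sample] + [(eps, x)]
    return (T(mixed) - T(f_n)) / eps

sample = [0.0, 1.0, 2.0, 3.0]
mu = sum(sample) / len(sample)                                   # 1.5
sigma2 = sum((s - mu) ** 2 for s in sample) / len(sample)        # 1.25
x = 5.0
approx = gateaux_influence(variance_functional, sample, x)
exact = (x - mu) ** 2 - sigma2    # known influence function of the variance
```

The finite-$\epsilon$ quotient matches the closed form up to an $O(\epsilon)$ error, which is exactly the statement that the remainder in the definition above vanishes.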
Thus $T(F + \epsilon D) = T(F) + \epsilon L_F(D) + o(\epsilon)$, and the error term $o(\epsilon)$ goes to 0 as $\epsilon \to 0$. Hadamard differentiability requires that this error term be small uniformly over compact sets. Equip $\mathcal{D}$ with a metric $d$. $T$ is Hadamard differentiable at $F$ if there exists a linear functional $L_F$ on $\mathcal{D}$ such that for any $\epsilon_n \to 0$ and $\{D, D_1, D_2, \ldots\} \subset \mathcal{D}$ such that $d(D_n, D) \to 0$ and $F + \epsilon_n D_n \in \mathcal{F}$,

$$\lim_{n \to \infty} \left[ \frac{T(F + \epsilon_n D_n) - T(F)}{\epsilon_n} - L_F(D_n) \right] = 0.$$

10 Exercises

1. Fill in the details of the proof of Theorem 7.2.

2. Prove Theorem 7.3.

3. (Computer experiment.) Generate 100 observations from a N(0,1) distribution. Compute a 95 percent confidence band for the cdf $F$. Repeat this 1000 times and see how often the confidence band contains the true distribution function. Repeat using data from a Cauchy distribution.

4. Let $X_1, \ldots, X_n \sim F$ and let $\widehat{F}_n(x)$ be the empirical distribution function. For a fixed $x$, find the limiting distribution of $\widehat{F}_n(x)$.
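The computer experiment in Exercise 3 can be sketched as follows (a sketch, with the DKW band half-width $\epsilon_n = \sqrt{\log(2/\alpha)/(2n)}$; the helper names are mine). The band contains $F$ exactly when $\sup_x |\widehat{F}_n(x) - F(x)| \le \epsilon_n$, and for a continuous $F$ this supremum is attained at the order statistics.

```python
import math
import random

random.seed(2)

def dkw_band_covers(sample, cdf, alpha=0.05):
    """True if the DKW 1 - alpha band around F_n contains the true cdf everywhere.
    The sup of |F_n - F| is computed at the order statistics."""
    n = len(sample)
    eps = math.sqrt(math.log(2.0 / alpha) / (2.0 * n))   # DKW band half-width
    sup = 0.0
    for i, xval in enumerate(sorted(sample), start=1):
        fx = cdf(xval)
        sup = max(sup, abs(i / n - fx), abs(fx - (i - 1) / n))
    return sup <= eps

normal_cdf = lambda t: 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
reps = 1000
hits = sum(dkw_band_covers([random.gauss(0.0, 1.0) for _ in range(100)], normal_cdf)
           for _ in range(reps))
coverage = hits / reps   # should come out close to 0.95 or a bit above
```

For the Cauchy part of the exercise, replace `random.gauss(0.0, 1.0)` with a Cauchy draw and `normal_cdf` with the Cauchy cdf; since the DKW band is distribution-free, the coverage behaves the same way.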
More informationLecture 4: Completion of a Metric Space
15 Lecture 4: Completion of a Metric Space Closure vs. Completeness. Recall the statement of Lemma??(b): A subspace M of a metric space X is closed if and only if every convergent sequence {x n } X satisfying
More informationMcGill University Math 354: Honors Analysis 3
Practice problems McGill University Math 354: Honors Analysis 3 not for credit Problem 1. Determine whether the family of F = {f n } functions f n (x) = x n is uniformly equicontinuous. 1st Solution: The
More information1 Probability theory. 2 Random variables and probability theory.
Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major
More informationSpring 2012 Math 541B Exam 1
Spring 2012 Math 541B Exam 1 1. A sample of size n is drawn without replacement from an urn containing N balls, m of which are red and N m are black; the balls are otherwise indistinguishable. Let X denote
More informationHypothesis Testing. 1 Definitions of test statistics. CB: chapter 8; section 10.3
Hypothesis Testing CB: chapter 8; section 0.3 Hypothesis: statement about an unknown population parameter Examples: The average age of males in Sweden is 7. (statement about population mean) The lowest
More informationSolutions Final Exam May. 14, 2014
Solutions Final Exam May. 14, 2014 1. Determine whether the following statements are true or false. Justify your answer (i.e., prove the claim, derive a contradiction or give a counter-example). (a) (10
More informationThe Delta Method and Applications
Chapter 5 The Delta Method and Applications 5.1 Local linear approximations Suppose that a particular random sequence converges in distribution to a particular constant. The idea of using a first-order
More informationconverges as well if x < 1. 1 x n x n 1 1 = 2 a nx n
Solve the following 6 problems. 1. Prove that if series n=1 a nx n converges for all x such that x < 1, then the series n=1 a n xn 1 x converges as well if x < 1. n For x < 1, x n 0 as n, so there exists
More informationThe Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models
The Uniform Weak Law of Large Numbers and the Consistency of M-Estimators of Cross-Section and Time Series Models Herman J. Bierens Pennsylvania State University September 16, 2005 1. The uniform weak
More informationStatistics 300B Winter 2018 Final Exam Due 24 Hours after receiving it
Statistics 300B Winter 08 Final Exam Due 4 Hours after receiving it Directions: This test is open book and open internet, but must be done without consulting other students. Any consultation of other students
More information