Minimax Estimation of a nonlinear functional on a structured high-dimensional model
Eric Tchetgen Tchetgen
Professor of Biostatistics and Epidemiologic Methods, Harvard U.
(Minimax) 1 / 38
Outline

- Heuristics
- First order influence functions: the $\sqrt{n}$-regular case
- Higher order influence functions: non-regular rates slower than $\sqrt{n}$
- Application to the quadratic functional and the missing-at-random data functional

This is joint work with Robins, Li, Mukherjee and van der Vaart.
Heuristics

Let $X_1, \ldots, X_n$ be i.i.d. $\sim p$, where $p$ belongs to a collection $\mathcal{P}$ of densities, and we aim to estimate the value $\chi(p)$ of a functional $\chi : \mathcal{P} \to \mathbb{R}$.

We are particularly interested in semiparametric or nonparametric models, so that $\mathcal{P}$ is infinite dimensional.

We will give special attention to settings where the semiparametric model is described through parameters of low regularity, in which case $\sqrt{n}$-convergence may not be attainable.
Heuristics

For estimation, consider the class of "one-step" estimators of the form
$$\hat\chi = \chi(\hat p_n) + \mathbb{P}_n \chi_{\hat p_n}$$
for $\hat p_n$ an initial estimator of $p$, where $x \mapsto \chi_p(x)$ is a measurable function for each $p$ and $\mathbb{P}_n f = n^{-1} \sum_{i=1}^n f(X_i)$.

One possible choice is $\chi_{\hat p_n} = 0$, leading to the plug-in estimator; this is typically suboptimal.

$\hat p_n$ is constructed from an independent sample, i.e. by sample splitting. This is not required in the regular regime, where one can appeal to empirical process theory; such theory does not apply in the non-regular regime.
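The one-step recipe above can be sketched generically in code. This is an illustrative Python sketch, not from the talk; the function and argument names (`one_step_estimate`, `chi_plugin`, `influence_fn`) are hypothetical. The initial estimator is fit on one half of the data and the empirical average $\mathbb{P}_n$ is taken over the other half, as in the sample-splitting remark above.

```python
import numpy as np

def one_step_estimate(x, chi_plugin, influence_fn, seed=0):
    """Generic one-step estimator: chi_hat = chi(p_hat_n) + P_n[chi_{p_hat_n}],
    with p_hat_n fit on one half of the data (sample splitting) and the
    empirical average P_n taken over the other, independent half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    train, evl = x[idx[: len(x) // 2]], x[idx[len(x) // 2:]]
    # chi_plugin(train): plug-in value chi(p_hat_n);
    # influence_fn(train, evl): estimated influence function at the eval points.
    return chi_plugin(train) + influence_fn(train, evl).mean()
```

As a toy check, for the functional $\chi(p) = E[X]$ the plug-in is the training mean and the influence function is $x - E[X]$, so the one-step estimator reduces to the evaluation-half mean.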
Heuristics

To motivate a good choice of $\chi_{\hat p_n}$, consider the decomposition
$$\hat\chi - \chi(p) = \left[\chi(\hat p_n) - \chi(p) + P\chi_{\hat p_n}\right] + (\mathbb{P}_n - P)\chi_{\hat p_n}$$
where $P\chi_{\hat p_n} = \int \chi_{\hat p_n}(x)\, dp(x)$.

The second term is asymptotically normal of order $n^{-1/2}$; a good choice of $\chi_{\hat p_n}$ makes the term in brackets have "small bias", i.e. no larger than $O_P(n^{-1/2})$.

This can be achieved if $P\chi_{\hat p_n}$ acts like minus the derivative of the functional $\chi$ in the $(\hat p_n - p)$ direction.

A function $\chi_{\hat p_n}$ with this property is known as a "first order" influence function.
Heuristics

For a 1st order influence function the bias term is quadratic in the error $d(\hat p_n, p)$, for an appropriate distance $d$.

The no-bias condition then essentially requires $d(\hat p_n, p) = o_P(n^{-1/4})$. To get this rate the model cannot be too large.

For instance, a regression function or density in dimension $d$ can be estimated at rate $n^{-1/4}$ if it is a priori known to have at least $d/2$ derivatives.

Our main concern will be settings where the model is very large or has very low regularity, so that the rate $n^{-1/4}$ is not attainable for $d(\hat p_n, p)$.
Heuristics

The one-step estimator $\hat\chi = \chi(\hat p_n) + \mathbb{P}_n \chi_{\hat p_n}$ is suboptimal in such cases because it does not strike the right balance between bias and variance: in
$$\hat\chi - \chi(p) = \left[\chi(\hat p_n) - \chi(p) + P\chi_{\hat p_n}\right] + (\mathbb{P}_n - P)\chi_{\hat p_n}$$
the magnitude of the first term dominates, and one cannot obtain valid inferences about $\chi(p)$, i.e. CIs are not centered properly to achieve nominal coverage.

We will replace the first order IF $\mathbb{P}_n \chi_{\hat p_n}$ by a higher order IF $\mathbb{U}_n \chi_{\hat p_n}$, an $m$th order U-statistic, such that
$$\hat\chi_n = \chi(\hat p_n) + \mathbb{U}_n \chi_{\hat p_n}$$
and
$$\hat\chi_n - \chi(p) = \left[\chi(\hat p_n) - \chi(p) + P^m \chi_{\hat p_n}\right] + (\mathbb{U}_n - P^m)\chi_{\hat p_n}.$$

This in fact suggests choosing the HOIF such that $P^m \chi_{\hat p_n}$ behaves like the first $m$ terms of the Taylor expansion of $\chi(\hat p_n) - \chi(p)$.
Heuristics

Exact HOIFs exist only for special functionals.

For general functionals, we find a "good approximate" functional, in which case the convergence rate is obtained by trading off estimation bias + approximation bias against variance.

In some cases the optimal rate of the HOIF estimator may still be $n^{-1/2}$, in which case the semiparametric efficiency bound may be achieved: the first order IF determines the variance and the HOIFs correct the bias.

Typically the rate will be slower than $n^{-1/2}$. What is the optimal (i.e. minimax) rate of estimation? Can we achieve it?

Two cases emerge:

- Case 1: the smoothness of $p$ is assumed known; we have obtained both lower and upper bounds.
- Case 2: the smoothness of $p$ is not known, and therefore one must adapt to it; some recent results.
Example 1: Quadratic density functional

Quadratic functional of a density $p(x)$:
$$\chi : p \mapsto \chi(p) = \int_{\mathbb{R}^d} p(x)^2\, dx$$
where $p \in H_d(\alpha, C)$ lies in a $d$-dimensional Hölder ball with smoothness $\alpha$ and radius $C$.

Of interest in bandwidth selection for optimal density smoothing.

Well studied problem, e.g. Bickel and Ritov (1988), Laurent (1996, 1997), Cai and Low (2005), Cai, Low et al (2006), Donoho and Nussbaum (1990), Efromovich and Low et al (1994, 1996), Giné and Nickl (2008), Laurent and Massart (2000).

Exhibits an elbow phenomenon: $\chi(p)$ is $n^{-1/2}$-estimable only if $\alpha/d \geq 1/4$. Below the threshold, the minimax lower bound is $n^{-8\alpha/(d+4\alpha)}$ in squared error norm (Birgé and Massart, 1995).
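A standard estimator of the quadratic functional is a second-order U-statistic built from a kernel, which can be sketched as follows. This is an illustrative Python sketch (restricted to $d = 1$, Gaussian kernel, hypothetical function name), not the construction used in the talk:

```python
import numpy as np

def quadratic_functional_ustat(x, h):
    """Second-order U-statistic estimate of ∫ p(x)^2 dx in d = 1:
    (n(n-1))^{-1} sum_{i != j} K_h(X_i - X_j), with K_h a Gaussian kernel
    of bandwidth h.  Its expectation is ∫ (K_h * p)(x) p(x) dx, which
    approaches ∫ p(x)^2 dx as h -> 0."""
    n = len(x)
    diff = x[:, None] - x[None, :]
    k = np.exp(-0.5 * (diff / h) ** 2) / (h * np.sqrt(2 * np.pi))
    np.fill_diagonal(k, 0.0)          # U-statistic excludes i == j terms
    return k.sum() / (n * (n - 1))
```

For the standard normal density the target value is $\int \varphi^2 = 1/(2\sqrt{\pi}) \approx 0.282$, which the estimator recovers up to smoothing bias of order $h^2$.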
Example 2: Nonlinear density functional

Nonlinear functional of a density $p(x)$:
$$\chi : p \mapsto \chi(p) = \int_{\mathbb{R}^d} T(p(x))\, dx$$
where $T$ is a smooth mapping and $p \in H_d(\alpha, C)$ lies in a $d$-dimensional Hölder ball with smoothness $\alpha$ and radius $C$.

There is a large body of work on estimating the entropy of an underlying distribution: Beirlant et al. (1997); more recent works include estimation of Rényi and Tsallis entropies (Leonenko and Seleznjev, 2010; Pál, Póczos and Szepesvári, 2010). Also see Kandasamy et al. (2014).

Exhibits the same elbow phenomenon: $n^{-1/2}$-estimable only if $\alpha/d \geq 1/4$. Below the threshold, the minimax lower bound is $n^{-8\alpha/(d+4\alpha)}$ in squared error norm (Birgé and Massart, 1995).

The case $T(p) = p^3$ was studied for $d = 1$ by Kerkyacharian, Picard and Tsybakov (1998); a general solution for $d > 1$ was given by Tchetgen et al (2008).
Example 3: Missing at random data

Suppose the full data is $(Y, Z)$ where $Y$ is a binary response and $Z$ a $d$-dimensional vector of covariates; however one observes $(YA, A, Z)$, from which we wish to estimate $\chi = E(Y) = \Pr(Y = 1)$.

$\chi$ is nonparametrically identified under MAR, $A \perp Y \mid Z$, provided $\Pr(A = 1 \mid Z) > 0$ a.s.

Note: this is isomorphic to estimating the average treatment effect of a binary intervention $A$ on $Y$ under the assumption of no unmeasured confounding given $Z$, i.e. $\chi = E(Y_a)$ where $Y_a$ is the counterfactual response under regime $a$; the full data is $(Y_a, Z)$, and the observed data is $(Y_a, Z)$ if $A = a$ and $Z$ otherwise. Identified under $A \perp Y_a \mid Z$.
Example 3: Missing at random data

The functional can be expressed as an observed-data functional in terms of the density $f$ of $Z$ (relative to a measure $\nu$) and the probability $b(z) = \Pr(Y = 1 \mid Z = z) = \Pr(Y = 1 \mid Z = z, A = 1)$:
$$\chi : p_{b,f} \mapsto \chi(p_{b,f}) = \int b f\, d\nu.$$

Alternative parametrization in terms of $a(z)^{-1} = \Pr(A = 1 \mid Z = z)$ and $g = f/a$ (proportional to the density of $Z$ given $A = 1$):
$$\chi : p_{a,b,g} \mapsto \chi(p_{a,b,g}) = \int a b g\, d\nu.$$
Example 3: Missing at random data

Estimators of $\chi(p)$ that are $\sqrt{n}$-consistent and asymptotically efficient in the semiparametric sense have been constructed with a variety of methods, but only if $a$ or $b$ or both parameters are restricted to sufficiently small regularity classes.

Formally, suppose that $a$ and $b$ belong to $H_d(\alpha, C_\alpha)$ and $H_d(\beta, C_\beta)$ respectively; then $\sqrt{n}$-consistent estimators have been obtained for $\alpha$ and $\beta$ large enough that
$$\frac{\alpha}{2\alpha + d} + \frac{\beta}{2\beta + d} \geq \frac{1}{2}. \qquad (1)$$

In our work, we show that (1) is more stringent than required: root-$n$ consistency can be attained using HOIFs provided
$$\frac{\alpha + \beta}{2d} \geq \frac{1}{4}.$$
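The gap between condition (1) and the weaker HOIF condition is easy to check numerically. A minimal sketch (hypothetical function names; the formulas are the two displayed conditions above):

```python
def first_order_ok(alpha, beta, d):
    """Condition (1): alpha/(2 alpha + d) + beta/(2 beta + d) >= 1/2,
    under which classical first-order methods give root-n consistency."""
    return alpha / (2 * alpha + d) + beta / (2 * beta + d) >= 0.5

def hoif_ok(alpha, beta, d):
    """Weaker condition (alpha + beta)/(2 d) >= 1/4 attainable via HOIFs."""
    return (alpha + beta) / (2 * d) >= 0.25
```

For instance, with $d = 10$ and $\alpha = \beta = 2.5$, condition (1) fails (each summand is $1/6$) while the HOIF condition holds with equality.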
Example 3: Missing at random data

For moderate to large dimensions $d$ this is still a restrictive requirement; likewise if regularity is low, even when $d$ is small.

In fact, in Robins, Li, Tchetgen, van der Vaart (2009) we establish that, even if $g$ were known, the minimax lower bound when $(\alpha + \beta)/(2d) < 1/4$ is
$$n^{-2(\alpha+\beta)/(2(\alpha+\beta)+d)}.$$

Our estimators are shown to achieve this minimax rate when $\alpha$ and $\beta$ are a priori known, provided $g$ is smooth enough.
First Order Semiparametric Theory

Theory which goes back to Pfanzagl (1982), Newey (1990), van der Vaart (1991), Bickel et al (1993).

Given a functional $\chi : \mathcal{P} \to \mathbb{R}$ defined on the semiparametric model $\mathcal{P}$, an IF is a function $\chi_p : x \mapsto \chi_p(x)$ of the observed data which satisfies the following equation: for a sufficiently regular submodel $t \mapsto p_t \in \mathcal{P}$,
$$\frac{d}{dt}\Big|_{t=0} \chi(p_t) = \frac{d}{dt}\Big|_{t=0} P_t \chi_p = P(\chi_p g),$$
where $g = (d/dt)|_{t=0}\, p_t / p$ is the score function of the model $t \mapsto p_t$ at $t = 0$. The closed linear span of all scores attached to submodels $t \mapsto p_t$ of $\mathcal{P}$ is called the tangent space.

Therefore an influence function is an element of the Hilbert space $L_2(p)$ whose inner product with elements of the tangent space represents the derivative of the functional. This is an instance of the Riesz representation theorem from functional analysis.
IF of quadratic density functional

To compute the IF of $\chi(p) = \int_{\mathbb{R}^d} p(x)^2\, dx$, consider $\chi(p_t) = \int_{\mathbb{R}^d} p_t(x)^2\, dx$; computing $\frac{d}{dt}\big|_{t=0} \chi(p_t)$ one obtains
$$\frac{d}{dt}\Big|_{t=0} \chi(p_t) = P\{2(p(X) - \chi(p))\, g\}.$$
Therefore
$$\chi_p = 2(p(X) - \chi(p)).$$

Furthermore,
$$\hat\chi = \chi(\hat p_n) + \mathbb{P}_n \chi^{(1)}_{\hat p_n} = 2\,\mathbb{P}_n \hat p_n(X) - \chi(\hat p_n),$$
so that
$$P(\hat\chi - \chi(p)) = -\int (p - \hat p_n)^2(x)\, dx,$$
which is quadratic in the preliminary estimator, as expected.

The variance of $\hat\chi$ is of order $1/n$; if the preliminary estimator $\hat p_n$ can be constructed so that the squared bias is smaller than the variance, the estimator is rate optimal; otherwise the bias dominates and a higher order IF is needed.
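The bias reduction of the one-step correction over the plug-in can be seen in a small simulation. An illustrative Python sketch (restricted to $d = 1$, kernel density estimate on a grid, hypothetical function names), computing both $\chi(\hat p_n)$ and the one-step $2\,\mathbb{P}_n \hat p_n(X) - \chi(\hat p_n)$ with sample splitting:

```python
import numpy as np

def kde(train, pts, h):
    """Gaussian kernel density estimate p_hat (fit on `train`) at `pts`."""
    z = (pts[:, None] - train[None, :]) / h
    return np.exp(-0.5 * z ** 2).mean(axis=1) / (h * np.sqrt(2 * np.pi))

def plugin_vs_onestep(x, h, grid):
    """Plug-in chi(p_hat) = ∫ p_hat^2 versus the one-step estimator
    2 P_n p_hat(X) - chi(p_hat), whose bias is the quadratic remainder
    -∫ (p - p_hat)^2; p_hat is fit on one half, P_n uses the other."""
    train, evl = x[: len(x) // 2], x[len(x) // 2:]
    dx = grid[1] - grid[0]
    chi_plug = (kde(train, grid, h) ** 2).sum() * dx   # Riemann sum for ∫ p_hat^2
    chi_one = 2 * kde(train, evl, h).mean() - chi_plug
    return chi_plug, chi_one
```

For standard normal data the target is $1/(2\sqrt{\pi}) \approx 0.282$; the one-step value is typically markedly closer than the plug-in, whose smoothing bias is first order.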
Missing Data IF

To compute the IF in the MAR example, consider the score functions of the one-dimensional submodel $p_t = p_{a_t, b_t, f_t}$ induced by paths of the form
$$a_t = a + t\,\mathbf{a}, \qquad b_t = b + t\,\mathbf{b}, \qquad f_t = f + t\,\mathbf{f}$$
for given measurable functions $\mathbf{a}, \mathbf{b}, \mathbf{f} : \mathcal{Z} \to \mathbb{R}$.

This gives rise to the corresponding scores
$$\frac{Aa(Z) - 1}{a(Z)\{a(Z) - 1\}}\,\mathbf{a}(Z) \;\; \text{(a-score)}, \qquad \frac{A(Y - b(Z))}{b(Z)\{1 - b(Z)\}}\,\mathbf{b}(Z) \;\; \text{(b-score)}, \qquad \frac{\mathbf{f}(Z)}{f(Z)} \;\; \text{(f-score)}.$$
IF for Missing Data Functional

Let $B$ = a-score + b-score + f-score. Then
$$\frac{d}{dt}\Big|_{t=0} \chi(p_t) = P\left(\chi^{(1)}_p B\right)$$
for all $B$ in the tangent space of the model $\mathcal{P}$, where
$$\chi^{(1)}_p = Aa(Z)(Y - b(Z)) + b(Z) - \chi(p).$$

This is the well-known doubly robust influence function due to Robins, Rotnitzky and Zhao (1994).
IF for Missing Data Functional

Suppose one has obtained rate-optimal nonparametric estimators $\hat a, \hat b$ (optimal kernel smoothing or series estimation), e.g. $\hat a$ converges at rate $n^{-2\alpha/(2\alpha+d)}$ in mean squared error. Then the one-step estimator
$$\hat\chi = \chi(\hat p_n) + \mathbb{P}_n \chi^{(1)}_{\hat p_n} = \mathbb{P}_n\left\{A\hat a(Z)\left(Y - \hat b(Z)\right) + \hat b(Z)\right\}$$
has second order bias
$$\left|E(\hat\chi) - \chi(p)\right| \lesssim \|\hat a - a\|_2\, \|\hat b - b\|_2,$$
which is quadratic in the preliminary estimators, as expected. Note that $f$ does not enter the bias (provided the empirical measure is used).

The variance of $\hat\chi$ is of order $1/n$. If the preliminary estimators $\hat a$ and $\hat b$ can be constructed so that the squared bias is smaller than the variance, the estimator is rate optimal; otherwise the bias dominates and a higher order IF is needed.
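This one-step estimator, $\mathbb{P}_n\{A\hat a(Z)(Y - \hat b(Z)) + \hat b(Z)\}$, is the familiar AIPW form. A minimal Python sketch with cross-fitting (hypothetical names; the nuisance fitters are passed in as callables and are assumptions of the sketch, not part of the slides):

```python
import numpy as np

def aipw_mean(y, a, z, fit_b, fit_invprob):
    """One-step (AIPW) estimator P_n{ A a_hat(Z)(Y - b_hat(Z)) + b_hat(Z) }
    of chi = E(Y) under MAR, with cross-fitting: nuisances are fit on one
    half of the sample and averaged over the other half, then roles swap.
    fit_b(z, y, a) -> callable b_hat(z); fit_invprob(z, a) -> callable
    a_hat(z), where a_hat estimates 1 / Pr(A = 1 | Z)."""
    n = len(y)
    halves = [np.arange(n // 2), np.arange(n // 2, n)]
    vals = []
    for tr, ev in [(halves[0], halves[1]), (halves[1], halves[0])]:
        b_hat = fit_b(z[tr], y[tr], a[tr])
        a_hat = fit_invprob(z[tr], a[tr])
        vals.append(np.mean(a[ev] * a_hat(z[ev]) * (y[ev] - b_hat(z[ev]))
                            + b_hat(z[ev])))
    return float(np.mean(vals))
```

Any reasonable nonparametric (or, in a test, oracle) nuisance fitters can be plugged in; the bias of the result is the product of the two nuisance errors, per the display above.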
HOIFs

Given a collection of smooth submodels $t \mapsto p_t$ in $\mathcal{P}$, an $m$th order influence function $\chi_p$ of the functional $\chi(p)$ satisfies $P^m \chi_p = 0$ and
$$\frac{d^j}{dt^j}\Big|_{t=0} \chi(p_t) = \frac{d^j}{dt^j}\Big|_{t=0} P^m_t \chi_p, \qquad j = 1, \ldots, m,$$
for every submodel through $p$ in $\mathcal{P}$.

The latter condition implies a Taylor expansion of $t \mapsto \chi(p_t)$ at $t = 0$ of order $m$, but in addition requires that the derivatives of this map can be represented as expectations involving a function $\chi_p$.

In order to exploit derivatives up to the $m$th order, groups of $m$ observations can be used to match the expectation $P^m$; this leads to U-statistics of order $m$.
HOIFs

Thus, we must solve for an $m$th order U-statistic with mean zero that satisfies
$$\frac{d^j}{dt^j}\Big|_{t=0} \chi(p_t) = \frac{d^j}{dt^j}\Big|_{t=0} P^m_t \chi_p, \qquad j = 1, \ldots, m,$$
for every submodel through $p$ in $\mathcal{P}$.

This equation involves multiple derivatives and many paths, so directly solving for $\chi_p$ is challenging.

For actual computation of influence functions, we have shown that it is usually easier to derive higher order influence functions as influence functions of lower order ones.

In fact, the Hoeffding decomposition theorem states that one can decompose any $m$th order U-statistic as the sum of degenerate U-statistics of orders $1, 2, \ldots, m$.
HOIFs

Thus, by the Hoeffding decomposition,
$$\mathbb{U}_n \chi_p = \mathbb{U}_n \chi^{(1)}_p + \frac{\mathbb{U}_n \chi^{(2)}_p}{2!} + \cdots + \frac{\mathbb{U}_n \chi^{(m)}_p}{m!},$$
where $\chi^{(j)}_p$ is a degenerate (symmetric) kernel of $j$ arguments defined uniquely as a projection of $\chi_p$.

Suitable functions $\chi^{(j)}_p$ can be found by the following algorithm:

1. Let $x_1 \mapsto \chi^{(1)}_p(x_1)$ be a first order influence function of the functional $p \mapsto \chi(p)$.
2. Let $x_j \mapsto \chi^{(j)}_p(x_1, x_2, \ldots, x_j)$ be a first order influence function of the functional $p \mapsto \chi^{(j-1)}_p(x_1, x_2, \ldots, x_{j-1})$, for each fixed $x_1, x_2, \ldots, x_{j-1}$ and $j = 2, \ldots, m$.
3. Let $\chi^{(j)}_p$ be replaced by its degenerate part relative to $P$, i.e.
$$\int \chi^{(j)}_p(X_1, \ldots, x_s, \ldots, X_j)\, dp(x_s) = 0 \quad \text{for all } s = 1, \ldots, j.$$
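The Hoeffding decomposition invoked above can be verified numerically on a textbook case (an illustrative Python sketch, not from the slides; the variance kernel and the centering at $EX = 0$ are assumptions of the example). For $h(x_1, x_2) = (x_1 - x_2)^2/2$ with $EX = 0$ and $\operatorname{Var} X = \sigma^2$, the decomposition is $\theta = \sigma^2$, first-order projection $h_1(x) = (x^2 - \sigma^2)/2$, and degenerate part $h_2(x_1, x_2) = -x_1 x_2$; the identity then holds exactly, sample by sample:

```python
import numpy as np

def hoeffding_decomposition_check(x, sigma2):
    """For the variance kernel h(x1, x2) = (x1 - x2)^2 / 2 with EX = 0 and
    Var X = sigma2, verify the Hoeffding decomposition
        U_n h = theta + (2/n) sum_i h1(X_i) + U_n h2,
    with theta = sigma2, h1(x) = (x^2 - sigma2)/2, h2(x1, x2) = -x1 x2.
    Returns (left-hand side, right-hand side)."""
    n = len(x)
    d2 = (x[:, None] - x[None, :]) ** 2 / 2.0
    np.fill_diagonal(d2, 0.0)
    u_full = d2.sum() / (n * (n - 1))                 # U_n h
    lin = 2.0 * np.mean((x ** 2 - sigma2) / 2.0)      # (2/n) sum_i h1(X_i)
    xy = np.outer(x, x)
    np.fill_diagonal(xy, 0.0)
    u_deg = -xy.sum() / (n * (n - 1))                 # U_n h2 (degenerate part)
    return u_full, sigma2 + lin + u_deg
```

Both sides agree to machine precision for any sample, illustrating that the decomposition is an algebraic identity, not merely an asymptotic one.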
HOIFs

Thus, HOIFs are constructed as first order influence functions of an influence function.

Although the order $m$ is fixed at a suitable value, we suppress it in the notation $\chi_p$.

In abuse of language, we refer to $\chi^{(j)}_p$ as the $j$th order IF.

The starting IF $\chi^{(1)}_p$ in step 1 of the algorithm may be any first order IF; it need not be an element of the tangent space (i.e. efficient) and need not have mean zero.

The same remark applies in step 2 of the algorithm; it is only in step 3 that we make the IFs degenerate.
HOIFs of quadratic functional

Recall that the uncentered (1st order) IF of $p \mapsto \int_{\mathbb{R}^d} p(x)^2\, dx$ is $\chi^{(1)}_p(x) = 2p(x)$, where $p(x)$ is the joint density of $X$. Applying step 2 for $j = 2$, we seek $\chi^{(2)}_p(x_1, x_2)$ that solves
$$\frac{d}{dt}\Big|_{t=0} \chi^{(1)}_{p_t}(x_1) = P\left\{\chi^{(2)}_p(x_1, X_2)\, g(X_2)\right\}$$
for all $x_1$ and smooth paths $t \mapsto p_t$. However,
$$\frac{d}{dt}\Big|_{t=0} \chi^{(1)}_{p_t}(x_1) = 2\, \frac{d}{dt}\Big|_{t=0} p_t(x_1),$$
and $p(x_1)$, the value of a continuous density at a point, is not pathwise differentiable, so such a representation does not exist!
Approximate functionals
HOIFs will generally fail to exist if the first-order IF depends on a parameter that is not pathwise differentiable, such as a conditional expectation or a density at a point.
To make progress, we propose to replace χ(p) with an approximate functional χ(p̃), for a given mapping p ↦ p̃, such that χ(p) and χ(p̃) are close and χ(p̃) is higher-order pathwise differentiable.
This leads to representation bias. Ensuring that this bias is negligible requires that the projection of p onto p̃ involve models of high dimension (number of parameters of order n or larger).
This will typically result in a variance of the HOIF larger than n^{-1}, so that nonparametric rates of convergence apply.
Projection kernel
An orthogonal projection K_p : L_2(p) → L_2(p) onto a k-dimensional linear subspace of L_2(p) is given by a kernel operator, with the kernel denoted by the same symbol as the operator:
K_p t(x) = ∫ K_p(x, y) t(y) dp(y).
The projection property K_p² = K_p holds.
We also require that sup_u K_p(u, u) ≲ k and ‖K_p t‖ ≲ ‖t‖.
Moreover, it is desirable that the projection operator be suitable for approximation in Hölder spaces: for every k, and ‖·‖_α a norm for H^α[0,1]^d,
sup_{‖t‖_α ≤ 1} ‖K_p t − t‖ ≲ (1/k)^{α/d}.
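As a concrete sketch (my own toy instance, not from the talk): with d = 1 and p uniform on [0,1], the projection onto piecewise constants over k equal-width bins has kernel K(x, y) = k·1{x and y share a bin}. The snippet below checks the projection property K² = K, the diagonal identity K(u, u) = k, and that the Hölder-type approximation error shrinks as k grows:

```python
import numpy as np

# Histogram projection kernel on [0,1]: K(x, y) = k * 1{x, y in the same bin}
def histogram_kernel(x, y, k):
    bx = np.minimum((np.asarray(x) * k).astype(int), k - 1)
    by = np.minimum((np.asarray(y) * k).astype(int), k - 1)
    return k * (bx == by)

def apply_K(tvals, grid, k, w):
    # (K t)(x) = ∫ K(x, y) t(y) dy, approximated by midpoint quadrature
    return np.array([np.sum(histogram_kernel(x, grid, k) * tvals) * w
                     for x in grid])

grid = (np.arange(2000) + 0.5) / 2000   # midpoint quadrature grid
w = 1.0 / len(grid)
t = np.sin(2 * np.pi * grid)            # a smooth test function (alpha = 1)

Kt8 = apply_K(t, grid, 8, w)
KKt8 = apply_K(Kt8, grid, 8, w)
proj_ok = np.allclose(Kt8, KKt8)        # projection property K^2 = K
diag_ok = np.all(histogram_kernel(grid, grid, 8) == 8)  # K(u, u) = k

err8 = np.max(np.abs(Kt8 - t))          # sup-norm approximation error, k = 8
err16 = np.max(np.abs(apply_K(t, grid, 16, w) - t))     # smaller for k = 16
```

Doubling k roughly halves the sup-norm error here, consistent with the (1/k)^{α/d} bound for α = d = 1.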
Approximate quadratic functional
The approximate functional:
χ(p̃) = ∫ p̃(x)² dx,  p̃(x) = K_μ p(x),  μ the Lebesgue measure.
Note that whereas p(x) is not pathwise differentiable, p̃(x) is!
Therefore
(d/dt)|_{t=0} χ(p̃_t) = P{2 p̃(X) g(X)},
(d/dt)|_{t=0} χ_{p̃_t}^{(1)}(x_1) = 2 (d/dt)|_{t=0} p̃_t(x_1) = P{2 K_μ(x_1, X_2) g(X_2)},
so that χ_{p̃}^{(2)}(x_1, x_2) = 2 K_μ(x_1, x_2), and χ_{p̃}^{(j)} = 0 for j > 2.
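The second-order representation above can be verified in one line. For a path p_t with score g, using p̃_t = K_μ p_t and (d/dt)p_t|_{t=0} = g p:

```latex
\frac{d}{dt}\Big|_{t=0}\chi^{(1)}_{\tilde p_t}(x_1)
  = 2\,\frac{d}{dt}\Big|_{t=0}\int K_\mu(x_1,y)\,p_t(y)\,dy
  = 2\int K_\mu(x_1,y)\,g(y)\,p(y)\,dy
  = P\bigl\{2K_\mu(x_1,X_2)\,g(X_2)\bigr\},
```

so χ_{p̃}^{(2)}(x_1, x_2) = 2K_μ(x_1, x_2) solves the step-2 equation, and since K_μ is a fixed kernel not depending on p, all higher-order derivatives vanish.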
Approximate quadratic functional
Step 3 of the algorithm yields a 2nd-order IF:
U_n χ_{p̃} = U_n χ_{p̃}^{(1)} + (1/2) U_n χ_{p̃}^{(2)},
χ_{p̃}^{(1)}(x_1) = 2(p̃(x_1) − χ(p̃)),
χ_{p̃}^{(2)}(x_1, x_2) = 2(K_μ(x_1, x_2) − p̃(x_1) − p̃(x_2) + χ(p̃)).
We further have, for χ̂_n = χ(p̂_n) + U_n χ_{p̂_n},
P(χ̂_n − χ(p̃)) = 0
and
var(χ̂_n) ≲ max( var(U_n χ_{p̃}^{(1)}), var((1/2) U_n χ_{p̃}^{(2)}) ) = max(1/n, k/n²).
The representation bias is
|χ(p̃) − χ(p)| ≲ (1/k)^{2α/d}.
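One can check that when the pieces of χ̂_n are expanded, the terms involving the pilot estimate p̂ cancel algebraically, leaving the classical pairwise U-statistic (1/(n(n−1))) Σ_{i≠j} K_μ(X_i, X_j). A toy sketch (the histogram kernel and the density p(x) = 2x are my illustrative choices, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)

def chi_hat(x, k):
    """Second-order estimator of ∫ p² with histogram kernel K(x,y) = k·1{same bin}."""
    n = len(x)
    counts = np.bincount(np.minimum((x * k).astype(int), k - 1), minlength=k)
    # sum_{i != j} K(X_i, X_j) = k * sum over bins of c_b (c_b - 1)
    return k * np.sum(counts * (counts - 1)) / (n * (n - 1))

# Hypothetical example: density p(x) = 2x on [0,1], so χ(p) = ∫ 4x² dx = 4/3
n = 4000
x = np.sqrt(rng.uniform(size=n))   # inverse-CDF sampling from p
est = chi_hat(x, k=20)
```

With k = 20 bins the representation bias (1/k)^{2α/d} is negligible here, and the estimator fluctuates around 4/3 at the n^{-1/2} scale.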
Approximate quadratic functional
Therefore, if α/d ≥ 1/4, the approximation bias can be made ≲ n^{−1/2} with k ≤ n, so that
var(χ̂_n) ≲ max(1/n, k/n²) = 1/n,
and the functional is estimable at rate n^{−1/2}.
However, whenever α/d < 1/4, the optimal MSE is attained by k_opt = n^{2d/(4α+d)} > n, achieving the minimax rate of convergence n^{−4α/(4α+d)} (MSE of order n^{−8α/(4α+d)}).
Note that because there is no estimation bias, the optimal rate is obtained by trading off variance and representation bias.
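The value of k_opt follows from balancing the squared representation bias against the variance of the degenerate second-order term:

```latex
\underbrace{\Bigl(\tfrac{1}{k}\Bigr)^{4\alpha/d}}_{\text{bias}^2}
 \;\asymp\; \underbrace{\frac{k}{n^{2}}}_{\text{variance}}
 \;\Longleftrightarrow\; k^{1+4\alpha/d}\asymp n^{2}
 \;\Longleftrightarrow\; k_{\mathrm{opt}} = n^{2d/(4\alpha+d)},
```

which gives RMSE k_opt^{−2α/d} = n^{−4α/(4α+d)}, i.e. MSE n^{−8α/(4α+d)}. For α/d ≥ 1/4 one has k_opt ≤ n, and the linear-term variance 1/n dominates instead.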
Approximate functional
Interestingly, in this example the approximate functional is locally equivalent to the true functional, in the sense that their first-order IFs match.
Although this is a desirable property, because it leads to parsimonious HOIFs, it will not generally hold for an arbitrary functional; it can, however, be ensured by construction.
To do so, we propose to define the approximate functional to satisfy
(d/dt)|_{t=0} ( χ(p̃_t) + P χ_{p̃_t}^{(1)} ) = 0.
In fact, the first-order IF of χ(p̃) then coincides with that of χ(p) (evaluated at p̃), so step 1 of the algorithm returns the same first-order IF.
Approximate functional for the missing data functional
To define the approximate functional, for a user-specified â we let ã be the function such that (ã − â)/a is the orthogonal projection of (a − â)/a onto a "large" linear subspace of L_2(g), with projection kernel K_g, i.e.
ã/a = â/a + K_g((a − â)/a),
and define b̃ similarly.
The corresponding approximate functional χ(p̃) = ∫ ã b̃ g dν has the same first-order IF as χ(p), and its representation bias satisfies
|∫ ã b̃ g dν − ∫ a b g dν| ≤ ‖(a − ã)/a‖ · ‖(b − b̃)/b‖ ≲ (1/k)^{(α+β)/d},
with norms taken in L_2(abg dν); this can be made small by choosing the range of K_g large enough.
Approximate functional for the missing data functional
Step 1 of the algorithm applied to χ(p̃) gives
χ_{p̃}^{(1)}(x_1) = a_1 ã(z_1)(y_1 − b̃(z_1)) + b̃(z_1).
The corresponding bias of χ̂_n = χ(p̂_n) + U_n χ_{p̂_n}^{(1)} is
∫ (â − a)(b̂ − b) g dν = O_p(n^{−α/(2α+d) − β/(2β+d)}),
while the corresponding variance is of order 1/n; recovering a regular (√n-consistent) estimator, when possible, therefore requires
α/(2α+d) + β/(2β+d) ≥ 1/2.
One may be able to relax this somewhat by undersmoothing, or by exploiting the fact that the bias has an integral representation. We do not pursue this here, as we assume all nuisance parameters are estimated at their rate-optimal rates.
Approximate functional for the missing data functional
Whereas χ_p^{(1)}(x_1) is not pathwise differentiable, χ_{p̃}^{(1)}(x_1) is.
Step 2 of the algorithm applied to χ_{p̃}^{(1)}(x_1) gives
χ_{p̃}^{(2)}(x_1, x_2) = 2 a_1 (y_1 − b̃(z_1)) K_g(z_1, z_2) (a_2 ã(z_2) − 1).
Step 3 symmetrizes this IF and makes it a degenerate U-statistic.
Note that because K_g depends on g, the 2nd-order IF is not exactly unbiased, and therefore higher-order IFs may be required to further reduce the estimation bias, depending on the smoothness of g.
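A toy illustration in the spirit of these slides (my own simplified construction, not the exact estimator of the talk): a MAR mean functional with deliberately crude nuisance fits â, b̂, whose first-order bias is then largely removed by a second-order U-statistic coupling the two residuals through a histogram kernel K_g(z_1, z_2) = k·1{same bin} over Z. The simulation design (π, b, the constant nuisance fits) is entirely hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)

n, k = 8000, 20
z = rng.uniform(size=n)
pi = 0.4 + 0.4 * z                        # true propensity P(A = 1 | Z)
a_ind = (rng.uniform(size=n) < pi).astype(float)           # A
y = np.where(a_ind == 1, z + 0.3 * rng.standard_normal(n), 0.0)
# truth: b(z) = E[Y | Z = z, A = 1] = z, so χ(p) = E[b(Z)] = 0.5

a_hat = 1.0 / a_ind.mean()                # crude constant fit of a = 1/π
b_hat = 0.5                               # crude constant fit of b

# First-order (AIPW-type) estimator: biased because â, b̂ are poor
if1 = a_ind * a_hat * (y - b_hat) + b_hat
chi1 = if1.mean()

# Second-order correction: U-statistic coupling the residuals
#   e1 = A·â·(Y − b̂)  (carries b − b̂)  and  e2 = A·â − 1  (carries a − â)
# through K_g; with a histogram kernel it can be computed bin-by-bin.
e1 = a_ind * a_hat * (y - b_hat)
e2 = a_ind * a_hat - 1.0
bins = np.minimum((z * k).astype(int), k - 1)
s1 = np.bincount(bins, weights=e1, minlength=k)
s2 = np.bincount(bins, weights=e2, minlength=k)
diag = np.bincount(bins, weights=e1 * e2, minlength=k)
u_stat = k * np.sum(s1 * s2 - diag) / (n * (n - 1))        # Σ_{i≠j} e1_i K e2_j

chi2 = chi1 - u_stat                      # second-order estimator
```

In this design the first-order estimator carries a bias of roughly E[(πâ − 1)(b − b̂)] ≈ 0.056, while the second-order correction estimates and removes most of it.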
Minimax estimation with the 2nd-order IF
Let χ̂_n = χ(p̂_n) + U_n χ_{p̂_n}^{(1)} + (1/2) U_n χ_{p̂_n}^{(2)}, and let γ denote the Hölder coefficient of g.
Theorem. Suppose γ/(d+2γ) ≥ 2(β+α)/(d+2(β+α)) − (β/(d+2β) + α/(d+2α)); set k = n^{2d/(d+2(β+α))}. Then we have the following result:
1. If (β+α)/2d ≥ 1/4, then sup_{θ∈Θ(β,α,γ)} (χ̂_n − χ(p)) = O_p(n^{−1/2});
2. If (β+α)/2d ≤ 1/4, then sup_{θ∈Θ(β,α,γ)} (χ̂_n − χ(p)) = O_p(n^{−2(β+α)/(d+2(β+α))}).
Minimax estimation
This result is new! All existing methods break down once (β+α)/2d < 1/4, i.e. once the effective smoothness (β+α)/2d is too small.
We have likewise constructed minimax estimators when
γ/(d+2γ) < 2(β+α)/(d+2(β+α)) − (β/(d+2β) + α/(d+2α)).
These higher-order IFs are higher-order U-statistics used for further bias reduction without increasing the variance beyond the second-order rate k/n²; this requires tremendous care (see Robins et al., 2016, Annals of Statistics).
Honest confidence intervals
Our approach can be used to obtain honest CIs for χ(p), given by C_n = χ̂_n ± m_{1−α} σ(χ̂_n), so that
lim_n Pr{χ(p) ∉ C_n} ≤ α
uniformly in p ∈ P, with an appropriate constant m_{1−α}.
This is because our HOIFs are asymptotically normally distributed (Robins et al., 2016).
Simulations of honest confidence intervals
Nominal coverage 90%; effective smoothness of g = 0.2. (Simulation figure not transcribed.)
Final remarks
Our results apply to a general class of functionals, which includes the semilinear model, several IV models, MAR monotone missing data functionals, and nonignorable missing data functionals, among others.
One will rarely know the smoothness of the nuisance parameters; ideally, one would like to adapt to such smoothness.
We have recently developed such adaptive estimators for a large class of functionals by extending and applying a technique proposed by Lepski (Mukherjee, Tchetgen Tchetgen, Robins, 2016).
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research
More informationSDS : Theoretical Statistics
SDS 384 11: Theoretical Statistics Lecture 1: Introduction Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin https://psarkar.github.io/teaching Manegerial Stuff
More informationLecture 8: Information Theory and Statistics
Lecture 8: Information Theory and Statistics Part II: Hypothesis Testing and I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 23, 2015 1 / 50 I-Hsiang
More informationUniversity of California, Berkeley
University of California, Berkeley U.C. Berkeley Division of Biostatistics Working Paper Series Year 2015 Paper 334 Targeted Estimation and Inference for the Sample Average Treatment Effect Laura B. Balzer
More informationClass 2 & 3 Overfitting & Regularization
Class 2 & 3 Overfitting & Regularization Carlo Ciliberto Department of Computer Science, UCL October 18, 2017 Last Class The goal of Statistical Learning Theory is to find a good estimator f n : X Y, approximating
More informationDensity estimation Nonparametric conditional mean estimation Semiparametric conditional mean estimation. Nonparametrics. Gabriel Montes-Rojas
0 0 5 Motivation: Regression discontinuity (Angrist&Pischke) Outcome.5 1 1.5 A. Linear E[Y 0i X i] 0.2.4.6.8 1 X Outcome.5 1 1.5 B. Nonlinear E[Y 0i X i] i 0.2.4.6.8 1 X utcome.5 1 1.5 C. Nonlinearity
More informationSmooth simultaneous confidence bands for cumulative distribution functions
Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang
More informationAsymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Asymptotic Nonequivalence of Nonparametric Experiments When the Smoothness Index is ½ Lawrence D. Brown University
More informationMeasuring robustness
Measuring robustness 1 Introduction While in the classical approach to statistics one aims at estimates which have desirable properties at an exactly speci ed model, the aim of robust methods is loosely
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationOn variable bandwidth kernel density estimation
JSM 04 - Section on Nonparametric Statistics On variable bandwidth kernel density estimation Janet Nakarmi Hailin Sang Abstract In this paper we study the ideal variable bandwidth kernel estimator introduced
More informationSupplement to Quantile-Based Nonparametric Inference for First-Price Auctions
Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract
More informationMMSE Dimension. snr. 1 We use the following asymptotic notation: f(x) = O (g(x)) if and only
MMSE Dimension Yihong Wu Department of Electrical Engineering Princeton University Princeton, NJ 08544, USA Email: yihongwu@princeton.edu Sergio Verdú Department of Electrical Engineering Princeton University
More informationMissing Covariate Data in Matched Case-Control Studies
Missing Covariate Data in Matched Case-Control Studies Department of Statistics North Carolina State University Paul Rathouz Dept. of Health Studies U. of Chicago prathouz@health.bsd.uchicago.edu with
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationL p Functions. Given a measure space (X, µ) and a real number p [1, ), recall that the L p -norm of a measurable function f : X R is defined by
L p Functions Given a measure space (, µ) and a real number p [, ), recall that the L p -norm of a measurable function f : R is defined by f p = ( ) /p f p dµ Note that the L p -norm of a function f may
More informationStatistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation
Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence
More informationIntroduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued
Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and
More informationUnsupervised Learning Techniques Class 07, 1 March 2006 Andrea Caponnetto
Unsupervised Learning Techniques 9.520 Class 07, 1 March 2006 Andrea Caponnetto About this class Goal To introduce some methods for unsupervised learning: Gaussian Mixtures, K-Means, ISOMAP, HLLE, Laplacian
More informationNETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA
Statistica Sinica 213): Supplement NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Hokeun Sun 1, Wei Lin 2, Rui Feng 2 and Hongzhe Li 2 1 Columbia University and 2 University
More informationInverse problems in statistics
Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) Yale, May 2 2011 p. 1/35 Introduction There exist many fields where inverse problems appear Astronomy (Hubble satellite).
More informationAn Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data
An Efficient Estimation Method for Longitudinal Surveys with Monotone Missing Data Jae-Kwang Kim 1 Iowa State University June 28, 2012 1 Joint work with Dr. Ming Zhou (when he was a PhD student at ISU)
More information1. If 1, ω, ω 2, -----, ω 9 are the 10 th roots of unity, then (1 + ω) (1 + ω 2 ) (1 + ω 9 ) is A) 1 B) 1 C) 10 D) 0
4 INUTES. If, ω, ω, -----, ω 9 are the th roots of unity, then ( + ω) ( + ω ) ----- ( + ω 9 ) is B) D) 5. i If - i = a + ib, then a =, b = B) a =, b = a =, b = D) a =, b= 3. Find the integral values for
More information13 Endogeneity and Nonparametric IV
13 Endogeneity and Nonparametric IV 13.1 Nonparametric Endogeneity A nonparametric IV equation is Y i = g (X i ) + e i (1) E (e i j i ) = 0 In this model, some elements of X i are potentially endogenous,
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) Fall, 2017 1 / 40 Outline Linear Model Introduction 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear
More informationInference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms
Inference in Nonparametric Series Estimation with Data-Dependent Number of Series Terms Byunghoon ang Department of Economics, University of Wisconsin-Madison First version December 9, 204; Revised November
More informationLinear Methods for Prediction
Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we
More informationInverse problems in statistics
Inverse problems in statistics Laurent Cavalier (Université Aix-Marseille 1, France) YES, Eurandom, 10 October 2011 p. 1/32 Part II 2) Adaptation and oracle inequalities YES, Eurandom, 10 October 2011
More informationWavelet Shrinkage for Nonequispaced Samples
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1998 Wavelet Shrinkage for Nonequispaced Samples T. Tony Cai University of Pennsylvania Lawrence D. Brown University
More informationLecture 11: Regression Methods I (Linear Regression)
Lecture 11: Regression Methods I (Linear Regression) 1 / 43 Outline 1 Regression: Supervised Learning with Continuous Responses 2 Linear Models and Multiple Linear Regression Ordinary Least Squares Statistical
More informationTargeted Maximum Likelihood Estimation in Safety Analysis
Targeted Maximum Likelihood Estimation in Safety Analysis Sam Lendle 1 Bruce Fireman 2 Mark van der Laan 1 1 UC Berkeley 2 Kaiser Permanente ISPE Advanced Topics Session, Barcelona, August 2012 1 / 35
More informationUncertainty Quantification for Inverse Problems. November 7, 2011
Uncertainty Quantification for Inverse Problems November 7, 2011 Outline UQ and inverse problems Review: least-squares Review: Gaussian Bayesian linear model Parametric reductions for IP Bias, variance
More informationInference For High Dimensional M-estimates. Fixed Design Results
: Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and
More information