Semiparametric Minimax Rates

arxiv: math.pr/

James Robins, Eric Tchetgen Tchetgen, Lingling Li, and Aad van der Vaart

James Robins, Eric Tchetgen Tchetgen
Department of Biostatistics and Epidemiology, School of Public Health, Harvard University

Lingling Li
Department of Population Medicine, Harvard Medical School and Harvard Pilgrim Health Care, Boston, MA

Aad van der Vaart
Department of Mathematics, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands

Abstract: We consider the minimax rate of testing (or estimation) of nonlinear functionals defined on semiparametric models. Existing methods appear not capable of determining a lower bound on the minimax rate of testing (or estimation) for certain functionals of interest, in particular if the semiparametric model is indexed by several infinite-dimensional parameters. To cover these examples we extend the approach of [1], which is based on comparing a "true distribution" to a convex mixture of perturbed distributions, to a comparison of two convex mixtures. The first mixture is obtained by perturbing a first parameter of the model, and the second by perturbing in addition a second parameter. We apply the new result to two examples of semiparametric functionals: the estimation of a mean response when response data are missing at random, and the estimation of an expected conditional covariance functional.

AMS 2000 subject classifications: Primary 62G05, 62G20, 62G20, 62F25.

Keywords and phrases: Nonlinear functional, nonparametric estimation, Hellinger distance.

1. Introduction

Let $X_1, X_2, \ldots, X_n$ be a random sample from a density $p$ relative to a measure $\nu$ on a sample space $(\mathcal{X}, \mathcal{A})$. It is known that $p$ belongs to a collection $\mathcal{P}$ of densities, and we wish to estimate the value $\chi(p)$ of a functional $\chi\colon \mathcal{P} \to \mathbb{R}$. In this setting the minimax rate of estimation of $\chi(p)$ relative to squared error loss can be defined as the root of

$$\inf_{T_n}\sup_{p\in\mathcal{P}} E_p\,|T_n-\chi(p)|^2,$$

where the infimum is taken over all estimators $T_n = T_n(X_1,\ldots,X_n)$. Determination of a minimax rate in a particular problem often consists of proving a lower bound, showing that the mean square error of no estimator tends to zero faster than some rate $\varepsilon_n^2$, combined with the explicit construction of an estimator with mean square error $\varepsilon_n^2$.

The lower bound is often proved by a testing argument, which tries to separate two subsets of the set $\{P^n\colon p\in\mathcal{P}\}$ of possible distributions of the observation $(X_1,\ldots,X_n)$. Even though testing is a statistically easier problem than estimation under quadratic loss, the corresponding minimax rates are often of the same order. The testing argument can be formulated as follows. If $P_n$ and $Q_n$ are in the convex hull of the sets $\{P^n\colon p\in\mathcal{P},\ \chi(p)\le 0\}$ and $\{P^n\colon p\in\mathcal{P},\ \chi(p)\ge\varepsilon_n\}$ and there exists no sequence of tests of $P_n$ versus $Q_n$ with both error probabilities tending to zero, then the minimax rate is not faster than a multiple of $\varepsilon_n$. Here existence of a sequence of tests with errors tending to zero (a perfect sequence of tests) is determined by the asymptotic separation of the sequences $P_n$ and $Q_n$ and can be described, for instance, in terms of the Hellinger affinity

$$\rho(P_n,Q_n)=\int\sqrt{dP_n}\,\sqrt{dQ_n}.$$

If $\rho(P_n,Q_n)$ is bounded away from zero as $n\to\infty$, then no perfect sequence of tests exists (see e.g. Section 14.5 in [2]).

One difficulty in applying this simple argument is that the relevant (least favorable) two sequences of measures $P_n$ and $Q_n$ need not be product measures, but can be arbitrary convex combinations of product measures. In particular, it appears that for nonlinear functionals at least one of the two sequences must be a true mixture. This complicates the computation of the affinity $\rho(P_n,Q_n)$ considerably.
[1] derived an elegant lower bound on the affinity when $P_n$ is a product measure and $Q_n$ a convex mixture of product measures, and used it to determine the testing rate for functionals of the type $\int f(p)\,d\nu$, for a given smooth function $f\colon\mathbb{R}\to\mathbb{R}$, the function $f(x)=x^2$ being the crucial example.

In this paper we are interested in structured models $\mathcal{P}$ that are indexed by several subparameters and where the functional is defined in terms of the subparameters. It appears that testing a product versus a mixture is often not least favorable in this situation, but testing two mixtures is. Thus we extend the bound of [1] to the case that both $P_n$ and $Q_n$ are mixtures. In our examples $P_n$ is equal to a convex mixture obtained by perturbing a first parameter of the model, and $Q_n$ is obtained by perturbing in addition a second parameter. We also refine the bound in other, less essential directions.

The main general results of the paper are given in Section 2. In Section 3 we apply these results to two examples of interest.
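The effect of replacing a single perturbed product measure by a mixture can be seen on a toy example. The following is only a sketch under stated assumptions, not the construction of [1]: the discrete sample space of $k$ two-point cells, the tilt `eps`, and all function names are hypothetical choices made for illustration. It computes the affinity between a uniform product measure and (i) one perturbed product measure, (ii) the uniform mixture of all sign-perturbed product measures.

```python
from itertools import product
from math import prod, sqrt


def affinity(p, q):
    """Hellinger affinity sum_x sqrt(p(x) q(x)) of two pmfs given as dicts."""
    return sum(sqrt(p[x] * q.get(x, 0.0)) for x in p)


def n_fold(pmf, n):
    """Pmf of n i.i.d. draws from `pmf`, keyed by tuples of outcomes."""
    return {xs: prod(pmf[x] for x in xs) for xs in product(pmf, repeat=n)}


def mixture(pmfs):
    """Equal-weight convex mixture of pmfs on a common support."""
    return {x: sum(m[x] for m in pmfs) / len(pmfs) for x in pmfs[0]}


def perturbed(lam, eps):
    """Hypothetical Q_lambda: k cells of two points each, tilted by lam[j] * eps."""
    k = len(lam)
    q = {}
    for j, sign in enumerate(lam):
        q[(j, 0)] = (1 + sign * eps) / (2 * k)
        q[(j, 1)] = (1 - sign * eps) / (2 * k)
    return q


k, n, eps = 2, 3, 0.5
base = perturbed((0,) * k, eps)          # uniform P (zero tilt)
lams = list(product((-1, 1), repeat=k))  # uniform prior on sign vectors

p_n = n_fold(base, n)
# Affinity of P^n with one perturbed product measure Q_lambda^n.
single = affinity(p_n, n_fold(perturbed(lams[0], eps), n))
# Affinity of P^n with the mixture of all Q_lambda^n.
mixed = affinity(p_n, mixture([n_fold(perturbed(l, eps), n) for l in lams]))
# One-observation affinity, for the product rule rho(P^n, Q^n) = rho(P, Q)^n.
one_obs = affinity(base, perturbed(lams[0], eps))
```

In this example `single` equals `one_obs ** n`, while `mixed` is larger by concavity of the square root: the mixture is harder to separate from $P^n$ than any single perturbed product, which is why mixtures are the natural candidates for least favorable sequences.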

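2. Main result

For $k\in\mathbb{N}$ let $\mathcal{X}=\bigcup_{j=1}^k\mathcal{X}_j$ be a measurable partition of the sample space. Given a vector $\lambda=(\lambda_1,\ldots,\lambda_k)$ in some product measurable space $\Lambda=\Lambda_1\times\cdots\times\Lambda_k$ let $P_\lambda$ and $Q_\lambda$ be probability measures on $\mathcal{X}$ such that

(1) $P_\lambda(\mathcal{X}_j)=Q_\lambda(\mathcal{X}_j)=p_j$ for every $\lambda\in\Lambda$, for some probability vector $(p_1,\ldots,p_k)$.

(2) The restrictions of $P_\lambda$ and $Q_\lambda$ to $\mathcal{X}_j$ depend on the $j$th coordinate $\lambda_j$ of $\lambda=(\lambda_1,\ldots,\lambda_k)$ only.

For $p_\lambda$ and $q_\lambda$ densities of the measures $P_\lambda$ and $Q_\lambda$ that are jointly measurable in the parameter $\lambda$ and the observation, and $\pi$ a probability measure on $\Lambda$, define $p=\int p_\lambda\,d\pi(\lambda)$ and $q=\int q_\lambda\,d\pi(\lambda)$, and set

$$a=\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(p_\lambda-p)^2}{p_\lambda}\,\frac{d\nu}{p_j},\qquad
b=\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(q_\lambda-p_\lambda)^2}{p_\lambda}\,\frac{d\nu}{p_j},\qquad
d=\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(q-p)^2}{p_\lambda}\,\frac{d\nu}{p_j}.$$

Theorem 2.1. If $np_j(1\vee a\vee b)\le A$ for all $j$ and $\underline{B}\le p_\lambda\le\overline{B}$ for positive constants $A,\underline{B},\overline{B}$, then there exists a constant $C$ that depends only on $A,\underline{B},\overline{B}$ such that, for any product probability measure $\pi=\pi_1\otimes\cdots\otimes\pi_k$,

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)\ge 1-Cn^2\bigl(\max_j p_j\bigr)(b^2+ab)-Cnd.$$

Proof. The numbers $a$, $b$ and $d$ are the maxima over $j$ of the numbers $a$, $b$ and $d$ defined in Lemma 2.2, but with the measures $P_\lambda$ and $Q_\lambda$ replaced there by the measures $P_{j,\lambda_j}$ and $Q_{j,\lambda_j}$ given in (2.1). Define a number $c$ similarly as

$$c=\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{p^2}{p_\lambda}\,\frac{d\nu}{p_j}.$$

Under the assumptions of the theorem $c$ is bounded above by $\overline{B}/\underline{B}$. By applying Lemma 2.1 and next Lemma 2.2 we see that the left side is at least

$$E\prod_{j=1}^k\Bigl[1-\frac14\sum_{r=2}^{N_j}\binom{N_j}{r}b^r-\frac12N_j^2\sum_{r=1}^{N_j-1}\binom{N_j-1}{r}a^rb-\frac12N_j^2c^{N_j-1}d\Bigr]$$
$$\ge 1-\sum_{j=1}^kE\Bigl[\frac14\sum_{r=2}^{N_j}\binom{N_j}{r}b^r+\frac12N_j^2\sum_{r=1}^{N_j-1}\binom{N_j-1}{r}a^rb+\frac12N_j^2c^{N_j-1}d\Bigr],$$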

because $\prod_{j=1}^k(1-a_j)\ge 1-\sum_{j=1}^ka_j$ for any nonnegative numbers $a_1,\ldots,a_k$. The expected values on the binomial variables $N_j$ can be evaluated explicitly, using the identities, for $N$ a binomial variable with parameters $n$ and $p$,

$$E\sum_{r=2}^{N}\binom{N}{r}b^r=E\bigl((1+b)^N-1-Nb\bigr)=(1+bp)^n-1-npb,$$
$$EN^2c^{N-1}=np(cp+1-p)^{n-2}(cnp+1-p),$$
$$EN^2\sum_{r=1}^{N-1}\binom{N-1}{r}a^r=EN^2\bigl((1+a)^{N-1}-1\bigr)=np(1+ap)^{n-2}(1+nap+np-p)-np(1-p)-n^2p^2.$$

Under the assumption that $np(1\vee a\vee b\vee c)\lesssim 1$, the right sides of these expressions can be seen to be bounded by multiples of $(npb)^2$, $np$ and $(np)^2a$, respectively. We substitute these bounds in the first display of the proof, and use the equality $\sum_jp_j=1$ to complete the proof.

Remark 2.1. If $\min_jp_j\sim\max_jp_j\sim 1/n^{1+\varepsilon}$ for some $\varepsilon>0$, which arises for equiprobable partitions in $k\sim n^{1+\varepsilon}$ sets, then there exists a number $n_0$ such that $P(\max_jN_j>n_0)\to 0$. (Indeed, the probability is bounded by $k(n\max_jp_j)^{n_0+1}$.) Under this slightly stronger assumption the computations need only address $N_j\le n_0$ and hence can be simplified.

The proof of Theorem 2.1 is based on two lemmas. The first lemma factorizes the affinity into the affinities of the restrictions to the partitioning sets, which are next lower bounded using the second lemma. The reduction to the partitioning sets is useful, because it reduces the $n$-fold products to lower order products for which the second lemma is accurate.

Define probability measures $P_{j,\lambda_j}$ and $Q_{j,\lambda_j}$ on $\mathcal{X}_j$ by

$$dP_{j,\lambda_j}=\frac{1_{\mathcal{X}_j}\,dP_\lambda}{p_j},\qquad dQ_{j,\lambda_j}=\frac{1_{\mathcal{X}_j}\,dQ_\lambda}{p_j}.\qquad(2.1)$$

Lemma 2.1. For any product probability measure $\pi=\pi_1\otimes\cdots\otimes\pi_k$ on $\Lambda$ and every $n\in\mathbb{N}$,

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)=E\prod_{j=1}^k\rho_j(N_j),$$

where $(N_1,\ldots,N_k)$ is multinomially distributed on $n$ trials with success probability vector $(p_1,\ldots,p_k)$ and $\rho_j\colon\{0,\ldots,n\}\to[0,1]$ is defined by $\rho_j(0)=1$ and

$$\rho_j(m)=\rho\Bigl(\int P_{j,\lambda_j}^m\,d\pi_j(\lambda_j),\int Q_{j,\lambda_j}^m\,d\pi_j(\lambda_j)\Bigr),\qquad m\ge 1.$$

Proof. Set $P_n:=\int P_\lambda^n\,d\pi(\lambda)$ and consider this as the distribution of the vector $(X_1,\ldots,X_n)$. Then, for $p_\lambda$ and $q_\lambda$ densities of $P_\lambda$ and $Q_\lambda$ relative to some dominating measure, the left side of the lemma can be written as

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)=E_{P_n}\sqrt{\frac{\int\prod_{j=1}^k\prod_{i\colon X_i\in\mathcal{X}_j}q_\lambda(X_i)\,d\pi(\lambda)}{\int\prod_{j=1}^k\prod_{i\colon X_i\in\mathcal{X}_j}p_\lambda(X_i)\,d\pi(\lambda)}}.$$

Because by assumption on each partitioning set $\mathcal{X}_j$ the measures $Q_\lambda$ and $P_\lambda$ depend on $\lambda_j$ only, the expressions $\prod_{i\colon X_i\in\mathcal{X}_j}q_\lambda(X_i)$ and $\prod_{i\colon X_i\in\mathcal{X}_j}p_\lambda(X_i)$ depend on $\lambda$ only through $\lambda_j$. In fact, within the quotient on the right side of the preceding display, they can be replaced by $\prod_{i\colon X_i\in\mathcal{X}_j}q_{j,\lambda_j}(X_i)$ and $\prod_{i\colon X_i\in\mathcal{X}_j}p_{j,\lambda_j}(X_i)$ for $q_{j,\lambda_j}$ and $p_{j,\lambda_j}$ densities of the measures $Q_{j,\lambda_j}$ and $P_{j,\lambda_j}$. Because $\pi$ is a product measure, we can next use Fubini's theorem and rewrite the resulting expression as

$$E_{P_n}\sqrt{\frac{\prod_{j=1}^k\int\prod_{i\colon X_i\in\mathcal{X}_j}q_{j,\lambda_j}(X_i)\,d\pi_j(\lambda_j)}{\prod_{j=1}^k\int\prod_{i\colon X_i\in\mathcal{X}_j}p_{j,\lambda_j}(X_i)\,d\pi_j(\lambda_j)}}.$$

Here the two products over $j$ can be pulled out of the square root and replaced by a single product preceding it. A product over an empty set (if there is no $X_i\in\mathcal{X}_j$) is interpreted as 1.

Define variables $I_1,\ldots,I_n$ that indicate the partitioning sets that contain the observations: $I_i=j$ if $X_i\in\mathcal{X}_j$ for every $i$ and $j$, and let $N_j=\#(1\le i\le n\colon I_i=j)$ be the number of $X_i$ falling in $\mathcal{X}_j$.

The measure $P_n$ arises as the distribution of $(X_1,\ldots,X_n)$ if this vector is generated in two steps. First $\lambda$ is chosen from $\pi$ and next given this $\lambda$ the variables $X_1,\ldots,X_n$ are generated independently from $P_\lambda$. Then given $\lambda$ the vector $(N_1,\ldots,N_k)$ is multinomially distributed on $n$ trials and probability vector $\bigl(P_\lambda(\mathcal{X}_1),\ldots,P_\lambda(\mathcal{X}_k)\bigr)$. Because the latter vector is independent of $\lambda$ and equal to $(p_1,\ldots,p_k)$ by assumption, the vector $(N_1,\ldots,N_k)$ is stochastically independent of $\lambda$ and hence also unconditionally, under $P_n$, multinomially distributed with parameters $n$ and $(p_1,\ldots,p_k)$. Similarly, given $\lambda$ the variables $I_1,\ldots,I_n$ are independent and the event $I_i=j$ has probability $P_\lambda(\mathcal{X}_j)$, which is independent of $\lambda$ by assumption. It follows that the random elements $(I_1,\ldots,I_n)$ and $\lambda$ are stochastically independent under $P_n$.
The conditional distribution of $X_1,\ldots,X_n$ given $\lambda$ and $I_1,\ldots,I_n$ can be described as: for each partitioning set $\mathcal{X}_j$ generate $N_j$ variables independently from $P_\lambda$ restricted and renormalized to $\mathcal{X}_j$, i.e. from the measure $P_{j,\lambda_j}$; do so independently across the partitioning sets; and attach correct labels $\{1,\ldots,n\}$ consistent with $I_1,\ldots,I_n$ to the $n$ realizations obtained. The conditional distribution under $P_n$ of $X_1,\ldots,X_n$ given $(I_1,\ldots,I_n)$ is the mixture of this distribution relative to the conditional distribution of $\lambda$ given $(I_1,\ldots,I_n)$, which was seen to be the unconditional distribution, $\pi$. Thus we obtain a sample from the conditional distribution under $P_n$ of $(X_1,\ldots,X_n)$ given $(I_1,\ldots,I_n)$ by generating for each partitioning set $\mathcal{X}_j$ a set of $N_j$ variables from the measure $\int P_{j,\lambda_j}^{N_j}\,d\pi_j(\lambda_j)$, independently across the partitioning sets, and next attaching labels consistent with $I_1,\ldots,I_n$.
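The factorization asserted in Lemma 2.1 can be checked by brute force on a small discrete model. The sketch below is illustrative only: the two-cell sample space, the cell probabilities `P_CELL`, the tilts `EPS_P` and `EPS_Q`, and the function names are all hypothetical. Each cell has two points, the cell probabilities do not depend on $\lambda$, and the conditional law inside cell $j$ depends on the sign $\lambda_j$ only, so assumptions (1) and (2) hold.

```python
from itertools import product
from math import comb, prod, sqrt

K, N, P_CELL = 2, 3, (0.4, 0.6)   # number of cells, sample size, cell probabilities p_j
EPS_P, EPS_Q = 0.2, 0.6           # hypothetical within-cell tilts defining P and Q
SIGNS = (-1, 1)                   # each lambda_j is a uniform random sign


def cell_pmf(sign, eps):
    """Conditional pmf on the two points of a cell, given the sign lambda_j."""
    return (0.5 + sign * eps / 2, 0.5 - sign * eps / 2)


def full_pmf(lam, eps):
    """Pmf of one observation (j, t): cell j with probability p_j, then point t."""
    return {(j, t): P_CELL[j] * cell_pmf(lam[j], eps)[t]
            for j in range(K) for t in (0, 1)}


def mixture_of_products(pmfs, n):
    """Pmf of n draws under an equal-weight mixture of i.i.d. product laws."""
    support = list(pmfs[0])
    return {xs: sum(prod(m[x] for x in xs) for m in pmfs) / len(pmfs)
            for xs in product(support, repeat=n)}


def affinity(p, q):
    """Hellinger affinity of two pmfs on the same support."""
    return sum(sqrt(p[x] * q[x]) for x in p)


def rho_cell(m):
    """rho_j(m): affinity of the two within-cell mixtures for m observations."""
    if m == 0:
        return 1.0
    p_mix = mixture_of_products([dict(enumerate(cell_pmf(s, EPS_P))) for s in SIGNS], m)
    q_mix = mixture_of_products([dict(enumerate(cell_pmf(s, EPS_Q))) for s in SIGNS], m)
    return affinity(p_mix, q_mix)


lams = list(product(SIGNS, repeat=K))
# Left side of the lemma: affinity of the two mixtures of n-fold products.
lhs = affinity(mixture_of_products([full_pmf(l, EPS_P) for l in lams], N),
               mixture_of_products([full_pmf(l, EPS_Q) for l in lams], N))
# Right side: expectation over the multinomial counts (binomial here, since K = 2).
rhs = sum(comb(N, m) * P_CELL[0] ** m * P_CELL[1] ** (N - m)
          * rho_cell(m) * rho_cell(N - m) for m in range(N + 1))
```

In this example both cells use the same tilts, so `rho_cell` does not depend on the cell index; with cell-specific tilts the same check applies with one function per cell.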

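Now rewrite the right side of the last display by conditioning on $I_1,\ldots,I_n$ as

$$E_{P_n}E_{P_n}\Biggl[\prod_{j=1}^k\sqrt{\frac{\int\prod_{i\colon I_i=j}q_{j,\lambda_j}(X_i)\,d\pi_j(\lambda_j)}{\int\prod_{i\colon I_i=j}p_{j,\lambda_j}(X_i)\,d\pi_j(\lambda_j)}}\;\Bigg|\;I_1,\ldots,I_n\Biggr].$$

The product over $j$ can be pulled out of the conditional expectation by the conditional independence across the partitioning sets. The resulting expression can be seen to be of the form as claimed in the lemma.

The second lemma does not use the partitioning structure, but is valid for mixtures of products of arbitrary measures on a measurable space. For $\lambda$ in a measurable space $\Lambda$ let $P_\lambda$ and $Q_\lambda$ be probability measures on a given sample space $(\mathcal{X},\mathcal{A})$, with densities $p_\lambda$ and $q_\lambda$ relative to a given dominating measure $\nu$, which are jointly measurable. For a given (arbitrary) density $p$ define functions $\ell_\lambda=q_\lambda-p_\lambda$ and $\kappa_\lambda=p_\lambda-p$, and set

$$a=\sup_{\lambda\in\Lambda}\int\frac{\kappa_\lambda^2}{p_\lambda}\,d\nu,\qquad
b=\sup_{\lambda\in\Lambda}\int\frac{\ell_\lambda^2}{p_\lambda}\,d\nu,\qquad
c=\sup_{\lambda\in\Lambda}\int\frac{p^2}{p_\lambda}\,d\nu,\qquad
d=\sup_{\lambda\in\Lambda}\int\frac{\bigl(\int\ell_\mu\,d\pi(\mu)\bigr)^2}{p_\lambda}\,d\nu.$$

Lemma 2.2. For any probability measure $\pi$ on $\Lambda$ and every $n\in\mathbb{N}$,

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)\ge 1-\frac14\sum_{r=2}^n\binom{n}{r}b^r-\frac12n^2\sum_{r=1}^{n-1}\binom{n-1}{r}a^rb-\frac12n^2c^{n-1}d.$$

Proof. Consider the measure $P_n=\int P_\lambda^n\,d\pi(\lambda)$, which has density $p_n(\vec{x}_n)=\int\prod_{i=1}^np_\lambda(x_i)\,d\pi(\lambda)$ relative to $\nu^n$, as the distribution of $(X_1,\ldots,X_n)$. Using the inequality $E\sqrt{1+Y}\ge 1-EY^2/8$, valid for any random variable $Y$ with $1+Y\ge 0$ and $EY=0$ (see for example [1]), we see that

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)=E_{P_n}\sqrt{1+\frac{\int\bigl[\prod_{i=1}^nq_\lambda(X_i)-\prod_{i=1}^np_\lambda(X_i)\bigr]\,d\pi(\lambda)}{p_n(X_1,\ldots,X_n)}}$$
$$\ge 1-\frac18E_{P_n}\Biggl(\frac{\int\bigl[\prod_{i=1}^nq_\lambda(X_i)-\prod_{i=1}^np_\lambda(X_i)\bigr]\,d\pi(\lambda)}{p_n(X_1,\ldots,X_n)}\Biggr)^2.\qquad(2.2)$$

It suffices to upper bound the expected value on the right side. To this end we expand the difference $\prod_{i=1}^nq_\lambda(X_i)-\prod_{i=1}^np_\lambda(X_i)$ as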

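$$\sum_{|I|\ge 1}\prod_{i\in I^c}p_\lambda(X_i)\prod_{i\in I}\ell_\lambda(X_i),$$

where the sum ranges over all nonempty subsets $I\subset\{1,\ldots,n\}$. We split this sum in two parts, consisting of the terms indexed by subsets of size 1 and the subsets that contain at least 2 elements, and separate the square of the sum of these two parts by the inequality $(A+B)^2\le 2A^2+2B^2$.

If $n=1$, then there are no subsets with at least two elements and the second part is empty. Otherwise the sum over subsets with at least two elements contributes two times

$$\int\frac{\bigl[\int\sum_{|I|\ge 2}\prod_{i\in I^c}p_\lambda(x_i)\prod_{i\in I}\ell_\lambda(x_i)\,d\pi(\lambda)\bigr]^2}{\int\prod_ip_\lambda(x_i)\,d\pi(\lambda)}\,d\nu^n(\vec{x}_n)$$
$$\le\int\!\!\int\Bigl(\sum_{|I|\ge 2}\prod_{i\in I^c}\sqrt{p_\lambda}(x_i)\prod_{i\in I}\frac{\ell_\lambda}{\sqrt{p_\lambda}}(x_i)\Bigr)^2d\pi(\lambda)\,d\nu^n(\vec{x}_n)$$
$$=\int\!\!\int\sum_{|I|\ge 2}\prod_{i\in I^c}p_\lambda(x_i)\prod_{i\in I}\frac{\ell_\lambda^2}{p_\lambda}(x_i)\,d\pi(\lambda)\,d\nu^n(\vec{x}_n).$$

To derive the first inequality we use the inequality $(EU)^2/EV\le E(U^2/V)$, valid for any random variables $U$ and $V\ge 0$, which can be derived from Cauchy-Schwarz or Jensen's inequality. The last step follows by writing the square of the sum as a double sum and noting that all off-diagonal terms vanish, as they contain at least one term $\ell_\lambda(x_i)$ and $\int\ell_\lambda\,d\nu=0$. The order of integration in the right side can be exchanged, and next the integral relative to $\nu^n$ can be factorized, where the integrals $\int p_\lambda\,d\nu$ are equal to 1. This yields the contribution $2\sum_{|I|\ge 2}b^{|I|}$ to the bound on the expectation in (2.2).

The sum over sets with exactly one element contributes two times

$$\int\frac{\bigl[\int\sum_{j=1}^n\prod_{i\ne j}p_\lambda(x_i)\,\ell_\lambda(x_j)\,d\pi(\lambda)\bigr]^2}{\int\prod_ip_\lambda(x_i)\,d\pi(\lambda)}\,d\nu^n(\vec{x}_n).\qquad(2.3)$$

Here we expand

$$\prod_{i\ne j}p_\lambda(x_i)-\prod_{i\ne j}p(x_i)=\prod_{i\ne j}p_\lambda(x_i)-\prod_{i\ne j}(p_\lambda-\kappa_\lambda)(x_i)=-\sum_{|I|\ge 1,\,j\notin I}\ \prod_{i\in I^c}p_\lambda(x_i)\prod_{i\in I}(-\kappa_\lambda)(x_i),$$

where the sum is over all nonempty subsets $I\subset\{1,\ldots,n\}$ that do not contain $j$, and $I^c$ denotes the complement of $I$ within $\{1,\ldots,n\}\setminus\{j\}$. Replacement of $\prod_{i\ne j}p_\lambda(x_i)$ by $\prod_{i\ne j}p(x_i)$ changes (2.3) into

$$\int\frac{\bigl[\int\sum_{j=1}^n\prod_{i\ne j}p(x_i)\,\ell_\lambda(x_j)\,d\pi(\lambda)\bigr]^2}{\int\prod_ip_\lambda(x_i)\,d\pi(\lambda)}\,d\nu^n(\vec{x}_n)$$
$$\le n\sum_{j=1}^n\int\frac{\prod_{i\ne j}p^2(x_i)\,\bigl[\int\ell_\lambda(x_j)\,d\pi(\lambda)\bigr]^2}{\int\prod_ip_\lambda(x_i)\,d\pi(\lambda)}\,d\nu^n(\vec{x}_n)$$
$$\le n\sum_{j=1}^n\int\!\!\int\prod_{i\ne j}\frac{p^2}{p_\mu}(x_i)\,\frac{\bigl(\int\ell_\lambda\,d\pi(\lambda)\bigr)^2}{p_\mu}(x_j)\,d\pi(\mu)\,d\nu^n(\vec{x}_n).$$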

In the last step we use that $1/EV\le E(1/V)$ for any positive random variable $V$. The integral with respect to $\nu^n$ in the right side can be factorized, and the expression bounded by $n^2c^{n-1}d$. Four times this must be added to the bound on the expectation in (2.2).

Finally the remainder after substituting $\prod_{i\ne j}p(x_i)$ for $\prod_{i\ne j}p_\lambda(x_i)$ in (2.3) contributes

$$\int\frac{\bigl[\int\sum_{j=1}^n\sum_{|I|\ge 1,\,j\notin I}\prod_{i\in I^c}p_\lambda(x_i)\prod_{i\in I}(-\kappa_\lambda)(x_i)\,\ell_\lambda(x_j)\,d\pi(\lambda)\bigr]^2}{\int\prod_ip_\lambda(x_i)\,d\pi(\lambda)}\,d\nu^n(\vec{x}_n)$$
$$\le n\sum_{j=1}^n\int\!\!\int\Bigl(\sum_{|I|\ge 1,\,j\notin I}\prod_{i\in I^c}\sqrt{p_\lambda}(x_i)\prod_{i\in I}\frac{-\kappa_\lambda}{\sqrt{p_\lambda}}(x_i)\,\frac{\ell_\lambda}{\sqrt{p_\lambda}}(x_j)\Bigr)^2d\pi(\lambda)\,d\nu^n(\vec{x}_n)$$
$$=n\sum_{j=1}^n\sum_{|I|\ge 1,\,j\notin I}\int\!\!\int\prod_{i\in I^c}p_\lambda(x_i)\prod_{i\in I}\frac{\kappa_\lambda^2}{p_\lambda}(x_i)\,\frac{\ell_\lambda^2}{p_\lambda}(x_j)\,d\pi(\lambda)\,d\nu^n(\vec{x}_n).$$

The inequality combines the Cauchy-Schwarz inequality over $j$ with the inequality $(EU)^2/EV\le E(U^2/V)$ used before, and the equality holds because the off-diagonal terms vanish, now because $\int\kappa_\lambda\,d\nu=0$. We exchange the order of integration and factorize the integral with respect to $\nu^n$ to bound this by $n\sum_{j=1}^n\sum_{|I|\ge 1,\,j\notin I}a^{|I|}b$.

3. Applications

3.1. Estimating the mean response in missing data models

Suppose that a typical observation is distributed as $X=(YA,A,Z)$ for $Y$ and $A$ taking values in the two-point set $\{0,1\}$ and conditionally independent given $Z$. We think of $Y$ as a response variable, which is observed only if the indicator $A$ takes the value 1, and are interested in estimating the mean response $EY$. The covariate $Z$ is chosen such that it contains all information on the dependence between response and missingness indicator ("missing at random"). We assume that $Z$ takes its values in $\mathcal{Z}=[0,1]^d$.

The model can be parameterized by the marginal density $f$ of $Z$ relative to Lebesgue measure $\nu$ on $\mathcal{Z}$, and the probabilities $b(z)=P(Y=1\mid Z=z)$ and $a(z)^{-1}=P(A=1\mid Z=z)$. Alternatively, the model can be parameterized by the function $g=f/a$, which is the conditional density of $Z$ given $A=1$ up to the norming factor $P(A=1)$. Under this latter parametrization, which we adopt henceforth, the density $p$ of an observation $X$ is described by the triple $(a,b,g)$ and the functional of interest is expressed as $\chi(p)=\int abg\,d\nu$.

Define $C^\alpha_M[0,1]^d$ as $M$ times the unit ball of the Hölder space of $\alpha$-smooth functions on $[0,1]^d$.
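For given positive constants $\alpha,\beta,\gamma,\varphi$ and $M,\underline{M}$, we consider the models

$$\mathcal{P}_1=\bigl\{(a,b,g)\colon a\in C^\alpha_M[0,1]^d,\ b\in C^\beta_M[0,1]^d,\ g=1/2,\ a,b\ge\underline{M}\bigr\},$$
$$\mathcal{P}_2=\bigl\{(a,b,g)\colon a\in C^\alpha_M[0,1]^d,\ b\in C^\beta_M[0,1]^d,\ g\in C^\gamma_M[0,1]^d,\ a,b\ge\underline{M}\bigr\}.$$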

If $(\alpha+\beta)/2\ge d/4$, then a $\sqrt{n}$-rate is attainable over $\mathcal{P}_2$ (see [3]), and a standard "two-point" proof can show that this rate cannot be improved. Here we are interested in the case $(\alpha+\beta)/2 < d/4$, when the rate becomes slower than $1/\sqrt{n}$. The paper [3] constructs an estimator that attains the rate $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$ uniformly over $\mathcal{P}_2$ if

$$\frac{\gamma}{2\gamma+d}>\Bigl(\frac{\alpha\vee\beta}{d}\Bigr)\Bigl(\frac{d-2\alpha-2\beta}{d+2\alpha+2\beta}\Bigr):=\gamma(\alpha,\beta).\qquad(3.1)$$

We shall show that this result is optimal by showing that the minimax rate over the smaller model $\mathcal{P}_1$ is not faster than $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$.

In the case that $\alpha=\beta$ these results can be proved using the method of [1], but in general we need a construction as in Section 2 with $P_\lambda$ based on a perturbation of the smoothest parameter of the pair $(a,b)$ and $Q_\lambda$ constructed by perturbing in addition the coarsest of the two parameters.

Theorem 3.1. If $(\alpha+\beta)/2 < d/4$ the minimax rate over $\mathcal{P}_1$ for estimating $\int abg\,d\nu$ is at least $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$.

Proof. Let $H\colon\mathbb{R}^d\to\mathbb{R}$ be a $C^\infty$ function supported on the cube $[0,1/2]^d$ with $\int H\,d\nu=0$ and $\int H^2\,d\nu=1$. Let $k$ be the integer closest to $n^{2d/(2\alpha+2\beta+d)}$ and let $\mathcal{Z}_1,\ldots,\mathcal{Z}_k$ be translates of the cube $k^{-1/d}[0,1/2]^d$ that are disjoint and contained in $[0,1]^d$. For $z_1,\ldots,z_k$ the bottom left corners of these cubes and $\lambda=(\lambda_1,\ldots,\lambda_k)\in\Lambda=\{-1,1\}^k$, let

$$a_\lambda(z)=2+k^{-\alpha/d}\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr),\qquad
b_\lambda(z)=\frac12+k^{-\beta/d}\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr).$$

These functions can be seen to be contained in $C^\alpha[0,1]^d$ and $C^\beta[0,1]^d$ with norms that are uniformly bounded in $k$. We choose a uniform prior $\pi$ on $\lambda$, so that $\lambda_1,\ldots,\lambda_k$ are i.i.d. Rademacher variables. We partition the sample space $\{0,1\}\times\{0,1\}\times\mathcal{Z}$ into the sets $\{0,1\}\times\{0,1\}\times\mathcal{Z}_j$ and the remaining set.

We parameterize the model by the triple $(a,b,g)$. The likelihood can then be written as

$$(a-1)^{1-A}(Z)\,\bigl(b^Y(Z)(1-b)^{1-Y}(Z)\bigr)^A.$$

Because $\int H\,d\nu=0$ the values of the functional $\int abg\,d\nu$ at the parameter values $(a_\lambda,1/2,1/2)$ and $(2,b_\lambda,1/2)$ are both equal to $1/2$, whereas the value at $(a_\lambda,b_\lambda,1/2)$ is equal to

$$\int a_\lambda b_\lambda\,\frac{d\nu}{2}=\frac12+\frac12\Bigl(\frac1k\Bigr)^{\alpha/d+\beta/d}\int\Bigl(\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr)\Bigr)^2d\nu=\frac12+\frac12\Bigl(\frac1k\Bigr)^{\alpha/d+\beta/d}.$$
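The values of the functional at the perturbed parameters can be confirmed numerically in dimension $d=1$. The sketch below rests on assumptions that are not part of the proof: the bump `bump` is a simple stand-in for $H$ that has $\int H\,d\nu=0$ and $\int H^2\,d\nu=1$ on $[0,1/2]$ but is not $C^\infty$, the corners are taken as $z_i=i/k$, and the integral is approximated by a midpoint rule. All names are hypothetical.

```python
from math import pi, sin


def bump(x):
    """Stand-in for H on [0, 1/2]: integral 0, squared integral 1 (not C-infinity)."""
    return 2.0 * sin(4.0 * pi * x) if 0.0 <= x <= 0.5 else 0.0


def perturbation(z, lam):
    """sum_i lam[i] * H((z - z_i) * k) for d = 1 with assumed corners z_i = i / k."""
    k = len(lam)
    return sum(sign * bump((z - i / k) * k) for i, sign in enumerate(lam))


def functional(lam, alpha, beta, perturb_a=True, perturb_b=True, grid=4000):
    """Midpoint-rule approximation of the integral of a * b * g over [0, 1], g = 1/2."""
    k = len(lam)
    total = 0.0
    for m in range(grid):
        z = (m + 0.5) / grid
        s = perturbation(z, lam)
        a = 2.0 + (k ** -alpha * s if perturb_a else 0.0)   # a_lambda, or a = 2
        b = 0.5 + (k ** -beta * s if perturb_b else 0.0)    # b_lambda, or b = 1/2
        total += a * b * 0.5 / grid
    return total


alpha, beta = 0.2, 0.1                      # satisfies (alpha + beta) / 2 < d / 4 for d = 1
lam = (1, -1, -1, 1, 1, 1, -1, 1)           # one sign vector, k = 8
k = len(lam)

both = functional(lam, alpha, beta)                      # parameter (a_lambda, b_lambda, 1/2)
only_a = functional(lam, alpha, beta, perturb_b=False)   # parameter (a_lambda, 1/2, 1/2)
only_b = functional(lam, alpha, beta, perturb_a=False)   # parameter (2, b_lambda, 1/2)
predicted = 0.5 + 0.5 * k ** -(alpha + beta)
```

With these choices `only_a` and `only_b` agree with $1/2$ and `both` agrees with `predicted`, for any sign vector, so the separation between the two hypotheses is of the order $k^{-(\alpha+\beta)/d}$ regardless of $\lambda$.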

Thus the minimax rate is not faster than $(1/k)^{\alpha/d+\beta/d}$ for $k=k_n$ such that the convex mixtures of the products of the perturbations do not separate completely as $n\to\infty$. We choose the mixtures differently in the cases $\alpha\le\beta$ and $\alpha\ge\beta$.

$\alpha\le\beta$. We define $p_\lambda$ by the parameter $(a_\lambda,1/2,1/2)$ and $q_\lambda$ by the parameter $(a_\lambda,b_\lambda,1/2)$. Because $\int a_\lambda\,d\pi(\lambda)=2$ and $\int b_\lambda\,d\pi(\lambda)=1/2$, we have

$$p(X):=\int p_\lambda(X)\,d\pi(\lambda)=\bigl(b^Y(Z)(1-b)^{1-Y}(Z)\bigr)^A\quad\text{with } b=1/2,$$
$$(p_\lambda-p)(X)=(1-A)(a_\lambda-2)(Z),$$
$$(q_\lambda-p_\lambda)(X)=A\,(b_\lambda-1/2)^Y(1/2-b_\lambda)^{1-Y}(Z),$$
$$(q-p)(X):=\int(q_\lambda-p_\lambda)(X)\,d\pi(\lambda)=0.$$

Therefore, it follows that the number $d$ in Theorem 2.1 vanishes, while the numbers $a$ and $b$ are of the orders $k^{-2\alpha/d}$ and $k^{-2\beta/d}$ times

$$\max_j\int_{\mathcal{Z}_j}\Bigl(\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr)\Bigr)^2\frac{d\nu}{1/k}\sim 1,$$

respectively. Theorem 2.1 shows that

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)\ge 1-Cn^2\,\frac1k\bigl(k^{-4\beta/d}+k^{-2\alpha/d}k^{-2\beta/d}\bigr).$$

For $k\sim n^{2d/(2\alpha+2\beta+d)}$ the right side is bounded away from 0. Substitution of this number in the magnitude of separation $(1/k)^{\alpha/d+\beta/d}$ leads to the rate as claimed in the theorem.

$\alpha\ge\beta$. We define $p_\lambda$ by the parameter $(2,b_\lambda,1/2)$ and $q_\lambda$ by the parameter $(a_\lambda,b_\lambda,1/2)$. The computations are very similar to the ones in the case $\alpha\le\beta$.

3.2. Estimating an expected conditional covariance

Suppose that we observe $n$ independent and identically distributed copies of $X=(Y,A,Z)$, where as in the previous section, $Y$ and $A$ are dichotomous, and $Z$ takes its values in $\mathcal{Z}=[0,1]^d$ with density given by $f$. Let $b(z)=P(Y=1\mid Z=z)$ and $a(z)=P(A=1\mid Z=z)$. We note that

$$b(Z)=P(Y=1\mid A=1,Z)\,a(Z)+\{1-a(Z)\}\,P(Y=1\mid A=0,Z)$$
$$=\{P(Y=1\mid A=1,Z)-P(Y=1\mid A=0,Z)\}\,a(Z)+P(Y=1\mid A=0,Z)$$
$$=\{P(Y=1\mid A=0,Z)-P(Y=1\mid A=1,Z)\}\{1-a(Z)\}+P(Y=1\mid A=1,Z),$$

so that by combining the last two equations above, we can write

$$P(Y=1\mid A,Z)=\Delta(Z)\{A-a(Z)\}+b(Z)$$

where $\Delta(Z)=P(Y=1\mid A=1,Z)-P(Y=1\mid A=0,Z)$. This allows us to parametrize the density $p$ of an observation by $(\Delta,a,b,f)$. The functional $\chi(p)$ is given by the expected conditional covariance

$$E_f\{\mathrm{cov}_{\Delta,a,b}(Y,A\mid Z)\}=E_{\Delta,a,b,f}(YA)-\int abf\,d\nu.\qquad(3.2)$$

We consider the models

$$\mathcal{B}_1=\bigl\{(\Delta,a,b,f)\colon\Delta\text{ is unrestricted},\ a\in C^\alpha_M[0,1]^d,\ b\in C^\beta_M[0,1]^d,\ f=1,\ a,b\ge\underline{M}\bigr\},$$
$$\mathcal{B}_2=\bigl\{(\Delta,a,b,f)\colon\Delta\text{ is unrestricted},\ a\in C^\alpha_M[0,1]^d,\ b\in C^\beta_M[0,1]^d,\ f\in C^\gamma_M[0,1]^d,\ a,b\ge\underline{M}\bigr\}.$$

We are mainly interested in the case $(\alpha+\beta)/2 < d/4$, when the rate of estimation of $\chi(p)$ becomes slower than $1/\sqrt{n}$. The paper [3] constructs an estimator that attains the rate $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$ uniformly over $\mathcal{B}_2$ if equation (3.1) of the previous section holds. We will show that this rate is optimal by showing that the minimax rate over the smaller model $\mathcal{B}_1$ is not faster than $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$.

The first term of the difference on the right side of equation (3.2) can be estimated by the sample average $n^{-1}\sum_{i=1}^nY_iA_i$ at rate $n^{-1/2}$. It follows that $\chi(p)$ can be estimated at the maximum of $n^{-1/2}$ and the rate of estimation of $\int abf\,d\nu$. In other words, to establish that the minimax rate for estimating $\chi(p)$ over $\mathcal{B}_1$ is $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$, we shall show that the minimax rate for estimating $\int abf\,d\nu$ over $\mathcal{B}_1$ is $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$.

Theorem 3.2. If $(\alpha+\beta)/2 < d/4$ the minimax rate over $\mathcal{B}_1$ for estimating $\int abf\,d\nu$ is at least $n^{-(2\alpha+2\beta)/(2\alpha+2\beta+d)}$.

Proof. Under the parametrization $(\Delta,a,b,f)$, the density of an observation $X$ is given by

$$\bigl([\Delta(Z)\{A-a(Z)\}+b(Z)]\,a(Z)\bigr)^{YA}\times\bigl([1-\Delta(Z)\{A-a(Z)\}-b(Z)]\,a(Z)\bigr)^{(1-Y)A}$$
$$\times\bigl([\Delta(Z)\{A-a(Z)\}+b(Z)]\{1-a(Z)\}\bigr)^{Y(1-A)}\times\bigl([1-\Delta(Z)\{A-a(Z)\}-b(Z)]\{1-a(Z)\}\bigr)^{(1-Y)(1-A)}\times f(Z).$$

Suppose $\alpha < \beta$ and set

$$a_\lambda(z)=\frac12+\delta a_\lambda(z)=\frac12+k^{-\alpha/d}\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr),$$
$$b_\lambda(z)=\frac12+\delta b_\lambda(z)=\frac12+k^{-\beta/d}\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr),$$
$$\Delta_\lambda(Z)=-\frac{\delta b_\lambda(Z)}{1/2-\delta a_\lambda(Z)},$$

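then at the parameter values $(0,a_\lambda,1/2,1)$, $\int abf\,d\nu=1/4$, with corresponding likelihood (up to the constant factor $1/2$)

$$p_\lambda(X)=\{a_\lambda(Z)\}^A\{1-a_\lambda(Z)\}^{1-A},$$

whereas at the parameter values $(\Delta_\lambda,a_\lambda,b_\lambda,1)$, $\int abf\,d\nu=1/4+n^{-(2\alpha+2\beta)/(d+2\alpha+2\beta)}$ and the likelihood is given by

$$q_\lambda(X)=\{a_\lambda(Z)/2\}^{YA}\,\{a_\lambda(Z)/2\}^{(1-Y)A}\,\bigl[1/2-a_\lambda(Z)/2+\delta b_\lambda(Z)\bigr]^{Y(1-A)}\,\bigl[1/2-a_\lambda(Z)/2-\delta b_\lambda(Z)\bigr]^{(1-Y)(1-A)},$$

so that

$$(q_\lambda-p_\lambda)(X)=(1-A)\,\delta b_\lambda(Z)^Y\{-\delta b_\lambda(Z)\}^{1-Y},$$

and we conclude that $(q-p)(X)=\int(q_\lambda-p_\lambda)(X)\,d\pi(\lambda)=0$. Furthermore, up to the same constant factor,

$$(p_\lambda-p)(X)=\{\delta a_\lambda(Z)\}^A\{-\delta a_\lambda(Z)\}^{1-A},$$

so that

$$\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(q_\lambda-p_\lambda)^2}{p_\lambda}\,\frac{d\nu}{p_j}=k^{-2\beta/d}\max_j\sup_\lambda\int_{\mathcal{X}_j}\Bigl(\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr)\Bigr)^2\frac{1}{p_\lambda}\,\frac{d\nu}{p_j}\lesssim k^{-2\beta/d},$$
$$\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{\bigl(\int(q_\lambda-p_\lambda)\,d\pi(\lambda)\bigr)^2}{p_\lambda}\,\frac{d\nu}{p_j}=0,$$

and

$$\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(p_\lambda-p)^2}{p_\lambda}\,\frac{d\nu}{p_j}=k^{-2\alpha/d}\max_j\sup_\lambda\int_{\mathcal{X}_j}\Bigl(\sum_{i=1}^k\lambda_iH\bigl((z-z_i)k^{1/d}\bigr)\Bigr)^2\frac{1}{p_\lambda}\,\frac{d\nu}{p_j}\lesssim k^{-2\alpha/d}.$$

Therefore, it follows that the number $d$ of Theorem 2.1 vanishes, while the numbers $a$ and $b$ are of order $k^{-2\alpha/d}$ and $k^{-2\beta/d}$ respectively. Theorem 2.1 shows that

$$\rho\Bigl(\int P_\lambda^n\,d\pi(\lambda),\int Q_\lambda^n\,d\pi(\lambda)\Bigr)\ge 1-Cn^2\,\frac1k\bigl(k^{-4\beta/d}+k^{-2\beta/d}k^{-2\alpha/d}\bigr),$$

which gives the desired result for the choice of $k\sim n^{2d/(2\alpha+2\beta+d)}$.

Next, suppose $\alpha>\beta$, set $a_\lambda(Z)$ and $b_\lambda(Z)$ as above, and let

$$\Delta_\lambda(Z)=-\frac{\delta a_\lambda(Z)\,b_\lambda(Z)}{\bigl(1/2-\delta a_\lambda(Z)\bigr)\,a_\lambda(Z)},$$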

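then at the parameter values $(0,1/2,b_\lambda,1)$, $\int abf\,d\nu=1/4$, with corresponding likelihood (up to the constant factor $1/2$)

$$p_\lambda(X)=\{b_\lambda(Z)\}^Y\{1-b_\lambda(Z)\}^{1-Y},$$

whereas at the parameter values $(\Delta_\lambda,a_\lambda,b_\lambda,1)$, $\int abf\,d\nu=1/4+n^{-(2\alpha+2\beta)/(d+2\alpha+2\beta)}$, with corresponding likelihood given by

$$q_\lambda(X)=\bigl[b_\lambda(Z)/2\bigr]^Y\,\bigl[a_\lambda(Z)-b_\lambda(Z)/2\bigr]^{(1-Y)A}\,\bigl[1/2-\delta a_\lambda(Z)-b_\lambda(Z)/2\bigr]^{(1-Y)(1-A)},$$

so that

$$(q_\lambda-p_\lambda)(X)=(1-Y)\,\delta a_\lambda(Z)^A\{-\delta a_\lambda(Z)\}^{1-A},$$

and we conclude that $(q-p)(X)=\int(q_\lambda-p_\lambda)(X)\,d\pi(\lambda)=0$. Furthermore, up to the same constant factor,

$$(p_\lambda-p)(X)=\delta b_\lambda(Z)^Y\{-\delta b_\lambda(Z)\}^{1-Y},$$

so that

$$\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(q_\lambda-p_\lambda)^2}{p_\lambda}\,\frac{d\nu}{p_j}\lesssim k^{-2\alpha/d},\qquad
\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{\bigl(\int(q_\lambda-p_\lambda)\,d\pi(\lambda)\bigr)^2}{p_\lambda}\,\frac{d\nu}{p_j}=0,$$

and

$$\max_j\sup_\lambda\int_{\mathcal{X}_j}\frac{(p_\lambda-p)^2}{p_\lambda}\,\frac{d\nu}{p_j}\lesssim k^{-2\beta/d},$$

which yields the desired result by arguments similar to the previous case.

References

[1] Birgé, L. and Massart, P. (1995). Estimation of integral functionals of a density. Annals of Statistics 23, 11-29.

[2] van der Vaart, A. (1998). Asymptotic Statistics. Cambridge University Press.

[3] Robins, J., Li, L., Tchetgen, E. and van der Vaart, A. (2008). Higher order influence functions and minimax estimation of nonlinear functionals. IMS Lecture Notes-Monograph Series, Probability and Statistics Models: Essays in Honor of David A. Freedman,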

More information

MINIMAX ESTIMATION OF A FUNCTIONAL ON A STRUCTURED HIGH-DIMENSIONAL MODEL

MINIMAX ESTIMATION OF A FUNCTIONAL ON A STRUCTURED HIGH-DIMENSIONAL MODEL Submitted to the Annals of Statistics arxiv: math.pr/0000000 MINIMAX ESTIMATION OF A FUNCTIONAL ON A STRUCTURED HIGH-DIMENSIONAL MODEL By James M. Robins, Lingling Li, Rajarshi Mukherjee, Eric Tchetgen

More information

arxiv: v2 [stat.me] 8 Dec 2015

arxiv: v2 [stat.me] 8 Dec 2015 Submitted to the Annals of Statistics arxiv: math.pr/0000000 HIGHER ORDER ESTIMATING EQUATIONS FOR HIGH-DIMENSIONAL MODELS arxiv:1512.02174v2 [stat.me] 8 Dec 2015 By James Robins, Lingling Li, Rajarshi

More information

Minimax Estimation of a nonlinear functional on a structured high-dimensional model

Minimax Estimation of a nonlinear functional on a structured high-dimensional model Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics

More information

Bayesian Regularization

Bayesian Regularization Bayesian Regularization Aad van der Vaart Vrije Universiteit Amsterdam International Congress of Mathematicians Hyderabad, August 2010 Contents Introduction Abstract result Gaussian process priors Co-authors

More information

Lecture 35: December The fundamental statistical distances

Lecture 35: December The fundamental statistical distances 36-705: Intermediate Statistics Fall 207 Lecturer: Siva Balakrishnan Lecture 35: December 4 Today we will discuss distances and metrics between distributions that are useful in statistics. I will be lose

More information

Lecture 17: Density Estimation Lecturer: Yihong Wu Scribe: Jiaqi Mu, Mar 31, 2016 [Ed. Apr 1]

Lecture 17: Density Estimation Lecturer: Yihong Wu Scribe: Jiaqi Mu, Mar 31, 2016 [Ed. Apr 1] ECE598: Information-theoretic methods in high-dimensional statistics Spring 06 Lecture 7: Density Estimation Lecturer: Yihong Wu Scribe: Jiaqi Mu, Mar 3, 06 [Ed. Apr ] In last lecture, we studied the minimax

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

A note on L convergence of Neumann series approximation in missing data problems

A note on L convergence of Neumann series approximation in missing data problems A note on L convergence of Neumann series approximation in missing data problems Hua Yun Chen Division of Epidemiology & Biostatistics School of Public Health University of Illinois at Chicago 1603 West

More information

An introduction to some aspects of functional analysis

An introduction to some aspects of functional analysis An introduction to some aspects of functional analysis Stephen Semmes Rice University Abstract These informal notes deal with some very basic objects in functional analysis, including norms and seminorms

More information

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University

RATE-OPTIMAL GRAPHON ESTIMATION. By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Submitted to the Annals of Statistics arxiv: arxiv:0000.0000 RATE-OPTIMAL GRAPHON ESTIMATION By Chao Gao, Yu Lu and Harrison H. Zhou Yale University Network analysis is becoming one of the most active

More information

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016

Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,

More information

On Consistent Hypotheses Testing

On Consistent Hypotheses Testing Mikhail Ermakov St.Petersburg E-mail: erm2512@gmail.com History is rather distinctive. Stein (1954); Hodges (1954); Hoefding and Wolfowitz (1956), Le Cam and Schwartz (1960). special original setups: Shepp,

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin

ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE. By Michael Nussbaum Weierstrass Institute, Berlin The Annals of Statistics 1996, Vol. 4, No. 6, 399 430 ASYMPTOTIC EQUIVALENCE OF DENSITY ESTIMATION AND GAUSSIAN WHITE NOISE By Michael Nussbaum Weierstrass Institute, Berlin Signal recovery in Gaussian

More information

MATHS 730 FC Lecture Notes March 5, Introduction

MATHS 730 FC Lecture Notes March 5, Introduction 1 INTRODUCTION MATHS 730 FC Lecture Notes March 5, 2014 1 Introduction Definition. If A, B are sets and there exists a bijection A B, they have the same cardinality, which we write as A, #A. If there exists

More information

Nonparametric Bayesian Uncertainty Quantification
