generalized modes in bayesian inverse problems
Christian Clason · Tapio Helin · Remo Kretschmann · Petteri Piiroinen

June 1, 2018

Abstract. This work is concerned with non-parametric modes and MAP estimates for priors that do not admit continuous densities, for which previous approaches based on small-ball probabilities fail. We propose a novel definition of generalized modes based on the concept of qualifying sequences, which reduce to the classical mode in certain situations that include Gaussian priors but also exist for a more general class of priors. The latter includes priors that impose strict bounds on the admissible parameters and in particular uniform priors. For uniform priors defined by random series with uniformly distributed coefficients, we show that the generalized modes (but not the classical modes) as well as the corresponding generalized MAP estimates can be characterized as minimizers of a suitable functional. This is then used to show consistency of nonlinear Bayesian inverse problems with uniform priors and Gaussian noise.

1 introduction

The Bayesian approach to inverse problems is attractive as it provides direct means to quantify uncertainty in the solution. Developments in modern sampling methodologies and increases in computational resources have made the Bayesian approach feasible in various large-scale inverse problems, and the approach has gained wide attention in recent years; we refer to the early work [11], the books [15, 31], and the more recent works [9, 20, 30] as well as to the references therein. To motivate our contribution, we recall that the Bayesian approach produces statistical inference based on the likelihood and prior probability distributions of the system, and that all expert information independent of the observation enters the analysis through the prior model.
Given some prior distribution µ_0 of the probable values of x, the Bayesian inverse problem is then to quantify the uncertainty in x given some noisy observation y = y^δ. It is well known, e.g., from [30] that under quite general conditions, the posterior µ_post is absolutely continuous with respect to the prior and is given by the Bayes formula

(1.1)    dµ_post/dµ_0 (x | y) ∝ exp(−Φ(x; y)),

Affiliations: Faculty of Mathematics, University Duisburg-Essen, Thea-Leymann-Strasse 9, Essen, Germany (christian.clason@uni-due.de, remo.kretschmann@uni-due.de); Department of Mathematics and Statistics, University of Helsinki, Gustaf Hällströmin katu 2b, FI Helsinki, Finland (tapio.helin@helsinki.fi, petteri.piiroinen@helsinki.fi)
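To make (1.1) concrete, the following sketch evaluates a posterior density on a grid in one dimension. The cubic forward map, noise level, and standard Gaussian prior are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical 1-D setup: forward map G(x) = x**3, Gaussian noise with
# standard deviation sigma, and a standard Gaussian prior.
sigma, y = 0.1, 0.5

def Phi(x):                                # negative log-likelihood Phi(x; y)
    return (y - x**3) ** 2 / (2 * sigma**2)

def prior_density(x):                      # standard Gaussian prior density
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

# Bayes formula (1.1): posterior density proportional to exp(-Phi) * prior.
xs = np.linspace(-2.0, 2.0, 4001)
dx = xs[1] - xs[0]
unnorm = np.exp(-Phi(xs)) * prior_density(xs)
Z = unnorm.sum() * dx                      # normalization constant Z(y)
posterior = unnorm / Z
print(abs(posterior.sum() * dx - 1.0) < 1e-12)
```

The normalization constant Z(y) reappears explicitly in (5.1) below; in the infinite-dimensional setting no Lebesgue reference density exists, which is precisely the difficulty the paper addresses.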
where Φ is the negative log-likelihood. Due to the frequent high dimensionality of x in inverse problems, there is a fundamental need to understand whether the problem scales well to the infinite-dimensional case and whether the related estimators have well-behaved limits; this leads to the study of non-parametric Bayesian inverse problems. Due to the computational effort required by Markov chain Monte Carlo (MCMC) algorithms in large-scale problems, there has been interest in studying how approximate Bayesian methods (see, e.g., [7, 24–26, 29]) can be used effectively in inverse problems. A central estimator used in this research effort is the maximum a posteriori (MAP) estimate, which can be formulated in a finite-dimensional setting as an optimization problem

(1.2)    x_MAP = arg min_x {Φ(x; y) + R(x)},

where R is the negative log-prior density. As an optimization problem, (1.2) is relatively fast to solve and is widely used in applications as the basis of, e.g., the Laplace approximation [32]. The study of non-parametric MAP estimates, or more generally, non-parametric modes, was only recently initiated by Dashti et al. in [8] in the context of nonlinear Bayesian inverse problems with Gaussian prior and noise distribution. In the Gaussian case, one has an explicit relation between the right-hand side functional in (1.2) and the Onsager–Machlup functional defined via the limit of small-ball probabilities, which can be used to give a statistical justification for the objective functional that is consistent with the finite-dimensional definition [8]. The difficulty related to non-parametric MAP theory is that the posterior does not have a natural density representation as in the finite-dimensional case, and consequently, the MAP estimate does not have a clear correspondence to an optimization problem such as (1.2). More recently, there has been a series of results [1, 10, 12] that extend the definition and scope of non-parametric MAP estimation.
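In finite dimensions, (1.2) is an ordinary optimization problem. The sketch below is a minimal illustration with an assumed identity forward map and Gaussian noise and prior, so the MAP estimate has the closed form y/(1 + σ²) against which a grid search can be checked.

```python
import numpy as np

# Illustrative linear-Gaussian instance of (1.2): Phi is the Gaussian negative
# log-likelihood for the identity forward map, R the Gaussian negative log-prior.
sigma, y = 0.1, 0.5

def Phi(x):
    return (x - y) ** 2 / (2 * sigma ** 2)

def R(x):
    return x ** 2 / 2

xs = np.linspace(-2.0, 2.0, 400001)
x_map = xs[np.argmin(Phi(xs) + R(xs))]     # brute-force minimizer of (1.2)
print(x_map)
```

For this example the closed-form minimizer is y / (1 + sigma**2) ≈ 0.495, which the grid search reproduces to grid accuracy.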
For example, weak MAP estimates were proposed and studied in [12] in the context of linear Bayesian inverse problems with a general class of priors. The authors used tools from the differentiation and quasi-invariance theory of measures (developed by Fomin and Skorokhod and discussed in detail in [4]) to connect the zero points of the logarithmic derivative of a measure to the minimizers of the Onsager–Machlup functional. This program of generalization is motivated by the fact that in practice, due to the ill-posedness and often high dimensionality of the problem, the prior plays a key role in successful stabilization [30] and posterior modeling for the inverse problem. Therefore, the prior needs to be carefully designed to reflect the best possible subjective information available. Here, we only mention priors that reflect the sparsity [19, 21], hierarchical structure [5, 33], or anisotropic features [16] of the unknown. However, in the non-parametric case, all results regarding the MAP estimate known to the authors require continuity of the prior; but even a simple one-dimensional discontinuous density can fail to have a mode in terms of the definition given in [8] (see Example 2.2). This is limiting, since suitable prior modeling may involve imposing strict bounds on the admissible values of x emerging from some fundamental properties of the application. As a simple example, consider the classical inverse problem of X-ray medical imaging [27]. Since it is reasonable to assume that there are no radiation sources inside the patient, the attenuation of X-rays is positive throughout the body. A related situation occurs in electrical impedance tomography and, more generally, in parameter estimation problems for partial differential
equations where pointwise upper and lower bounds need to be imposed on the parameter to ensure well-posedness of the forward problem; reasonable bounds are often available from a priori information on, e.g., the kinds of tissue or material expected in the region of interest. Furthermore, it is known in the deterministic theory that restriction of the unknown to a compact set can stabilize an inverse problem without additional (possibly undesired) regularization terms; this is sometimes referred to as quasi-solution or Ivanov regularization [13, 14, 23, 28], and the use of pointwise bounds for this purpose has recently been studied in [6, 17]. In the Bayesian approach, this would correspond to priors with densities whose mass is contained in a compact set. Together, this motivates the study of imposing such hard bounds as part of the prior in Bayesian inverse problems. The main contribution of our work in this direction is two-fold: First, we introduce a novel definition of generalized modes for probability measures and characterize conditions under which our definition coincides with the previous definition of modes given in [8]. More precisely, we show that if the Radon–Nikodym derivative dµ(· − h)/dµ of the translated measure µ has certain equicontinuity properties (described in Section 3.1) over a dense set of translations h, any generalized mode is also a mode in the sense of [8]. In particular, we demonstrate that our definition of generalized modes does not introduce pathological modes in the case of continuous densities and that strong and generalized modes coincide for Gaussian measures. Second, we consider uniform priors defined via the random series

(1.3)    ξ = Σ_{k=1}^∞ γ_k ξ_k ϕ_k,

where ξ_k ∼ U[−1, 1] are uniformly distributed and γ_k are suitable weights; such priors clearly have a discontinuous density.
We further show that the uniform prior (1.3) does not have any mode that touches the bounds (i.e., where ξ_k ∈ {−1, 1} for some k ∈ N) and, therefore, it is not guaranteed that the posterior has any mode. However, we prove that such points are in fact generalized modes of the prior. Furthermore, we show that for the uniform prior defined via (1.3), the definition of the generalized mode characterizes in a natural manner the MAP estimates of (1.1) as minimizers of (1.2). We also provide a weak consistency result regarding the MAP estimates in line with the previous work [1, 8]. This paper is organized as follows. In Section 2, we give our definition of generalized modes and illustrate the definition for the examples of a measure with discontinuous density (which does not admit a strong mode) and of Gaussian measures (where generalized and strong modes coincide). A general investigation of conditions for the generalized and strong modes to coincide is carried out in Section 3. We next construct uniform priors on an infinite-dimensional Banach space and characterize their generalized modes in Section 4. In Section 5, we study Bayesian inverse problems with such priors and derive a variational characterization of generalized modes that plays the role of a generalized Onsager–Machlup functional. This is used in Section 6 to show consistency of nonlinear Bayesian inverse problems with uniform priors and Gaussian noise.
2 generalized modes

Let X be a separable Banach space and µ be a probability measure on X. Throughout, let B_δ(x) ⊂ X denote the open ball around x ∈ X with radius δ, and set

M_δ := sup_{x∈X} µ(B_δ(x))    for δ > 0.

We first recall the definition of a mode introduced in [8].

Definition 2.1. A point x̂ ∈ X is called a (strong) mode of µ if

lim_{δ→0} µ(B_δ(x̂)) / M_δ = 1.

The modes of the posterior measure µ^y are called maximum a posteriori (MAP) estimates.

It is straightforward to construct a probability distribution with discontinuous Lebesgue density which does not have a mode according to Definition 2.1.

Example 2.2. Let µ be a probability measure on R with density p with respect to the Lebesgue measure, defined via

p̃(x) := 1 − x if x ∈ [0, 1] and p̃(x) := 0 otherwise,    p(x) := p̃(x) / ∫_R p̃(x) dx.

Clearly, x̂ = 0 maximizes p; however, this is not a strong mode. First, note that for every δ > 0,

µ(B_δ(δ)) = sup_{x∈R} µ(B_δ(x)) = M_δ.

Hence of all balls with radius δ, the one around x_δ := δ has the highest probability. However, although x_δ → 0 for δ → 0, we have for δ small enough that

µ(B_δ(0)) / µ(B_δ(δ)) = δ(1 − ½δ) / (2δ(1 − δ)) → ½ < 1 as δ → 0,

and hence Definition 2.1 is not satisfied.

The example above illustrates that as density functions are considered in more generality (e.g., as elements in L¹(R)), the concept of maximum point should be chosen carefully. This line of thought motivates the following definition.

Definition 2.3. A point x̂ ∈ X is called a generalized mode of µ if for every sequence {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 there exists a qualifying sequence {w_n}_{n∈N} ⊂ X with w_n → x̂ in X and

(2.1)    lim_{n→∞} µ(B_{δ_n}(w_n)) / M_{δ_n} = 1.

We call generalized modes of the posterior measure µ^y generalized MAP estimates.
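A quick numerical check of Example 2.2 (pure Python, using the unnormalized density, since the normalization cancels in all ratios): balls centered at the maximizer 0 capture only about half of the maximal ball mass, while the shifted centers w_δ = δ, which form a qualifying sequence in the sense of Definition 2.3, capture all of it.

```python
# mass(c, delta) integrates the unnormalized density (1 - x) on [0, 1]
# over the ball B_delta(c); M_delta is attained at the center c = delta.
def mass(center, delta):
    lo, hi = max(center - delta, 0.0), min(center + delta, 1.0)
    if hi <= lo:
        return 0.0
    return (hi - lo) - (hi ** 2 - lo ** 2) / 2.0

for delta in (1e-2, 1e-4, 1e-6):
    M = mass(delta, delta)                  # maximal ball mass M_delta
    print(mass(0.0, delta) / M, mass(delta, delta) / M)
```

The first printed ratio tends to 1/2 as computed in the example, while the second is identically 1, illustrating why x̂ = 0 is a generalized but not a strong mode.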
Note that by Definition 2.3, every strong mode x̂ ∈ X is also a generalized mode with the qualifying sequence w_n := x̂. Furthermore, in Example 2.2, x̂ = 0 is a generalized mode with the qualifying sequence w_n := δ_n for any positive sequence {δ_n}_{n∈N} with δ_n → 0. In fact, we can use the following stronger condition to show that a point is a generalized mode.

Lemma 2.4. Let x̂ ∈ X. If there is a family {w_δ}_{δ>0} ⊂ X such that w_δ → x̂ in X as δ → 0 and

lim_{δ→0} µ(B_δ(w_δ)) / M_δ = 1,

then x̂ is a generalized mode of µ.

In Example 2.2, this condition is obviously satisfied for x̂ = 0 and w_δ := δ.

Remark 2.5. Definition 2.3 can be further generalized by allowing for weakly converging qualifying sequences. Conversely, we can restrict Definition 2.3 by coupling the convergence of w_n to that of δ_n (cf. Lemma 2.4). As we will show below, our definition has the advantage of not introducing pathological modes in the case of continuous densities (such as Gaussian or Besov distributions), while for the important case of a uniform prior, it introduces modes that are natural in terms of the variational characterization (1.2).

We now illustrate some of the key ideas of the generalized mode by considering the case of a Gaussian measure µ on X. We assume that µ is centered and note that the results below trivially generalize to the non-centered case. We will require the following quantitative estimate for Gaussian ball probabilities.

Lemma 2.6 ([8, Lem. 3.6]). Let x ∈ X and δ > 0 be given. Then there exists a constant a_1 > 0 independent of x and δ such that

µ(B_δ(x)) ≤ µ(B_δ(0)) e^{(a_1/2)δ²} e^{−(a_1/2)(‖x‖_X − δ)²}.

Now we can show that for a centered Gaussian measure µ, Definitions 2.1 and 2.3 do indeed coincide.

Theorem 2.7. The origin 0 ∈ X is both a strong mode and a generalized mode of µ, whereas all x ∈ X \ {0} are neither a strong mode nor a generalized mode.

Proof.
First, by Anderson's inequality (see, e.g., [3, Thm ]) we have that µ(B_δ(x)) ≤ µ(B_δ(0)) for all x ∈ X and δ > 0, and hence that M_δ = µ(B_δ(0)) for all δ > 0. This immediately yields that x̂ = 0 is a strong mode and hence also a generalized mode for µ. Now assume that x ∈ X \ {0} is a generalized mode of µ. Let {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 be arbitrary and let {w_n}_{n∈N} ⊂ X be the respective qualifying sequence. Then, Lemma 2.6 yields

µ(B_δ(w)) / µ(B_δ(0)) ≤ e^{−(a_1/2)‖w‖_X(‖w‖_X − 2δ)} ≤ e^{−(a_1/2)(¾‖x‖_X)(¼‖x‖_X)} = e^{−(3a_1/32)‖x‖²_X} =: A < 1
for all δ ∈ (0, δ_0] and w ∈ B_{δ_0}(x) with δ_0 := ¼‖x‖_X. We choose n_0 ∈ N large enough such that ‖w_n − x‖_X < δ_0 and δ_n < δ_0 for all n ≥ n_0. Now taking the limit in the above yields

lim_{n→∞} µ(B_{δ_n}(w_n)) / M_{δ_n} = lim_{n→∞} µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(0)) ≤ A < 1,

which is a contradiction. Hence, x is not a generalized mode and thus cannot be a strong mode, either.

The explicit bound in Lemma 2.6 plays an important role in Theorem 2.7. Compare this to the alternative approach using the Onsager–Machlup functional I(x) = ½‖x‖²_E, defined as satisfying

lim_{δ→0} µ(B_δ(x_1)) / µ(B_δ(x_2)) = exp(½‖x_2‖²_E − ½‖x_1‖²_E),

which holds for all x_1, x_2 from the Cameron–Martin space E ⊂ X of µ by [22, Prop. 18.3]. Although this relation can be used to show that 0 is the only generalized MAP estimate in E, it does not yield any information regarding X \ E. In the next section, we will, for general measures µ under additional continuity assumptions, extend results from a dense space such as E to the whole space X.

3 relation between generalized and strong modes

In this section, we derive conditions under which our definition of generalized modes coincides with the standard notion of strong modes. We do this based on further characterizations of the convergence of the qualifying sequence in the definition of generalized modes.

3.1 characterization by rate of convergence

Consider a general probability measure µ on X. Let us first make the fundamental observation that for δ_n > 0 and x̂, w_n ∈ X, we have that

(3.1)    µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x)) = [µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n))] · [µ(B_{δ_n}(w_n)) / sup_{x∈X} µ(B_{δ_n}(x))].

Clearly, if x̂ is a generalized mode, we have control over the right-most ratio in (3.1). On the other hand, convergence of the ratio on the left-hand side in (3.1) is related to the definition of a strong mode. This leads to the following equivalence.

Theorem 3.1. Let x̂ ∈ X be a generalized mode of µ.
Then x̂ is a strong mode if and only if for every sequence {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0, there exists a qualifying sequence {w_n}_{n∈N} ⊂ X with w_n → x̂ and

(3.2)    lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) = 1.
Proof. Let {δ_n}_{n∈N} be a positive sequence with δ_n → 0 and let {w_n}_{n∈N} be the corresponding qualifying sequence that satisfies (3.2). Then

lim_{n→∞} µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x)) = lim_{n→∞} [µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n))] · lim_{n→∞} [µ(B_{δ_n}(w_n)) / sup_{x∈X} µ(B_{δ_n}(x))] = 1,

and hence x̂ is a strong mode. Conversely, assume that x̂ is not a strong mode. Then there exists a positive sequence {δ_n}_{n∈N} such that δ_n → 0 and

lim_{n→∞} µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x)) < 1.

This, in turn, implies that

lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) = lim_{n→∞} [µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x))] · lim_{n→∞} [sup_{x∈X} µ(B_{δ_n}(x)) / µ(B_{δ_n}(w_n))] < 1

for every qualifying sequence {w_n}_{n∈N}. Hence (3.2) does not hold.

We can use this theorem to restrict the balls over which the supremum in the definition of M_δ is taken.

Corollary 3.2. Let x̂ ∈ X be a generalized mode of µ. If there exists an r > 0 such that

lim_{δ→0} µ(B_δ(x̂)) / sup_{w∈B_r(x̂)} µ(B_δ(w)) = 1,

then x̂ is a strong mode.

Proof. Let {δ_n}_{n∈N} be a positive sequence such that δ_n → 0, and let {w_n}_{n∈N} ⊂ X be the corresponding qualifying sequence. Then

lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) ≥ lim_{n→∞} µ(B_{δ_n}(x̂)) / sup_{w∈B_r(x̂)} µ(B_{δ_n}(w)) = lim_{δ→0} µ(B_δ(x̂)) / sup_{w∈B_r(x̂)} µ(B_δ(w)) = 1

since w_n → x̂. Hence, (3.2) holds, and x̂ is therefore a strong mode by Theorem 3.1.

We illustrate the possibility of satisfying (3.2) with the following example.

Example 3.3. For a probability measure µ on R with continuous density p with respect to the Lebesgue measure, (3.2) is satisfied in every point x̂ ∈ R with p(x̂) > 0. To see this, let ε > 0. Then there exists a δ_0 > 0 such that for all x ∈ B_{δ_0}(x̂), |p(x) − p(x̂)| ≤ ε. Therefore, for all δ ∈ (0, δ_0/2) and w ∈ B_{δ_0/2}(x̂),

|µ(B_δ(w)) / (2δ) − p(x̂)| ≤ (1/(2δ)) ∫_{w−δ}^{w+δ} |p(x) − p(x̂)| dx ≤ ε.
So, for every positive sequence δ_n → 0 and every real sequence w_n → x̂,

lim_{n→∞} µ(B_{δ_n}(w_n)) / (2δ_n) = p(x̂) > 0.

This holds true in particular for the constant sequence w_n = x̂, so that

lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) = p(x̂) / p(x̂) = 1.

The property (3.2) can be seen as the requirement to have sufficiently fast convergence of the qualifying sequences. In other words, we require that the balls B_{δ_n}(x̂) and B_{δ_n}(w_n) have asymptotically the same measure. This idea can be further quantified by the following theorem.

Theorem 3.4. Let x̂ ∈ X be a generalized mode of a Borel probability measure µ. If

(i) for every positive sequence {δ_n}_{n∈N} with δ_n → 0, there exists a qualifying sequence {w_n}_{n∈N} such that

lim_{n→∞} ‖w_n − x̂‖_X / δ_n = 0,

(ii) the family of functions {f_n}_{n∈N} on [0, 1] defined by

f_n : [0, 1] → R,    f_n(r) := µ(B_{r(δ_n + ‖w_n − x̂‖_X)}(x̂)) / µ(B_{δ_n + ‖w_n − x̂‖_X}(x̂)),

is equicontinuous at r = 1,

then x̂ is a strong mode.

Proof. Notice first that by monotonicity of probability, the function

ϕ_{x,s}(r) := µ(B_{rs}(x)) / µ(B_s(x))

is a left-continuous increasing function with right limits on [0, 1] for every x ∈ X and every s > 0, and hence so is f_n = ϕ_{x̂, δ_n + ‖w_n − x̂‖_X}. Let now {δ_n}_{n∈N} be a positive sequence with δ_n → 0 and let {w_n}_{n∈N} be a qualifying sequence satisfying assumption (i). Setting

r_n := ‖w_n − x̂‖_X    for all n ∈ N,

by the triangle inequality we have

µ(B_{δ_n − r_n}(x̂)) ≤ µ(B_{δ_n}(w_n)) ≤ µ(B_{δ_n + r_n}(x̂))

for n large enough. Furthermore, assumption (ii) implies that for every ε > 0,

f_n(1) − f_n((δ_n − r_n)/(δ_n + r_n)) < ε
for every n large enough, since

(δ_n − r_n)/(δ_n + r_n) = (1 − r_n/δ_n)/(1 + r_n/δ_n) → 1    as n → ∞.

We thus obtain that

lim sup_{n→∞} [µ(B_{δ_n + r_n}(x̂)) − µ(B_{δ_n}(w_n))] / µ(B_{δ_n + r_n}(x̂)) ≤ lim sup_{n→∞} [µ(B_{δ_n + r_n}(x̂)) − µ(B_{δ_n − r_n}(x̂))] / µ(B_{δ_n + r_n}(x̂)) ≤ ε,

which implies that

(3.3)    lim_{n→∞} [1 − µ(B_{δ_n}(w_n)) / µ(B_{δ_n + r_n}(x̂))] = 0.

Since for every ε > 0 the assumption (ii) yields f_n(1) − f_n(δ_n/(δ_n + r_n)) < ε for every n large, it also follows that

lim sup_{n→∞} [µ(B_{δ_n + r_n}(x̂)) − µ(B_{δ_n}(x̂))] / µ(B_{δ_n + r_n}(x̂)) ≤ ε

and in particular that

(3.4)    lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n + r_n}(x̂)) = 1.

The two limits (3.3) and (3.4) together yield that

lim_{n→∞} µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(x̂)) = 1

and, therefore, x̂ is a strong mode by Theorem 3.1.

Remark 3.5. The equicontinuity condition (ii) is implied by the η-annular decay property (η-AD) at the generalized mode point (see [2] for the definition). In the finite-dimensional case, the 1-AD property at x is equivalent to f(r) = µ(B_r(x)) being locally absolutely continuous on (0, ∞) and f'(r)r ≤ c f(r) for some c > 0 and almost every r > 0. For example, it can be seen by direct computation that the function f in Example 2.2 has the 1-AD property at the generalized mode x̂ = 0.

3.2 characterization by convergence on a dense subspace

We next consider the case when the qualifying sequences {w_n}_{n∈N} are restricted to a dense subspace of X where a certain continuity of the ratios holds.

Proposition 3.6. Let x̂ ∈ X be a generalized mode of a Borel probability measure µ and let E be a dense subset of X. Then for every {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 there exists a sequence {w̃_n}_{n∈N} ⊂ E with w̃_n → x̂ in X such that

lim_{n→∞} µ(B_{δ_n}(w̃_n)) / sup_{x∈X} µ(B_{δ_n}(x)) = 1.
Proof. We first show that for every δ > 0, the mapping x ↦ µ(B_δ(x)) is lower semi-continuous. To this end, let {x_n}_{n∈N} ⊂ X be a sequence converging to x ∈ X and let

χ_A(x̄) := 1 if x̄ ∈ A and χ_A(x̄) := 0 otherwise

denote the characteristic function of a set A ⊂ X. Then for every x̄ ∈ X, we have

χ_{B_δ(x)}(x̄) ≤ lim inf_{n→∞} χ_{B_δ(x_n)}(x̄),

and Fatou's Lemma yields

µ(B_δ(x)) = ∫_X χ_{B_δ(x)}(x̄) µ(dx̄) ≤ ∫_X lim inf_{n→∞} χ_{B_δ(x_n)}(x̄) µ(dx̄) ≤ lim inf_{n→∞} ∫_X χ_{B_δ(x_n)}(x̄) µ(dx̄) = lim inf_{n→∞} µ(B_δ(x_n)).

Now let {δ_n} ⊂ (0, ∞) with δ_n → 0 and let {w_n}_{n∈N} ⊂ X be a corresponding qualifying sequence. Consider a fixed n ∈ N. By the lower semi-continuity of x ↦ µ(B_{δ_n}(x)), we can choose an R > 0 such that

µ(B_{δ_n}(v)) ≥ µ(B_{δ_n}(w_n)) − (1/n) M_{δ_n}    for all v ∈ B_R(w_n).

Set r = min{R, 1/n}. Because E is dense in X, we can choose a w̃_n ∈ B_r(w_n) ∩ E, which therefore satisfies both

µ(B_{δ_n}(w̃_n)) / M_{δ_n} ≥ µ(B_{δ_n}(w_n)) / M_{δ_n} − 1/n

and

‖w̃_n − x̂‖_X ≤ ‖w̃_n − w_n‖_X + ‖w_n − x̂‖_X ≤ 1/n + ‖w_n − x̂‖_X.

As n ∈ N was arbitrary, we obtain the desired sequence {w̃_n}_{n∈N} ⊂ E with w̃_n → x̂ in X and

lim_{n→∞} µ(B_{δ_n}(w̃_n)) / M_{δ_n} ≥ lim_{n→∞} [µ(B_{δ_n}(w_n)) / M_{δ_n} − 1/n] = 1.

In order to give a sufficient condition for the coincidence of generalized and strong modes, we consider a more specific class of probability measures. Let µ be a Borel probability measure that possesses a dense continuously embedded subspace (E, ‖·‖_E) ⊂ X of admissible shifts, i.e., the shifted measure µ_h := µ(· − h) is equivalent to µ for every h ∈ E, and assume that for every h ∈ E the density of µ_h with respect to µ has a continuous representative dµ_h/dµ ∈ C(X).

Theorem 3.7. Let x̂ ∈ X be a generalized mode of µ. If

(i) for every {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 there is a qualifying sequence {w_n}_{n∈N} ⊂ x̂ + E with ‖w_n − x̂‖_E → 0,
(ii) there is an R > 0 such that

f_R : (E, ‖·‖_E) → R,    f_R(w) := sup_{x∈B_R(x̂)} |dµ_{x̂−w}/dµ (x) − 1|

is continuous in x̂,

then x̂ is a strong mode.

Proof. Choosing N large enough such that δ_n ≤ R for all n ≥ N, we have for all n ≥ N that

|µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(x̂)) − 1| = |(1/µ(B_{δ_n}(x̂))) ∫_{B_{δ_n}(x̂)} [dµ_{x̂−w_n}/dµ (x) − 1] µ(dx)| ≤ sup_{x∈B_{δ_n}(x̂)} |dµ_{x̂−w_n}/dµ (x) − 1| · (1/µ(B_{δ_n}(x̂))) ∫_{B_{δ_n}(x̂)} µ(dx) ≤ f_R(w_n).

However, f_R(w_n) → f_R(x̂) = 0 by the continuity of f_R and the convergence w_n → x̂ in E, so that

lim_{n→∞} µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(x̂)) = 1.

Hence, x̂ is a strong mode by Theorem 3.1.

For a nondegenerate Gaussian measure µ = N(a, Q) on a separable Hilbert space X with mean a ∈ X and covariance operator Q ∈ L(X), the assumptions of Theorem 3.7 are fulfilled for E = Q(X) and x̂ = w_n := a for all n ∈ N as well as any R > 0.

Corollary 3.8. Let x̂ ∈ X be a generalized mode of µ that satisfies condition (i) of Theorem 3.7. If additionally

lim_{‖w−x̂‖_E→0} dµ_{x̂−w}/dµ (x̂) = 1

and there is an r > 0 such that the family

{dµ_{x̂−w}/dµ : w ∈ B^E_r(x̂)},    B^E_r(x̂) := {x ∈ x̂ + E : ‖x − x̂‖_E < r} ⊂ x̂ + E,

is equicontinuous in x̂, then x̂ is a strong mode.

Proof. We show that condition (ii) of Theorem 3.7 is satisfied as well, from which the claim then follows. For a given ε > 0 we choose 0 < r_0 ≤ r small enough such that for all w ∈ B^E_{r_0}(x̂) we have that

|dµ_{x̂−w}/dµ (x̂) − 1| ≤ ε/2.

If we also choose R > 0 small enough such that for all x ∈ B^X_R(x̂) and all w ∈ B^E_{r_0}(x̂) we have that

|dµ_{x̂−w}/dµ (x) − dµ_{x̂−w}/dµ (x̂)| ≤ ε/2,
then the triangle inequality yields that

|f_R(w) − f_R(x̂)| = sup_{x∈B_R(x̂)} |dµ_{x̂−w}/dµ (x) − 1| ≤ ε

for all w ∈ B^E_{r_0}(x̂) and therefore the continuity of f_R at x̂.

4 modes of uniform priors

We now demonstrate the usefulness of the concept of generalized modes for a class of uniform probability measures on infinite-dimensional spaces that can serve as priors in Bayesian inverse problems. We first discuss the rigorous construction of the uniform probability measure and then show that such measures admit generalized but in general not strong modes.

4.1 construction of the probability measure

We proceed similarly as in [9], with the difference that we define a probability measure on a subspace of ℓ^∞ rather than of L^∞. Let us first fix some notation. For x := {x_k}_{k∈N} ∈ ℓ^∞, we write ‖x‖_∞ = sup_{k∈N} |x_k|. Furthermore, let e_j ∈ ℓ^∞ for j ∈ N denote the standard unit vectors, i.e., [e_k]_j = 1 for j = k and 0 else. We then define

(4.1)    X := cl span{e_k}_{k∈N} ⊂ ℓ^∞

and note that X = {x ∈ ℓ^∞ : lim_{k→∞} x_k = 0} =: c_0. We thus have that (X, ‖·‖_∞) is a separable Banach space. We now construct a class of probability measures on X whose mass is concentrated on a set of sequences with strictly bounded components. First, we define a random variable ξ according to the random series

(4.2)    ξ := Σ_{k=1}^∞ γ_k ξ_k e_k,

where

(i) ξ_k ∼ U[−1, 1] (i.e., uniformly distributed on [−1, 1]) and

(ii) γ_k ≥ 0 for all k ∈ N with γ_k → 0.

Note that the partial sums ξ^n := Σ_{k=1}^n γ_k ξ_k e_k almost surely form a Cauchy sequence in X, since for all N, m, n ∈ N with N ≤ m ≤ n we have that

‖ξ^n − ξ^m‖_∞ = ‖Σ_{k=m+1}^n γ_k ξ_k e_k‖_∞ = sup_{m+1≤k≤n} γ_k |ξ_k| ≤ sup_{k≥N} γ_k,

and the right-hand side tends to zero as N → ∞. Since X is complete, the series (4.2) therefore converges almost surely. We can thus define the probability measure µ_γ on X by

(4.3)    µ_γ(A) := P[ξ ∈ A]    for every A ∈ B(X).
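A truncated draw from the series (4.2) can be simulated directly; the weight choice γ_k = 1/k and the truncation level are illustrative assumptions.

```python
import numpy as np

K = 1000                                       # truncation level (assumed)
rng = np.random.default_rng(0)
gamma = 1.0 / np.arange(1, K + 1)              # illustrative weights gamma_k = 1/k
xi = gamma * rng.uniform(-1.0, 1.0, size=K)    # coefficients gamma_k * xi_k

# Every sample lies in E_gamma (|xi_k| <= gamma_k), and its components decay
# to zero, consistent with X = c_0.
print(bool(np.all(np.abs(xi) <= gamma)))
print(float(np.max(np.abs(xi[500:]))))
```

Since the tail of the series is bounded by sup_{k≥N} γ_k, the truncation error is controlled uniformly, mirroring the Cauchy-sequence argument above.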
The following sets will be important for our study of the generalized and strong modes of µ_γ. We define

E_γ := {x ∈ X : |x_k| ≤ γ_k for all k ∈ N},

and for every δ > 0,

E^δ_γ := {x ∈ X : |x_k| ≤ max{γ_k − δ, 0} for all k ∈ N},

as well as

E^0_γ := ∪_{δ>0} E^δ_γ ⊆ E_γ.

We first collect some basic properties of E_γ and E^δ_γ.

Proposition 4.1. The sets E_γ and E^δ_γ are convex, compact, and have empty interior.

Proof. We only consider the case of E_γ; the case of E^δ_γ follows analogously. First, convexity follows directly from the definition since for every x, y ∈ E_γ and λ ∈ [0, 1] we have that

|{λx + (1 − λ)y}_k| = |λx_k + (1 − λ)y_k| ≤ λ|x_k| + (1 − λ)|y_k| ≤ γ_k    for all k ∈ N,

and hence that λx + (1 − λ)y ∈ E_γ. For the compactness of E_γ, we first show that X \ E_γ is open. Let x ∈ X \ E_γ be arbitrary. Then there exists an m ∈ N with |x_m| > γ_m, so that for ε := |x_m| − γ_m we have B_ε(x) ⊂ X \ E_γ as claimed. Hence, as a closed subset of a complete space, E_γ is itself complete, which allows us to show compactness by constructing a finite covering of E_γ by balls of radius ε. Let ε > 0. As γ_k → 0, there is an N ∈ N such that for every x ∈ E_γ and all k ≥ N + 1, |x_k| ≤ γ_k < ε. Now choose M ∈ N with Mε ≥ max{γ_k : k = 1, ..., N}. For every x ∈ E_γ and k ∈ {1, ..., N}, there is a λ_k ∈ {−M, ..., M} such that |x_k − λ_k ε| < ε. Hence, ‖x − z‖_∞ < ε for z := Σ_{k=1}^N λ_k ε e_k, which implies that the finitely many balls B_ε(z) with z = Σ_{k=1}^N λ_k ε e_k and λ_k ∈ {−M, ..., M} form the desired finite covering. Finally, to show that E_γ has empty interior, let x ∈ E_γ and ε > 0 be arbitrary. We choose m ∈ N such that γ_m < ε/3. Then, y := x + 3γ_m e_m ∈ B_ε(x), because ‖y − x‖_∞ = ‖3γ_m e_m‖_∞ = 3γ_m < ε, but y ∉ E_γ. Consequently, x ∉ int E_γ, and hence the interior of E_γ is empty.

Of particular use for finding both generalized and strong modes will be the metric projections onto E^δ_γ. For any δ > 0, let

P_δ : X → E^δ_γ,    P_δ(x) := x̄ with ‖x̄ − x‖_∞ = inf_{z∈E^δ_γ} ‖z − x‖_∞,
which is well-defined because E^δ_γ is closed and convex by Proposition 4.1. It is straightforward to show by case distinction that this projection can be characterized componentwise for every k ∈ N via

(4.4)    [P_δ(x)]_k = 0 if γ_k < δ;  γ_k − δ if x_k > γ_k − δ > 0;  x_k if x_k ∈ [−γ_k + δ, γ_k − δ];  −γ_k + δ if x_k < −γ_k + δ < 0.

The projection satisfies the following properties.

Lemma 4.2. Let x ∈ E_γ. Then, lim_{δ→0} P_δ x = x in X.

Proof. This follows directly from the definition since |{P_δ x}_k − x_k| ≤ δ for all k ∈ N.

Finally, we can give a more explicit characterization of E^0_γ.

Lemma 4.3. We have that

E^0_γ = {x ∈ X : |x_k| < γ_k for all k ∈ N, x_k ≠ 0 for only finitely many k ∈ N}.

Proof. By definition, for every x ∈ E^0_γ there exists a δ > 0 with x ∈ E^δ_γ. This implies that |x_k| ≤ max{γ_k − δ, 0} < γ_k for all k ∈ N. Moreover, since γ_k → 0 there exists an N ∈ N with γ_k ≤ δ for all k ≥ N, which yields |x_k| ≤ max{γ_k − δ, 0} = 0 for all k ≥ N. Now let x ∈ X be an element from the set on the right-hand side. As |x_k| < γ_k for all k ∈ N and because there are only finitely many k ∈ N with x_k ≠ 0, we can choose

δ_0 := min{γ_k − |x_k| : k ∈ N with x_k ≠ 0} > 0

such that x ∈ E^{δ_0}_γ ⊂ E^0_γ.
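The componentwise formula (4.4) amounts to clipping each component into the interval [−max{γ_k − δ, 0}, max{γ_k − δ, 0}]. A direct sketch for finitely many components (the weights and inputs below are illustrative):

```python
def project(x, gamma, delta):
    # metric projection (4.4) onto E_gamma^delta, via componentwise clipping
    out = []
    for xk, gk in zip(x, gamma):
        b = max(gk - delta, 0.0)        # componentwise bound of E_gamma^delta
        out.append(min(max(xk, -b), b))
    return out

gamma = [1.0, 0.75, 0.25, 0.1]
x = [0.95, -0.6, 0.1, 0.3]
print(project(x, gamma, 0.25))          # components with gamma_k <= delta go to 0
```

As δ → 0 the projection returns x itself for x ∈ E_γ, in line with Lemma 4.2.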
4.2 small ball probabilities

We next study the behavior of small balls under the probability measure µ_γ, which is crucial for determining modes. For the sake of conciseness, let

(4.5)    J^δ_γ(x) := µ_γ(B_δ(x))    for all x ∈ X,

for which we have the following straightforward characterization.

Lemma 4.4. For every x ∈ X and δ > 0, it holds that

µ_γ(cl B_δ(x)) = J^δ_γ(x) = Π_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)],

where cl A denotes the closure of a set A ⊂ X.

Proof. First of all, by definition

J^δ_γ(x) = µ_γ(B_δ(x)) = P[ξ ∈ B_δ(x)] = P[|γ_k ξ_k − x_k| < δ for all k ∈ N].

As both γ_k → 0 and x_k → 0, there exists an N ∈ N such that γ_k + |x_k| ≤ δ/2 for all k ≥ N + 1. This implies that

P[|γ_k ξ_k − x_k| < δ for all k ≥ N + 1] = 1,

since |ξ_k| ≤ 1 almost surely. Consequently,

µ_γ(B_δ(x)) = P[|γ_k ξ_k − x_k| < δ for k = 1, ..., N] · P[|γ_k ξ_k − x_k| < δ for all k ≥ N + 1] = Π_{k=1}^N P[|γ_k ξ_k − x_k| < δ]

by the independence of the ξ_k. This yields the second identity. The first identity now follows from

P[γ_k ξ_k ∈ [x_k − δ, x_k + δ]] = P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)]    for all k ∈ N.

We can use this characterization to show that for every δ > 0, the origin maximizes J^δ_γ.

Proposition 4.5. Let δ > 0. Then J^δ_γ(0) = max_{x∈X} J^δ_γ(x) > 0.

Proof. From the second equality in Lemma 4.4, we have that

µ_γ(B_δ(x)) = Π_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] ≤ Π_{k=1}^∞ P[γ_k ξ_k ∈ (−δ, δ)] = µ_γ(B_δ(0))

for every x = {x_k}_{k∈N} ∈ X. On the other hand, µ_γ(B_δ(0)) ≤ sup_{x∈X} µ_γ(B_δ(x)), which shows that x = 0 is a maximizer. For the positivity, note that as γ_k → 0, there exists an N ∈ N such that

µ_γ(B_δ(0)) = Π_{k=1}^N P[γ_k ξ_k ∈ (−δ, δ)] > 0,

since P[γ_k ξ_k ∈ (−δ, δ)] > 0 for all k ∈ N.

Crucially, J^δ_γ is constant on E^δ_γ.

Proposition 4.6. For every δ > 0,

J^δ_γ(x_1) = J^δ_γ(x_2)    for all x_1, x_2 ∈ E^δ_γ.
Proof. Let x_1, x_2 ∈ E^δ_γ. For every k ∈ N, we distinguish between the following two cases:

(i) δ ≤ γ_k: In this case, (x_{1,k} − δ, x_{1,k} + δ) ⊂ (−γ_k, γ_k) and (x_{2,k} − δ, x_{2,k} + δ) ⊂ (−γ_k, γ_k), so that

P[γ_k ξ_k ∈ (x_{1,k} − δ, x_{1,k} + δ)] = δ/γ_k = P[γ_k ξ_k ∈ (x_{2,k} − δ, x_{2,k} + δ)].

(ii) δ > γ_k: In this case, x_{1,k} = x_{2,k} = 0 by definition of E^δ_γ, and hence

P[γ_k ξ_k ∈ (x_{1,k} − δ, x_{1,k} + δ)] = 1 = P[γ_k ξ_k ∈ (x_{2,k} − δ, x_{2,k} + δ)].

Together we obtain that

J^δ_γ(x_1) = P[ξ ∈ cl B_δ(x_1)] = Π_{k=1}^∞ P[γ_k ξ_k ∈ (x_{1,k} − δ, x_{1,k} + δ)] = Π_{k=1}^∞ P[γ_k ξ_k ∈ (x_{2,k} − δ, x_{2,k} + δ)] = P[ξ ∈ cl B_δ(x_2)] = J^δ_γ(x_2),

as claimed.

Since 0 ∈ E^δ_γ, combining Propositions 4.5 and 4.6 immediately yields that every x ∈ E^δ_γ maximizes J^δ_γ.

Corollary 4.7. For δ > 0 and every x̄ ∈ E^δ_γ,

J^δ_γ(x̄) = max_{x∈X} J^δ_γ(x) > 0.

4.3 modes of the probability measure

Finally, we characterize both the strong and the generalized modes of the uniform probability measure and show that these do not coincide.

Theorem 4.8. Every point x̂ ∈ E^0_γ is a strong mode of µ_γ.

Proof. By definition of E^0_γ, there is a δ̄ > 0 such that x̂ ∈ E^δ̄_γ. Then by Corollary 4.7,

M_δ = max_{x∈X} µ_γ(B_δ(x)) = µ_γ(B_δ(x̂))    for all δ ≤ δ̄,

and therefore

lim_{δ→0} µ_γ(B_δ(x̂)) / M_δ = lim_{δ→0} µ_γ(B_δ(x̂)) / µ_γ(B_δ(x̂)) = 1.
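The finite product from Lemma 4.4 is easy to evaluate for a truncated weight sequence, which also lets one check Propositions 4.5 and 4.6 numerically; the weights and ball radius below are illustrative.

```python
import math

def interval_prob(gk, lo, hi):
    # P[gamma_k xi_k in (lo, hi)] for xi_k ~ U[-1, 1], i.e. for a uniform
    # random variable on [-gamma_k, gamma_k]
    if gk == 0.0:
        return 1.0 if lo < 0.0 < hi else 0.0
    overlap = max(min(hi, gk) - max(lo, -gk), 0.0)
    return overlap / (2.0 * gk)

def J(x, gamma, delta):
    # J_gamma^delta(x) as the product over components (Lemma 4.4)
    return math.prod(interval_prob(g, xk - delta, xk + delta)
                     for xk, g in zip(x, gamma))

gamma, delta = [0.5, 0.25, 0.125], 0.1
origin = [0.0, 0.0, 0.0]
corner = [0.4, -0.15, 0.025]   # an extreme point of E_gamma^delta
print(J(origin, gamma, delta), J(corner, gamma, delta))
```

The two values agree, as Proposition 4.6 asserts, and the origin value bounds J at any other point, as in Proposition 4.5.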
However, the following result shows that Definition 2.1 is too restrictive in this case and in fact can be unintuitive.

Proposition 4.9. The following claims hold:

(i) If for x ∈ E_γ there is an m ∈ N with |x_m| = γ_m > 0, then x is not a strong mode of µ_γ.

(ii) There are γ, x ∈ X with |x_k| < γ_k for all k ∈ N such that x is not a strong mode of µ_γ.

Proof. Ad (i): First, note that for δ > 0 small enough,

P[γ_m ξ_m ∈ (x_m − δ, x_m + δ)] = ½ P[γ_m ξ_m ∈ (−δ, δ)].

Moreover, for all k ∈ N and δ > 0,

P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] ≤ P[γ_k ξ_k ∈ (−δ, δ)],

so that

µ_γ(B_δ(x)) / µ_γ(B_δ(0)) = Π_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] / P[γ_k ξ_k ∈ (−δ, δ)] ≤ ½

for δ > 0 small enough. But by Proposition 4.5, we also have that

lim sup_{δ→0} µ_γ(B_δ(x)) / M_δ = lim sup_{δ→0} µ_γ(B_δ(x)) / µ_γ(B_δ(0)) ≤ ½ < 1.

Hence, x cannot be a strong mode.

Ad (ii): We take γ_k = 1/k and x_k = ½γ_k = 1/(2k) for all k ∈ N. For given δ ∈ (0, ¼) we also choose n ∈ N such that 1/n ≥ δ > 1/(n+1), i.e., ½γ_n < δ ≤ γ_n. Hence,

P[γ_n ξ_n ∈ (x_n − δ, x_n + δ)] = (γ_n − (x_n − δ)) / (2γ_n) ≤ (γ_n − (½γ_n − γ_n)) / (2γ_n) = ¾,

as well as

P[γ_n ξ_n ∈ (−δ, δ)] = δ/γ_n > n/(n+1) ≥ 4/5    for n ≥ 4.

In addition,

P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] ≤ P[γ_k ξ_k ∈ (−δ, δ)]    for all k ∈ N.

Consequently,

µ_γ(B_δ(x)) / µ_γ(B_δ(0)) = Π_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] / P[γ_k ξ_k ∈ (−δ, δ)] ≤ P[γ_n ξ_n ∈ (x_n − δ, x_n + δ)] / P[γ_n ξ_n ∈ (−δ, δ)] ≤ 15/16.

Proposition 4.5 now yields that

lim sup_{δ→0} µ_γ(B_δ(x)) / M_δ = lim sup_{δ→0} µ_γ(B_δ(x)) / µ_γ(B_δ(0)) < 1,

and hence that x is not a strong mode.
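In a single coordinate, the halving in part (i) is easy to see numerically (pure Python; γ_m = 1 is an illustrative choice):

```python
def uniform_ball_mass(center, delta, g=1.0):
    # P[g * xi in (center - delta, center + delta)] for xi ~ U[-1, 1]
    lo, hi = max(center - delta, -g), min(center + delta, g)
    return max(hi - lo, 0.0) / (2.0 * g)

# Centering at the bound x_m = gamma_m = 1 captures exactly half the mass of
# centering at 0, for every small radius delta.
for delta in (0.1, 0.01, 0.001):
    print(uniform_ball_mass(1.0, delta) / uniform_ball_mass(0.0, delta))
```

The ratio stays at 1/2 no matter how small δ becomes, which is why such boundary points fail Definition 2.1 yet are still generalized modes by Theorem 4.10.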
On the other hand, every point in E_γ is a generalized mode and vice versa.

Theorem 4.10. A point x ∈ X is a generalized mode of µ_γ if and only if x ∈ E_γ.

Proof. Assume first that x ∈ E_γ and set w^δ := P^δ x ∈ E_γ^δ for all δ > 0. Then w^δ → x as δ → 0 by Lemma 4.2 and µ_γ(B_δ(w^δ)) = µ_γ(B_δ(0)) by Proposition 4.6, so that

µ_γ(B_δ(w^δ))/µ_γ(B_δ(0)) = 1 for all δ > 0.

Since M_δ := max_{x ∈ X} µ_γ(B_δ(x)) = µ_γ(B_δ(0)) by Proposition 4.5, it follows from Lemma 2.4 that x is a generalized mode.

Conversely, assume that x ∈ X \ E_γ. Then there exists an m ∈ ℕ with |x_m| > γ_m. Taking now δ_0 := |x_m| − γ_m > 0, we have that P[γ_m ξ_m ∈ (x_m − δ, x_m + δ)] = 0 for all δ ∈ (0, δ_0) and hence that

µ_γ(B_δ(x)) = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] = 0 for all δ ∈ (0, δ_0).

This implies that

µ_γ(B_δ(w))/M_δ = 0 for all δ ∈ (0, δ_0/2) and w ∈ B_{δ_0/2}(x),

and hence x cannot be a generalized mode.

5 variational characterization of generalized map estimates

In this section, we consider Bayesian inverse problems with uniform priors and characterize the corresponding generalized MAP estimates, i.e., the generalized modes of the posterior distribution, as minimizers of an appropriate objective functional. Let X be the separable Banach space defined by (4.1) and choose as prior µ_0 the uniform probability distribution µ_γ defined by (4.3) for some non-negative sequence of weights γ_k → 0. We assume that for given data y from a Banach space Y, the posterior distribution µ^y satisfies µ^y ≪ µ_0 and can be expressed as

(5.1) µ^y(A) = (1/Z(y)) ∫_A exp(−Φ(x; y)) µ_0(dx) for all A ∈ B(X),

where

Z(y) := ∫_X exp(−Φ(x; y)) µ_0(dx)

is a positive and finite normalization constant and Φ: X × Y → ℝ is the likelihood. This way, µ^y constitutes a probability measure on X. Throughout this section, we fix y ∈ Y and abbreviate Φ(x; y) by Φ(x). We make the following assumption on the likelihood.
Assumption 5.1. The function Φ: X → ℝ is Lipschitz continuous on bounded sets, i.e., for every r > 0 there exists L = L_r > 0 such that for all x_1, x_2 ∈ X with ‖x_1‖_X, ‖x_2‖_X ≤ r we have

|Φ(x_1) − Φ(x_2)| ≤ L ‖x_1 − x_2‖_X.

In Theorem 4.10, we have seen that the indicator function ι_{E_γ} of E_γ in the sense of convex analysis, i.e.,

ι_{E_γ}: X → R̄ := ℝ ∪ {∞}, ι_{E_γ}(x) := 0 if x ∈ E_γ and ι_{E_γ}(x) := ∞ otherwise,

is the suitable functional to minimize in order to find the generalized modes of the prior measure µ_0. In contrast, the common Onsager–Machlup functional is not defined for µ_0, as µ_0 is not quasi-invariant with respect to shifts along any direction x ∈ X \ {0}. The goal of this section is to relate a similar functional that characterizes generalized MAP estimates to a suitably generalized limit of small ball probabilities. Specifically, we define the functional

(5.2) I: X → R̄, I(x) := Φ(x) + ι_{E_γ}(x), i.e., I(x) = Φ(x) if x ∈ E_γ and I(x) = ∞ otherwise.

Furthermore, for δ > 0 we denote by

J^δ(x) := µ^y(B_δ(x)) for all x ∈ X

the small ball probabilities of the posterior measure. Let x_δ denote a (not necessarily unique) maximizer of J^δ in E_γ^δ, i.e.,

J^δ(x_δ) = max_{x ∈ E_γ^δ} J^δ(x).

What we prove below (in Theorem 5.9) is the following:

(i) {x_δ}_{δ>0} contains a convergent subsequence (because E_γ is compact), and the limit of any convergent subsequence minimizes I.

(ii) Any minimizer of I is a generalized MAP estimate and vice versa.

We first demonstrate the necessity of working with subsequences by an example which shows that the family {x_δ}_{δ>0} may not have a unique limit.

Example 5.2. Define g(0) := 0 and g(x) := n if (1/2)^{n+1} < |x| ≤ (1/2)^n, n ∈ ℤ, and consider a probability measure µ on ℝ with density f with respect to the Lebesgue measure, defined via

f̃(x) := max{1 − g(x − 1), 1 − 2g((x + 1)/2), 0}, f(x) := f̃(x) / ∫_ℝ f̃(x) dx.
Figure 1: probability density f constructed in Example 5.2 for which the family of maximizers {x_δ}_{δ>0} has two cluster points x̂_1 = −1 and x̂_2 = 1

To find the set of points x for which the probability of B_δ(x) is maximal under µ for given δ, first let n ∈ ℕ be such that (1/2)^{n+1} < δ ≤ (1/2)^n. By a case distinction, one can then verify that arg max_{x ∈ ℝ} µ(B_δ(x)) is an interval containing 1 for δ in one half of the dyadic interval ((1/2)^{n+1}, (1/2)^n] and an interval containing −1 for δ in the other half. Hence, any family {x_δ}_{δ>0} of maximizers x_δ ∈ arg max_{x ∈ ℝ} µ(B_δ(x)) has the two cluster points −1 and 1; see Figure 1.

We begin our analysis by collecting some auxiliary results on Φ and I.

Lemma 5.3. If Assumption 5.1 holds, then there exists a constant M > 0 such that

(5.3) |Φ(x)| ≤ M for all x ∈ E_γ.

Proof. Since E_γ is compact and therefore bounded by Proposition 4.1, there exists some R ≥ 0 such that ‖x‖ ≤ R for all x ∈ E_γ. The Lipschitz continuity of Φ on bounded sets then implies that for any x ∈ E_γ,

|Φ(x)| ≤ |Φ(x) − Φ(0)| + |Φ(0)| ≤ L ‖x‖ + |Φ(0)| ≤ L R + |Φ(0)| =: M.

Lemma 5.4. If Assumption 5.1 holds, then there exists an x̄ ∈ E_γ such that I(x̄) = min_{x ∈ X} I(x).

Proof. As I(x) = ∞ for all x ∈ X \ E_γ, we only need to consider I|_{E_γ} = Φ|_{E_γ}, which is Lipschitz continuous on the bounded set E_γ by Assumption 5.1. Furthermore, the set E_γ is nonempty and by Proposition 4.1 closed and compact. The Weierstrass Theorem therefore implies that I attains its minimum in a point x̄ ∈ E_γ.

We also need the following results on the small ball probabilities of the posterior.
Lemma 5.5. Assume that Assumption 5.1 holds. Then for every δ > 0 there exists a constant c(δ) > 0 such that

J^δ(x) ≥ c(δ) > 0 for all x ∈ E_γ^δ.

Proof. First, µ_0(B_δ(x)) = µ_0(B_δ(0)) > 0 for all x ∈ E_γ^δ by Proposition 4.6 and Corollary 4.7. Hence we can estimate using Lemma 5.3 that

J^δ(x) = (1/Z(y)) ∫_{B_δ(x)} exp(−Φ(x̃)) µ_0(dx̃) ≥ (1/Z(y)) e^{−M} µ_0(B_δ(0)) =: c(δ) > 0,

because Z(y) is positive and finite by assumption.

We next show that J^δ is maximized within E_γ^δ by making use of the metric projection P^δ: X → E_γ^δ defined in (4.4).

Lemma 5.6. For all δ > 0, we have that

(5.4) J^δ(P^δ x) ≥ J^δ(x) for all x ∈ X.

Proof. We note that B_δ(x) \ B̄_δ(P^δ x) ⊂ X \ E_γ, so that µ_0(B_δ(x) \ B̄_δ(P^δ x)) = 0. Moreover, µ_0(B̄_δ(P^δ x) \ B_δ(P^δ x)) = 0 by Lemma 4.4, so that µ_0(B_δ(x) \ B_δ(P^δ x)) = 0 as well. With this knowledge we estimate

J^δ(x) = (1/Z(y)) ∫_{B_δ(x)} exp(−Φ(x̃)) µ_0(dx̃) = (1/Z(y)) ∫_{B_δ(x) ∩ B_δ(P^δ x)} exp(−Φ(x̃)) µ_0(dx̃) ≤ (1/Z(y)) ∫_{B_δ(P^δ x)} exp(−Φ(x̃)) µ_0(dx̃) = J^δ(P^δ x).

Corollary 5.7. For all δ > 0,

J^δ(x_δ) := max_{x ∈ E_γ^δ} J^δ(x) = max_{x ∈ X} J^δ(x).

Proof. If x ∈ X is a maximizer of J^δ, then P^δ x ∈ E_γ^δ is a maximizer as well.

The following proposition indicates that the functional I plays the role of a generalized Onsager–Machlup functional for the posterior distribution µ^y associated with the uniform prior µ_0 defined by (4.3).

Proposition 5.8. Suppose that Assumption 5.1 holds. Let x_1, x_2 ∈ E_γ and {w_1^δ}_{δ>0}, {w_2^δ}_{δ>0} ⊂ E_γ with

(i) w_1^δ, w_2^δ ∈ E_γ^δ for all δ > 0,

(ii) w_1^δ → x_1 and w_2^δ → x_2 as δ → 0.
Then,

(5.5) lim_{δ→0} J^δ(w_1^δ)/J^δ(w_2^δ) = exp(I(x_2) − I(x_1)).

Proof. Let δ > 0. We consider

J^δ(w_1^δ)/J^δ(w_2^δ) = ∫_{B_δ(w_1^δ)} exp(−Φ(x)) µ_0(dx) / ∫_{B_δ(w_2^δ)} exp(−Φ(x)) µ_0(dx)
= exp(Φ(x_2) − Φ(x_1)) · ∫_{B_δ(w_1^δ)} exp(Φ(x_1) − Φ(x)) µ_0(dx) / ∫_{B_δ(w_2^δ)} exp(Φ(x_2) − Φ(x)) µ_0(dx),

which is well-defined by Lemma 5.5. By Assumption 5.1,

|Φ(x_1) − Φ(x)| ≤ L ‖x_1 − x‖ ≤ L (‖x_1 − w_1^δ‖ + ‖w_1^δ − x‖) ≤ L (‖x_1 − w_1^δ‖ + δ) for all x ∈ B_δ(w_1^δ)

and

|Φ(x_2) − Φ(x)| ≤ L ‖x_2 − x‖ ≤ L (‖x_2 − w_2^δ‖ + δ) for all x ∈ B_δ(w_2^δ),

where L is the Lipschitz constant of Φ on the bounded set ∪_{x ∈ E_γ} B_δ(x). It follows that

exp(Φ(x_2) − Φ(x_1)) e^{−L(‖x_1 − w_1^δ‖ + ‖x_2 − w_2^δ‖ + 2δ)} µ_0(B_δ(w_1^δ))/µ_0(B_δ(w_2^δ)) ≤ J^δ(w_1^δ)/J^δ(w_2^δ) ≤ exp(Φ(x_2) − Φ(x_1)) e^{L(‖x_1 − w_1^δ‖ + ‖x_2 − w_2^δ‖ + 2δ)} µ_0(B_δ(w_1^δ))/µ_0(B_δ(w_2^δ)).

Now Proposition 4.6 implies that µ_0(B_δ(w_1^δ)) = µ_0(B_δ(w_2^δ)) for all δ > 0, and hence taking the limit δ → 0 and using (ii) together with the fact that I(x) = Φ(x) for all x ∈ E_γ yields the claim.

Note that Lemma 4.2 immediately implies that (5.5) also holds true for the prior distribution µ_0 if I is replaced by ι_{E_γ}. However, it is an open question how Proposition 5.8 can be generalized to cover a wider class of distributions while sustaining the connection to generalized MAP estimates described in the following theorem, which is the main result of this section.

Theorem 5.9. Suppose Assumption 5.1 holds. Then the following hold:

(i) For every {δ_n}_{n∈ℕ} ⊂ (0, ∞) with δ_n → 0, the sequence {x_{δ_n}}_{n∈ℕ} contains a subsequence that converges strongly in X to some x̄ ∈ E_γ.

(ii) Any cluster point x̄ ∈ E_γ of {x_{δ_n}}_{n∈ℕ} is a minimizer of I.
(iii) A point x̂ ∈ X is a minimizer of I if and only if it is a generalized MAP estimate for µ^y.

Proof. Ad (i): By definition, x_δ ∈ E_γ^δ ⊂ E_γ for all δ > 0. Since E_γ is compact and closed by Proposition 4.1, there exists a convergent subsequence, again denoted by {x_{δ_n}}_{n∈ℕ}, with limit x̄ ∈ E_γ.

Ad (ii): Let now x̄ ∈ E_γ be the limit of an arbitrary convergent subsequence, still denoted by {x_{δ_n}}_{n∈ℕ}, and assume that x̄ is not a minimizer of I. By Lemma 5.4, a minimizer x* ∈ E_γ of I exists and by assumption satisfies I(x̄) > I(x*). Moreover, P^δ x* → x* as δ → 0 by Lemma 4.2. Now the definition of x_δ and Proposition 5.8 yield

1 ≤ lim_{n→∞} µ^y(B_{δ_n}(x_{δ_n}))/µ^y(B_{δ_n}(P^{δ_n} x*)) = exp(I(x*) − I(x̄)) < 1,

a contradiction. So x̄ is in fact a minimizer of I.

Ad (iii): Now let x* ∈ X be a minimizer of I, which by definition satisfies x* ∈ E_γ. To see that x* is a generalized MAP estimate, we choose w^δ := P^δ x* ∈ E_γ^δ for every δ > 0, which implies that w^δ → x* as δ → 0. Furthermore, by definition of x_δ and M_δ,

µ^y(B_δ(x_δ)) = max_{x ∈ X} µ^y(B_δ(x)) = M_δ for all δ > 0.

By Assumption 5.1 and the boundedness of ∪_{x ∈ E_γ} B_δ(x), we can use the Lipschitz continuity of Φ to obtain the estimate

Φ(x) ≤ Φ(x*) + L (‖x − w^δ‖ + ‖w^δ − x*‖) ≤ Φ(x*) + 2Lδ for all x ∈ B_δ(w^δ).

On the other hand, a similar argument shows that

Φ(x) ≥ Φ(x_δ) − L ‖x − x_δ‖ ≥ Φ(x_δ) − Lδ for all x ∈ B_δ(x_δ).

Consequently,

µ^y(B_δ(w^δ)) = (1/Z(y)) ∫_{B_δ(w^δ)} exp(−Φ(x)) µ_0(dx) ≥ exp(−Φ(x*) − 2Lδ) (1/Z(y)) µ_0(B_δ(w^δ))

and

M_δ = µ^y(B_δ(x_δ)) = (1/Z(y)) ∫_{B_δ(x_δ)} exp(−Φ(x)) µ_0(dx) ≤ exp(−Φ(x_δ) + Lδ) (1/Z(y)) µ_0(B_δ(x_δ)).

As µ_0(B_δ(w^δ)) = µ_0(B_δ(x_δ)) by Proposition 4.6, combining these two estimates yields

µ^y(B_δ(w^δ))/µ^y(B_δ(x_δ)) ≥ exp(Φ(x_δ) − Φ(x*) − 3Lδ).

Now the minimizing property of x* implies that

liminf_{δ→0} µ^y(B_δ(w^δ))/µ^y(B_δ(x_δ)) ≥ lim_{δ→0} exp(−3Lδ) = 1
and hence that x* is a generalized MAP estimate.

Conversely, let x̂ be a generalized MAP estimate and {δ_n} ⊂ (0, ∞) with δ_n → 0 and corresponding qualifying sequence {w_n}_{n∈ℕ} ⊂ X, i.e., w_n → x̂ and

lim_{n→∞} µ^y(B_{δ_n}(w_n))/M_{δ_n} = 1.

From Lemma 4.2, it follows that P^{δ_n} w_n → x̂ as well. We then have x̂ ∈ E_γ, because otherwise the closedness of E_γ by Proposition 4.1 would imply that µ^y(B_{δ_n}(w_n)) = 0 for n large enough. Also, by definition of x_δ,

µ^y(B_δ(x_δ)) = max_{x ∈ X} µ^y(B_δ(x)) = M_δ for all δ > 0.

Now by (i) and (ii), we may extract a subsequence, again denoted by {x_{δ_n}}_{n∈ℕ}, such that x_{δ_n} → x̄ ∈ E_γ and x̄ is a minimizer of I. Proposition 5.8 and Lemma 5.6 then yield that

exp(I(x̄) − I(x̂)) = lim_{n→∞} µ^y(B_{δ_n}(P^{δ_n} w_n))/µ^y(B_{δ_n}(x_{δ_n})) ≥ lim_{n→∞} µ^y(B_{δ_n}(w_n))/µ^y(B_{δ_n}(x_{δ_n})) = 1.

Hence, I(x̄) ≥ I(x̂), i.e., x̂ is a minimizer of I.

Remark 5.10. Theorem 5.9 shows that for inverse problems subject to Gaussian noise as in the following section, the generalized MAP estimate coincides with Ivanov regularization with the specific choice of the compact set E_γ; compare (6.1) below with, e.g., [14, 23, 28]. Hence, Ivanov regularization can be considered as a non-parametric MAP estimate for a suitable choice of the compact set the solution is restricted to. Furthermore, we point out that for linear inverse problems where the Ivanov functional is convex, the minimizers generically lie on the boundary of the compact set; see, e.g., [28, Prop. 2.2 (iii)] and [6, Cor. 2.6]. However, this is not possible for strong modes due to Proposition 4.9 (i), which further indicates the need to consider generalized modes in this setting.

6 consistency in nonlinear inverse problems with gaussian noise

Finally, we show consistency in the small noise limit of the generalized MAP estimate from Section 5 for the special case of finite-dimensional data and additive Gaussian noise. Let Y = ℝ^K for a fixed K ∈ ℕ, endowed with the Euclidean norm, and let F: X → Y be a closed nonlinear operator.
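The identification of generalized MAP estimates with Ivanov regularization can be illustrated on a finite truncation before turning to the consistency analysis. The following sketch is purely illustrative: the linear forward map A, the weights γ_k = 1/k, and the step size are hypothetical choices, not taken from the paper. It minimizes the Gaussian misfit over the box E_γ by projected gradient descent; for the sup-norm box, the metric projection onto E_γ is coordinate-wise clipping.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 50, 10                                   # truncation level, data dimension
gamma = 1.0 / np.arange(1, N + 1)               # assumed weights gamma_k = 1/k
A = rng.standard_normal((K, N)) / np.sqrt(N)    # hypothetical linear forward map
x_true = 0.5 * gamma                            # ground truth in the interior of E_gamma
delta = 1e-3                                    # noise level
y = A @ x_true + delta * rng.standard_normal(K)

# Ivanov regularization: minimize ||A x - y||^2 / 2 subject to |x_k| <= gamma_k
x = np.zeros(N)
step = 1.0 / np.linalg.norm(A, 2) ** 2          # step 1/L for the smooth misfit
for _ in range(20000):
    x = x - step * A.T @ (A @ x - y)            # gradient step on the misfit
    x = np.clip(x, -gamma, gamma)               # projection onto the box E_gamma
```

The iterate stays feasible throughout, and the residual shrinks toward the noise level; with Σ = I, dropping the 1/(2δ²) scaling does not change the minimizer, so this is the constrained misfit minimization that the next section analyzes.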
Assuming x ∼ µ_0 for the prior measure µ_0 defined via (4.3) and that y ∈ Y is a corresponding measurement of F(x) corrupted by additive Gaussian noise with mean 0 and positive definite covariance operator Σ ∈ ℝ^{K×K} scaled by δ > 0, the corresponding posterior measure µ^y is given by

dµ^y/dµ_0 (x) = (1/Z(y)) exp(−Φ(x; y)) for µ_0-almost all x ∈ X,

where

Φ(x; y) := (1/(2δ²)) ‖Σ^{−1/2}(F(x) − y)‖²_Y
for all x ∈ X and y ∈ Y. Furthermore, we assume that for every R > 0, the restriction of F to B_R(0) is Lipschitz continuous, so that Assumption 5.1 is satisfied. Then Theorem 5.9 yields that the generalized MAP estimates for µ^y are given by the minimizers of the functional

(6.1) I(x) = (1/(2δ²)) ‖Σ^{−1/2}(F(x) − y)‖²_Y + ι_{E_γ}(x).

Now let {δ_n}_{n∈ℕ} ⊂ (0, ∞) with δ_n → 0 and consider a frequentist setup where we have a true solution x† ∈ E_γ and a sequence {y_n}_{n∈ℕ} ⊂ Y of measurements given by

y_n = F(x†) + δ_n η_n,

where η_n ∼ N(0, Σ) are independently and identically distributed. Moreover, define

(6.2) I_n(x) := (1/(2δ_n²)) ‖Σ^{−1/2}(F(x) − y_n)‖²_Y + ι_{E_γ}(x) for all n ∈ ℕ and x ∈ X.

Let x_n ∈ E_γ denote a minimizer of I_n for all n ∈ ℕ. Then we have the following consistency result.

Theorem 6.1. Suppose that F: X → Y is closed and x† ∈ E_γ. Then any sequence {x_n}_{n∈ℕ} of minimizers of (6.2) contains a convergent subsequence whose limit x̄ ∈ E_γ satisfies F(x̄) = F(x†) almost surely.

Proof. Let e_k(x) := x_k for all x ∈ X and k ∈ ℕ. As x_n ∈ E_γ for all n ∈ ℕ, the sequence {x_n}_{n∈ℕ} is almost surely bounded with |⟨e_k, x_n⟩_X| = |[x_n]_k| ≤ γ_k almost surely for all k ∈ ℕ. We can thus extract for every k ∈ ℕ a subsequence with E[⟨e_k, x_n⟩_X] → x̄_k and |x̄_k| ≤ γ_k. Here and in the following, we pass to the subsequence without adapting the notation. By a diagonal sequence argument, we can thus construct a sequence, again denoted by {x_n}_{n∈ℕ}, satisfying E[⟨e_k, x_n⟩_X] → x̄_k for all k ∈ ℕ and x̄ := Σ_{k∈ℕ} x̄_k e_k ∈ E_γ. We now write for every v ∈ X* ≅ ℓ¹ and N ∈ ℕ

E[|⟨v, x_n − x̄⟩_X|] = E[|Σ_{k=1}^∞ ⟨v, e_k⟩_X ⟨e_k, x_n − x̄⟩_X|]
≤ Σ_{k=1}^N |⟨v, e_k⟩_X| E[|⟨e_k, x_n − x̄⟩_X|] + Σ_{k=N+1}^∞ |⟨v, e_k⟩_X| 2γ_k
≤ ‖v‖_{X*} sup_{k ∈ {1,…,N}} E[|⟨e_k, x_n − x̄⟩_X|] + 2 ‖v‖_{X*} sup_{k ≥ N+1} γ_k.

Since γ_k → 0, we can for every ε > 0 choose N large enough such that

2 ‖v‖_{X*} sup_{k ≥ N+1} γ_k ≤ ε/2,
and then choose n_0 large enough such that also

‖v‖_{X*} E[|⟨e_k, x_n − x̄⟩_X|] ≤ ε/2 for all n ≥ n_0 and k ∈ {1,…,N},

since E[⟨e_k, x_n⟩_X] → ⟨e_k, x̄⟩_X. Consequently,

E[|⟨v, x_n − x̄⟩_X|] ≤ ε for all n ≥ n_0,

i.e., E[|⟨v, x_n − x̄⟩_X|] → 0. From this we conclude that ⟨v, x_n − x̄⟩_X → 0 in probability as n → ∞ for all v ∈ X*. This implies the existence of a subsequence that converges weakly to x̄ almost surely. Therefore, by compactness of E_γ, a further subsequence converges strongly to x̄ almost surely.

Furthermore, inserting the definition of y_n yields that

I_n(x) = (1/(2δ_n²)) ‖Σ^{−1/2}(F(x†) − F(x) + δ_n η_n)‖²_Y
= (1/(2δ_n²)) ‖Σ^{−1/2}(F(x†) − F(x))‖²_Y + (1/δ_n) ⟨Σ^{−1/2}(F(x†) − F(x)), Σ^{−1/2} η_n⟩_Y + (1/2) ‖Σ^{−1/2} η_n‖²_Y

for all x ∈ E_γ. Note that the first two terms vanish for x = x† ∈ E_γ and that I_n(x_n) ≤ I_n(x†) by definition of x_n. Rearranging this inequality and using the Cauchy–Schwarz and Young inequalities, we thus obtain that

‖Σ^{−1/2}(F(x†) − F(x_n))‖²_Y ≤ −2δ_n ⟨Σ^{−1/2}(F(x†) − F(x_n)), Σ^{−1/2} η_n⟩_Y
≤ 2δ_n ‖Σ^{−1/2}(F(x†) − F(x_n))‖_Y ‖Σ^{−1/2} η_n‖_Y
≤ (1/2) ‖Σ^{−1/2}(F(x†) − F(x_n))‖²_Y + 2δ_n² ‖Σ^{−1/2} η_n‖²_Y.

Consequently,

‖Σ^{−1/2}(F(x_n) − F(x†))‖²_Y ≤ 4δ_n² ‖Σ^{−1/2} η_n‖²_Y,

and hence

E[‖Σ^{−1/2}(F(x_n) − F(x†))‖²_Y] ≤ 4δ_n² E[‖Σ^{−1/2} η_n‖²_Y] = 4Kδ_n² → 0,

which also implies that ‖Σ^{−1/2}(F(x_n) − F(x†))‖_Y → 0 in probability. After passing to a further subsequence, we thus obtain that F(x_n) → F(x†) almost surely. The closedness of F then yields that F(x̄) = F(x†) almost surely.

Note that although Theorem 6.1 does not require the Lipschitz continuity of F on bounded sets, in general without it we do not know whether the minimizers of I_n exist or whether they are generalized MAP estimates for µ^y. By a standard subsequence-subsequence argument, we can obtain convergence of the full sequence under the usual assumption on F.
Corollary 6.2. If F is injective, then x_n → x† in probability as n → ∞.

Proof. We can apply the proof of Theorem 6.1 to every subsequence of {x_n}_{n∈ℕ} to obtain a further subsequence that converges to x̄ = x† almost surely. This in turn is equivalent to the convergence of the whole sequence to x† in probability by [18, Cor. 6.13].

Remark 6.3. Consistency in the large sample size limit can be shown analogously. Specifically, assuming a fixed noise level δ > 0 (which without loss of generality we can fix at δ = 1) and n independent measurements y_1, …, y_n ∈ Y, the posterior measure µ_n on X has the density

dµ_n/dµ_0 (x) := (∏_{j=1}^n Z(y_j))^{−1} exp(−(1/2) Σ_{j=1}^n ‖Σ^{−1/2}(F(x) − y_j)‖²_Y) for µ_0-almost all x ∈ X.

Again we consider a frequentist setup with a true solution x† ∈ E_γ and

y_j = F(x†) + η_j for all j ∈ ℕ.

Then it can be shown as in Theorem 6.1 that any sequence of minimizers x_n ∈ E_γ of

I_n(x) := (1/2) Σ_{j=1}^n ‖Σ^{−1/2}(F(x) − y_j)‖²_Y + ι_{E_γ}(x)

contains a convergent subsequence whose limit x̄ ∈ E_γ satisfies F(x̄) = F(x†) almost surely and that the full sequence converges to x† in probability if F is injective.

7 conclusion

We have proposed a novel definition of generalized modes and corresponding MAP estimates for non-parametric Bayesian statistics. Our approach extends the construction of [8] by replacing the fixed base point with a qualifying sequence in the convergence of small ball probabilities. This allows covering cases where the previous approach fails, such as that of priors having discontinuous densities. Our definition coincides with the definition from [8] for Gaussian priors as well as for priors which admit a qualifying sequence satisfying additional convergence properties. For uniform priors defined via a random series with bounded coefficients, such generalized modes can be shown to exist.
Furthermore, the corresponding generalized MAP estimates in Bayesian inverse problems (i.e., generalized modes of the posterior distribution) can be characterized as minimizers of a functional that can be seen as a generalization of the Onsager–Machlup functional in the Gaussian case. For inverse problems with finite-dimensional Gaussian noise, the generalized MAP estimates are consistent in the small noise as well as in the large sample limit.

This work can be extended in several directions. The first question is whether qualifying sequences can be constructed from MAP estimates for a sequence of finite-dimensional problems, i.e., whether our generalized MAP estimates arise as Γ-limits of finite-dimensional MAP estimates. Also, it would be of interest to study Gaussian priors with non-negativity constraints, where
More information16 1 Basic Facts from Functional Analysis and Banach Lattices
16 1 Basic Facts from Functional Analysis and Banach Lattices 1.2.3 Banach Steinhaus Theorem Another fundamental theorem of functional analysis is the Banach Steinhaus theorem, or the Uniform Boundedness
More information2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space.
University of Bergen General Functional Analysis Problems with solutions 6 ) Prove that is unique in any normed space. Solution of ) Let us suppose that there are 2 zeros and 2. Then = + 2 = 2 + = 2. 2)
More informationFunctional Analysis F3/F4/NVP (2005) Homework assignment 3
Functional Analysis F3/F4/NVP (005 Homework assignment 3 All students should solve the following problems: 1. Section 4.8: Problem 8.. Section 4.9: Problem 4. 3. Let T : l l be the operator defined by
More informationCourse 212: Academic Year Section 1: Metric Spaces
Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........
More information2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.
Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following
More information3 Integration and Expectation
3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ
More informationMeasure and Integration: Solutions of CW2
Measure and Integration: s of CW2 Fall 206 [G. Holzegel] December 9, 206 Problem of Sheet 5 a) Left (f n ) and (g n ) be sequences of integrable functions with f n (x) f (x) and g n (x) g (x) for almost
More informationON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS
Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)
More information4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan
The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan Wir müssen wissen, wir werden wissen. David Hilbert We now continue to study a special class of Banach spaces,
More informationTopology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:
Topology Xiaolong Han Department of Mathematics, California State University, Northridge, CA 91330, USA E-mail address: Xiaolong.Han@csun.edu Remark. You are entitled to a reward of 1 point toward a homework
More informationAnalysis Comprehensive Exam Questions Fall 2008
Analysis Comprehensive xam Questions Fall 28. (a) Let R be measurable with finite Lebesgue measure. Suppose that {f n } n N is a bounded sequence in L 2 () and there exists a function f such that f n (x)
More information08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms
(February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops
More information1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),
Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer
More informationImmerse Metric Space Homework
Immerse Metric Space Homework (Exercises -2). In R n, define d(x, y) = x y +... + x n y n. Show that d is a metric that induces the usual topology. Sketch the basis elements when n = 2. Solution: Steps
More informationEconomics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011
Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure
More informationProblem 1: Compactness (12 points, 2 points each)
Final exam Selected Solutions APPM 5440 Fall 2014 Applied Analysis Date: Tuesday, Dec. 15 2014, 10:30 AM to 1 PM You may assume all vector spaces are over the real field unless otherwise specified. Your
More informationANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2.
ANALYSIS QUALIFYING EXAM FALL 27: SOLUTIONS Problem. Determine, with justification, the it cos(nx) n 2 x 2 dx. Solution. For an integer n >, define g n : (, ) R by Also define g : (, ) R by g(x) = g n
More informationLocally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem
56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi
More informationChapter 3: Baire category and open mapping theorems
MA3421 2016 17 Chapter 3: Baire category and open mapping theorems A number of the major results rely on completeness via the Baire category theorem. 3.1 The Baire category theorem 3.1.1 Definition. A
More information7 Complete metric spaces and function spaces
7 Complete metric spaces and function spaces 7.1 Completeness Let (X, d) be a metric space. Definition 7.1. A sequence (x n ) n N in X is a Cauchy sequence if for any ɛ > 0, there is N N such that n, m
More information( f ^ M _ M 0 )dµ (5.1)
47 5. LEBESGUE INTEGRAL: GENERAL CASE Although the Lebesgue integral defined in the previous chapter is in many ways much better behaved than the Riemann integral, it shares its restriction to bounded
More informationIf Y and Y 0 satisfy (1-2), then Y = Y 0 a.s.
20 6. CONDITIONAL EXPECTATION Having discussed at length the limit theory for sums of independent random variables we will now move on to deal with dependent random variables. An important tool in this
More informationWeek 5 Lectures 13-15
Week 5 Lectures 13-15 Lecture 13 Definition 29 Let Y be a subset X. A subset A Y is open in Y if there exists an open set U in X such that A = U Y. It is not difficult to show that the collection of all
More informationExamples of Dual Spaces from Measure Theory
Chapter 9 Examples of Dual Spaces from Measure Theory We have seen that L (, A, µ) is a Banach space for any measure space (, A, µ). We will extend that concept in the following section to identify an
More informationL p Spaces and Convexity
L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function
More informationNormed and Banach Spaces
(August 30, 2005) Normed and Banach Spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ We have seen that many interesting spaces of functions have natural structures of Banach spaces:
More informationANALYSIS QUALIFYING EXAM FALL 2016: SOLUTIONS. = lim. F n
ANALYSIS QUALIFYING EXAM FALL 206: SOLUTIONS Problem. Let m be Lebesgue measure on R. For a subset E R and r (0, ), define E r = { x R: dist(x, E) < r}. Let E R be compact. Prove that m(e) = lim m(e /n).
More informationFrom now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1.
Chapter 1 Metric spaces 1.1 Metric and convergence We will begin with some basic concepts. Definition 1.1. (Metric space) Metric space is a set X, with a metric satisfying: 1. d(x, y) 0, d(x, y) = 0 x
More informationFUNDAMENTALS OF REAL ANALYSIS by. II.1. Prelude. Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as
FUNDAMENTALS OF REAL ANALYSIS by Doğan Çömez II. MEASURES AND MEASURE SPACES II.1. Prelude Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as b n f(xdx :=
More informationII - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define
1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1
More informationThe small ball property in Banach spaces (quantitative results)
The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence
More informationA LITTLE REAL ANALYSIS AND TOPOLOGY
A LITTLE REAL ANALYSIS AND TOPOLOGY 1. NOTATION Before we begin some notational definitions are useful. (1) Z = {, 3, 2, 1, 0, 1, 2, 3, }is the set of integers. (2) Q = { a b : aεz, bεz {0}} is the set
More informationMeasures. Chapter Some prerequisites. 1.2 Introduction
Lecture notes Course Analysis for PhD students Uppsala University, Spring 2018 Rostyslav Kozhan Chapter 1 Measures 1.1 Some prerequisites I will follow closely the textbook Real analysis: Modern Techniques
More informationExercise 1. Let f be a nonnegative measurable function. Show that. where ϕ is taken over all simple functions with ϕ f. k 1.
Real Variables, Fall 2014 Problem set 3 Solution suggestions xercise 1. Let f be a nonnegative measurable function. Show that f = sup ϕ, where ϕ is taken over all simple functions with ϕ f. For each n
More informationTools from Lebesgue integration
Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given
More informationB. Appendix B. Topological vector spaces
B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function
More informationMathematics for Economists
Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples
More informationPARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION
PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION ALESSIO FIGALLI AND YOUNG-HEON KIM Abstract. Given Ω, Λ R n two bounded open sets, and f and g two probability densities concentrated
More informationThe Heine-Borel and Arzela-Ascoli Theorems
The Heine-Borel and Arzela-Ascoli Theorems David Jekel October 29, 2016 This paper explains two important results about compactness, the Heine- Borel theorem and the Arzela-Ascoli theorem. We prove them
More information01. Review of metric spaces and point-set topology. 1. Euclidean spaces
(October 3, 017) 01. Review of metric spaces and point-set topology Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 017-18/01
More informationEmpirical Processes: General Weak Convergence Theory
Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated
More informationContinuity. Matt Rosenzweig
Continuity Matt Rosenzweig Contents 1 Continuity 1 1.1 Rudin Chapter 4 Exercises........................................ 1 1.1.1 Exercise 1............................................. 1 1.1.2 Exercise
More informationExtreme points of compact convex sets
Extreme points of compact convex sets In this chapter, we are going to show that compact convex sets are determined by a proper subset, the set of its extreme points. Let us start with the main definition.
More informationSpectral theory for compact operators on Banach spaces
68 Chapter 9 Spectral theory for compact operators on Banach spaces Recall that a subset S of a metric space X is precompact if its closure is compact, or equivalently every sequence contains a Cauchy
More information2 Sequences, Continuity, and Limits
2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs
More informationJUHA KINNUNEN. Harmonic Analysis
JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes
More information