generalized modes in bayesian inverse problems

Christian Clason, Tapio Helin, Remo Kretschmann, Petteri Piiroinen

June 1, 2018

Faculty of Mathematics, University Duisburg-Essen, Thea-Leymann-Strasse 9, Essen, Germany (christian.clason@uni-due.de, remo.kretschmann@uni-due.de)
Department of Mathematics and Statistics, University of Helsinki, Gustaf Hällströmin katu 2b, FI Helsinki, Finland (tapio.helin@helsinki.fi, petteri.piiroinen@helsinki.fi)

Abstract. This work is concerned with non-parametric modes and MAP estimates for priors that do not admit continuous densities, for which previous approaches based on small ball probabilities fail. We propose a novel definition of generalized modes based on the concept of qualifying sequences, which reduce to the classical mode in certain situations that include Gaussian priors but also exist for a more general class of priors. The latter includes the case of priors that impose strict bounds on the admissible parameters and in particular of uniform priors. For uniform priors defined by random series with uniformly distributed coefficients, we show that the generalized modes (but not the classical modes) as well as the corresponding generalized MAP estimates can be characterized as minimizers of a suitable functional. This is then used to show consistency of nonlinear Bayesian inverse problems with uniform priors and Gaussian noise.

1 introduction

The Bayesian approach to inverse problems is attractive as it provides direct means to quantify uncertainty in the solution. Developments in modern sampling methodologies and increases in computational resources have made the Bayesian approach feasible in various large-scale inverse problems, and the approach has gained wide attention in recent years; we refer to the early work [11], the books [15, 31], and the more recent works [9, 20, 30] as well as to the references therein.

To motivate our contribution, we recall that the Bayesian approach produces statistical inference based on the likelihood and prior probability distributions of the system, and that all expert information independent of the observation enters the analysis through the prior model. Given some prior distribution µ_0 of the probable values of x, the Bayesian inverse problem is then to quantify the uncertainty in x given some noisy observation y = y^δ. It is well known, e.g., from [30] that under quite general conditions, the posterior µ_post is absolutely continuous with respect to the prior and is given by the Bayes formula

(1.1)  dµ_post/dµ_0 (x | y) ∝ exp(−Φ(x; y)),

where Φ is the negative log-likelihood. Due to the frequent high dimensionality of x in inverse problems, there is a fundamental need to understand whether the problem scales well to the infinite-dimensional case and whether the related estimators have well-behaved limits; this leads to the study of non-parametric Bayesian inverse problems.

Due to the computational effort required by Markov chain Monte Carlo (MCMC) algorithms in large-scale problems, there has been interest to study how approximate Bayesian methods (see, e.g., [7, 24-26, 29]) can be used effectively in inverse problems. A central estimator used in this research effort is the maximum a posteriori (MAP) estimate, which can be formulated in a finite-dimensional setting as an optimization problem

(1.2)  x_MAP = arg min_{x} {Φ(x; y) + R(x)},

where R is the negative log-prior density. As an optimization problem, (1.2) is relatively fast to solve and is widely used in applications as the basis of, e.g., the Laplace approximation [32]. The study of non-parametric MAP estimates, or more generally of non-parametric modes, was only recently initiated by Dashti et al. in [8] in the context of nonlinear Bayesian inverse problems with Gaussian prior and noise distribution. In the Gaussian case, one has an explicit relation between the right-hand side functional in (1.2) and the Onsager-Machlup functional defined via the limit of small ball probabilities, which can be used to give a statistical justification for the objective functional that is consistent with the finite-dimensional definition [8]. The difficulty related to non-parametric MAP theory is that the posterior does not have a natural density representation as in the finite-dimensional case, and consequently, the MAP estimate does not have a clear correspondence to an optimization problem such as (1.2). More recently, there has been a series of results [1, 10, 12] that extend the definition and scope of non-parametric MAP estimation. For example, weak MAP estimates were proposed and studied in [12] in the context of linear Bayesian inverse problems with a general class of priors. The authors used tools from the differentiation and quasi-invariance theory of measures (developed by Fomin and Skorokhod and discussed in detail in [4]) to connect the zero points of the logarithmic derivative of a measure to the minimizers of the Onsager-Machlup functional.

This program of generalization is motivated by the fact that in practice, due to the ill-posedness and often high dimensionality of the problem, the prior plays a key role in successful stabilization [30] and posterior modeling for the inverse problem. Therefore, the prior needs to be carefully designed to reflect the best possible subjective information available. Here, we only mention priors that reflect the sparsity [19, 21], hierarchical structure [5, 33], or anisotropic features [16] of the unknown. However, in the non-parametric case, all results regarding the MAP estimate known to the authors require continuity of the prior; but even a simple one-dimensional discontinuous density can fail to have a mode in terms of the definition given in [8] (see Example 2.2). This is limiting, since suitable prior modeling may involve imposing strict bounds on the admissible values of x emerging from some fundamental properties of the application. As a simple example, consider, e.g., the classical inverse problem of X-ray medical imaging [27]. Since it is reasonable to assume that there are no radiation sources inside the patient, the attenuation of X-rays is positive throughout the body. A related situation occurs in electrical impedance tomography and, more generally, in parameter estimation problems for partial differential equations, where pointwise upper and lower bounds need to be imposed on the parameter to ensure well-posedness of the forward problem; reasonable bounds are often available from a priori information on, e.g., the kinds of tissue or material expected in the region of interest. Furthermore, it is known in the deterministic theory that restriction of the unknown to a compact set can stabilize an inverse problem without additional (possibly undesired) regularization terms; this is sometimes referred to as quasi-solution or Ivanov regularization [13, 14, 23, 28], and the use of pointwise bounds for this purpose has recently been studied in [6, 17]. In the Bayesian approach, this would correspond to priors with densities whose mass is contained in a compact set. Together, this motivates the study of imposing such hard bounds as part of the prior in Bayesian inverse problems.

The main contribution of our work in this direction is two-fold: First, we introduce a novel definition of generalized modes for probability measures and characterize conditions under which our definition coincides with the previous definition of modes given in [8]. More precisely, we show that if the Radon-Nikodym derivative dµ(· − h)/dµ of the translated measure µ has certain equicontinuity properties (described in Section 3.1) over a dense set of translations h, any generalized mode is also a mode in the sense of [8]. In particular, we demonstrate that our definition of generalized modes does not introduce pathological modes in the case of continuous densities and that strong and generalized modes coincide for Gaussian measures. Second, we consider uniform priors defined via the random series

(1.3)  ξ = Σ_{k=1}^∞ γ_k ξ_k ϕ_k,

where ξ_k ∼ U[−1, 1] are uniformly distributed and γ_k are suitable weights; such priors clearly have a discontinuous density. We further show that the uniform prior (1.3) does not have any mode that touches the bounds (i.e., where ξ_k ∈ {−1, 1} for some k ∈ N) and, therefore, it is not guaranteed that the posterior has any mode. However, we prove that such points are in fact generalized modes of the prior. Furthermore, we show that for the uniform prior defined via (1.3), the definition of the generalized mode characterizes in a natural manner the MAP estimates of (1.1) as minimizers of (1.2). We also provide a weak consistency result regarding the MAP estimates in line with the previous work [1, 8].

This paper is organized as follows. In Section 2, we give our definition of generalized modes and illustrate the definition for the examples of a measure with discontinuous density (which does not admit a strong mode) and of Gaussian measures (where generalized and strong modes coincide). A general investigation of conditions for the generalized and strong modes to coincide is carried out in Section 3. We next construct uniform priors on an infinite-dimensional Banach space and characterize their generalized modes in Section 4. In Section 5, we study Bayesian inverse problems with such priors and derive a variational characterization of generalized modes that plays the role of a generalized Onsager-Machlup functional. This is used in Section 6 to show consistency of nonlinear Bayesian inverse problems with uniform priors and Gaussian noise.

2 generalized modes

Let X be a separable Banach space and µ be a probability measure on X. Throughout, let B_δ(x) ⊂ X denote the open ball around x ∈ X with radius δ, and set

M_δ := sup_{x∈X} µ(B_δ(x))   for δ > 0.

We first recall the definition of a mode introduced in [8].

Definition 2.1. A point x̂ ∈ X is called a (strong) mode of µ if

lim_{δ→0} µ(B_δ(x̂)) / M_δ = 1.

The modes of the posterior measure µ^y are called maximum a posteriori (MAP) estimates.

It is straightforward to construct a probability distribution with discontinuous Lebesgue density which does not have a mode according to Definition 2.1.

Example 2.2. Let µ be a probability measure on R with density p with respect to the Lebesgue measure, defined via

p̃(x) := 1 − x if x ∈ [0, 1] and p̃(x) := 0 otherwise,   p(x) := p̃(x) / ∫_R p̃(x) dx.

Clearly, x̂ = 0 maximizes p; however, this is not a strong mode. First, note that for every δ > 0,

µ(B_δ(δ)) = sup_{x∈R} µ(B_δ(x)) = M_δ.

Hence of all balls with radius δ, the one around x_δ := δ has the highest probability. However, although x_δ → 0 for δ → 0, we have for δ small enough that

µ(B_δ(0)) / µ(B_δ(δ)) = δ(1 − δ/2) / (2δ(1 − δ)) → 1/2 < 1   as δ → 0,

and hence Definition 2.1 is not satisfied.

The example above illustrates that as density functions are considered in more generality (e.g., as elements in L^1(R)), the concept of maximum point should be chosen carefully. This line of thought motivates the following definition.

Definition 2.3. A point x̂ ∈ X is called a generalized mode of µ if for every sequence {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 there exists a qualifying sequence {w_n}_{n∈N} ⊂ X with w_n → x̂ in X and

(2.1)  lim_{n→∞} µ(B_{δ_n}(w_n)) / M_{δ_n} = 1.

We call generalized modes of the posterior measure µ^y generalized MAP estimates.
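
The following is a minimal numerical sketch of Example 2.2 and Definition 2.3; it is not from the paper. It assumes the normalization p(x) = 2(1 − x) on [0, 1], and the helper names cdf and ball_prob are hypothetical; the ball probabilities are computed exactly from the cumulative distribution function.

```python
import numpy as np

def cdf(t):
    """CDF of the density p(x) = 2(1 - x) on [0, 1] from Example 2.2."""
    t = np.clip(t, 0.0, 1.0)
    return 2.0 * t - t**2

def ball_prob(center, delta):
    """mu(B_delta(center)) for the measure of Example 2.2."""
    return cdf(center + delta) - cdf(center - delta)

for delta in [0.1, 0.01, 0.001]:
    M_delta = ball_prob(delta, delta)        # the ball around x = delta has maximal mass
    print(delta,
          ball_prob(0.0, delta) / M_delta,   # tends to 1/2: 0 is not a strong mode
          ball_prob(delta, delta) / M_delta) # identically 1: w_n = delta_n qualifies,
                                             # so 0 is a generalized mode
```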

Note that by Definition 2.3, every strong mode x̂ ∈ X is also a generalized mode with the qualifying sequence w_n := x̂. Furthermore, in Example 2.2, x̂ = 0 is a generalized mode with the qualifying sequence w_n := δ_n for any positive sequence {δ_n}_{n∈N} with δ_n → 0. In fact, we can use the following stronger condition to show that a point is a generalized mode.

Lemma 2.4. Let x̂ ∈ X. If there is a family {w_δ}_{δ>0} ⊂ X such that w_δ → x̂ in X as δ → 0 and

lim_{δ→0} µ(B_δ(w_δ)) / M_δ = 1,

then x̂ is a generalized mode of µ.

In Example 2.2, this condition is obviously satisfied for x̂ = 0 and w_δ := δ.

Remark 2.5. Definition 2.3 can be further generalized by allowing for weakly converging qualifying sequences. Conversely, we can restrict Definition 2.3 by coupling the convergence of w_n to that of δ_n (cf. Lemma 2.4). As we will show below, our definition has the advantage of not introducing pathological modes in the case of continuous densities (such as Gaussian or Besov distributions), while for the important case of a uniform prior, it introduces modes that are natural in terms of the variational characterization (1.2).

We now illustrate some of the key ideas of the generalized mode by considering the case of a Gaussian measure µ on X. We assume that µ is centered and note that the results below trivially generalize to the non-centered case. We will require the following quantitative estimate for Gaussian ball probabilities.

Lemma 2.6 ([8, Lem. 3.6]). Let x ∈ X and δ > 0 be given. Then there exists a constant a_1 > 0 independent of x and δ such that

µ(B_δ(x)) / µ(B_δ(0)) ≤ e^{(a_1/2) δ²} e^{−(a_1/2)(‖x‖_X − δ)²}.

Now we can show that for a centered Gaussian measure µ, Definitions 2.1 and 2.3 do indeed coincide.

Theorem 2.7. The origin 0 ∈ X is both a strong mode and a generalized mode of µ, whereas all x ∈ X \ {0} are neither a strong mode nor a generalized mode.

Proof. First, by Anderson's inequality (see, e.g., [3, Thm ]) we have that µ(B_δ(x)) ≤ µ(B_δ(0)) for all x ∈ X and δ > 0 and hence that M_δ = µ(B_δ(0)) for all δ > 0. This immediately yields that x̂ = 0 is a strong mode and hence also a generalized mode for µ.

Now assume that x ∈ X \ {0} is a generalized mode of µ. Let {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 be arbitrary and let {w_n}_{n∈N} ⊂ X be the respective qualifying sequence. Then, Lemma 2.6 yields

µ(B_δ(w)) / µ(B_δ(0)) ≤ e^{−(a_1/2) ‖w‖_X (‖w‖_X − 2δ)} ≤ e^{−(a_1/2)(3‖x‖_X/4)(‖x‖_X/4)} = e^{−(3a_1/32) ‖x‖²_X} =: A < 1

for all δ ∈ (0, δ_0] and w ∈ B_{δ_0}(x) with δ_0 := ‖x‖_X/4. We choose n_0 ∈ N large enough such that ‖w_n − x‖_X < δ_0 and δ_n < δ_0 for all n ≥ n_0. Now taking the limit in the above yields

lim_{n→∞} µ(B_{δ_n}(w_n)) / M_{δ_n} = lim_{n→∞} µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(0)) ≤ A < 1,

which is a contradiction. Hence, x is not a generalized mode and thus cannot be a strong mode, either.

The explicit bound in Lemma 2.6 plays an important role in Theorem 2.7. Compare this to the alternative approach using the Onsager-Machlup functional I(x) = ‖x‖²_E/2, defined as satisfying

lim_{δ→0} µ(B_δ(x_1)) / µ(B_δ(x_2)) = exp(‖x_2‖²_E/2 − ‖x_1‖²_E/2),

which holds for all x_1, x_2 from the Cameron-Martin space E ⊂ X of µ by [22, Prop. 18.3]. Although this relation can be used to show that 0 is the only generalized MAP estimate in E, it does not yield any information regarding X \ E. In the next section, we will, for general measures µ under additional continuity assumptions, extend results from a dense space such as E to the whole space X.

3 relation between generalized and strong modes

In this section, we derive conditions under which our definition of generalized modes coincides with the standard notion of strong modes. We do this based on further characterizations of the convergence of the qualifying sequence in the definition of generalized modes.

3.1 characterization by rate of convergence

Consider a general probability measure µ on X. Let us first make the fundamental observation that for δ_n > 0 and x̂, w_n ∈ X, we have that

(3.1)  µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x)) = [µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n))] · [µ(B_{δ_n}(w_n)) / sup_{x∈X} µ(B_{δ_n}(x))].

Clearly, if x̂ is a generalized mode, we have control over the right-most ratio in (3.1). On the other hand, convergence of the ratio on the left-hand side in (3.1) is related to the definition of a strong mode. This leads to the following equivalence.

Theorem 3.1. Let x̂ ∈ X be a generalized mode of µ. Then x̂ is a strong mode if and only if for every sequence {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0, there exists a qualifying sequence {w_n}_{n∈N} ⊂ X with w_n → x̂ and

(3.2)  lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) = 1.

Proof. Let {δ_n}_{n∈N} be a positive sequence with δ_n → 0 and let {w_n}_{n∈N} be the corresponding qualifying sequence that satisfies (3.2). Then

lim_{n→∞} µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x)) = lim_{n→∞} [µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n))] · lim_{n→∞} [µ(B_{δ_n}(w_n)) / sup_{x∈X} µ(B_{δ_n}(x))] = 1

and hence x̂ is a strong mode.

Conversely, assume that x̂ is not a strong mode. Then there exists a positive sequence {δ_n}_{n∈N} such that δ_n → 0 and

lim_{n→∞} µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x)) < 1.

This, in turn, implies that

lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) = lim_{n→∞} [µ(B_{δ_n}(x̂)) / sup_{x∈X} µ(B_{δ_n}(x))] · lim_{n→∞} [sup_{x∈X} µ(B_{δ_n}(x)) / µ(B_{δ_n}(w_n))] < 1

for every qualifying sequence {w_n}_{n∈N}. Hence (3.2) does not hold.

We can use this theorem to restrict the balls over which the supremum in the definition of M_δ is taken.

Corollary 3.2. Let x̂ ∈ X be a generalized mode of µ. If there exists an r > 0 such that

lim_{δ→0} µ(B_δ(x̂)) / sup_{w∈B_r(x̂)} µ(B_δ(w)) = 1,

then x̂ is a strong mode.

Proof. Let {δ_n}_{n∈N} be a positive sequence such that δ_n → 0, and let {w_n}_{n∈N} ⊂ X be the corresponding qualifying sequence. Then

lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) ≥ lim_{n→∞} µ(B_{δ_n}(x̂)) / sup_{w∈B_r(x̂)} µ(B_{δ_n}(w)) ≥ lim_{δ→0} µ(B_δ(x̂)) / sup_{w∈B_r(x̂)} µ(B_δ(w)) = 1

since w_n → x̂. Hence, (3.2) holds, and x̂ is therefore a strong mode by Theorem 3.1.

We illustrate the possibility of satisfying (3.2) with the following example.

Example 3.3. For a probability measure µ on R with continuous density p with respect to the Lebesgue measure, (3.2) is satisfied in every point x̂ ∈ R with p(x̂) > 0. To see this, let ε > 0. Then there exists a δ_0 > 0 such that for all x ∈ B_{δ_0}(x̂), |p(x) − p(x̂)| ≤ ε. Therefore, for all δ ∈ (0, δ_0/2) and w ∈ B_{δ_0/2}(x̂),

|µ(B_δ(w)) / (2δ) − p(x̂)| ≤ (1/(2δ)) ∫_{w−δ}^{w+δ} |p(x) − p(x̂)| dx ≤ ε.

So, for every positive sequence δ_n → 0 and every real sequence w_n → x̂,

lim_{n→∞} µ(B_{δ_n}(w_n)) / (2δ_n) = p(x̂) > 0.

This holds true in particular for the constant sequence w_n = x̂, so that

lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n}(w_n)) = p(x̂) / p(x̂) = 1.

The property (3.2) can be seen as the requirement to have sufficiently fast convergence of the qualifying sequences. In other words, we require that the balls B_{δ_n}(x̂) and B_{δ_n}(w_n) have asymptotically the same measure. This idea can be further quantified by the following theorem.

Theorem 3.4. Let x̂ ∈ X be a generalized mode of a Borel probability measure µ. If

(i) for every positive sequence {δ_n}_{n∈N} with δ_n → 0, there exists a qualifying sequence {w_n}_{n∈N} such that

lim_{n→∞} ‖w_n − x̂‖_X / δ_n = 0,

(ii) the family of functions {f_n}_{n∈N} on [0, 1] defined by

f_n : [0, 1] → R,   f_n(r) := µ(B_{r(δ_n + ‖w_n − x̂‖_X)}(x̂)) / µ(B_{δ_n + ‖w_n − x̂‖_X}(x̂)),

is equicontinuous at r = 1,

then x̂ is a strong mode.

Proof. Notice first that by monotonicity of probability, the function

ϕ_{x,s}(r) := µ(B_{rs}(x)) / µ(B_s(x))

is a left-continuous increasing function with right limits on [0, 1] for every x ∈ X and every s > 0, and hence so is f_n = ϕ_{x̂, δ_n + ‖w_n − x̂‖_X}.

Let now {δ_n}_{n∈N} be a positive sequence with δ_n → 0 and let {w_n}_{n∈N} be a qualifying sequence satisfying assumption (i). Setting r_n := ‖w_n − x̂‖_X for all n ∈ N, by the triangle inequality we have

µ(B_{δ_n − r_n}(x̂)) ≤ µ(B_{δ_n}(w_n)) ≤ µ(B_{δ_n + r_n}(x̂))

for n large enough. Furthermore, assumption (ii) implies that for every ε > 0,

f_n(1) − f_n((δ_n − r_n)/(δ_n + r_n)) < ε

for every n large enough, since

(δ_n − r_n)/(δ_n + r_n) = (1 − r_n/δ_n)/(1 + r_n/δ_n) → 1   as n → ∞.

We thus obtain that

limsup_{n→∞} [µ(B_{δ_n + r_n}(x̂)) − µ(B_{δ_n}(w_n))] / µ(B_{δ_n + r_n}(x̂)) ≤ ε,

which implies that

(3.3)  lim_{n→∞} [1 − µ(B_{δ_n}(w_n)) / µ(B_{δ_n + r_n}(x̂))] = 0.

Since for every ε > 0 the assumption (ii) yields f_n(1) − f_n(δ_n/(δ_n + r_n)) < ε for every n large, it also follows that

limsup_{n→∞} [µ(B_{δ_n + r_n}(x̂)) − µ(B_{δ_n}(x̂))] / µ(B_{δ_n + r_n}(x̂)) ≤ ε

and in particular that

(3.4)  lim_{n→∞} µ(B_{δ_n}(x̂)) / µ(B_{δ_n + r_n}(x̂)) = 1.

The two limits (3.3) and (3.4) together yield that

lim_{n→∞} µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(x̂)) = 1

and, therefore, x̂ is a strong mode by Theorem 3.1.

Remark 3.5. The equicontinuity condition (ii) is implied by the η-annular decay property (η-AD) at the generalized mode point (see [2] for the definition). In the finite-dimensional case, the 1-AD property at x is equivalent to f(r) = µ(B_r(x)) being locally absolutely continuous on (0, ∞) and f′(r) r ≤ c f(r) for some c > 0 and almost every r > 0. For example, it can be seen by direct computation that the function f in Example 2.2 has the 1-AD property at the generalized mode x̂ = 0.

3.2 characterization by convergence on a dense subspace

We next consider the case when the qualifying sequences {w_n}_{n∈N} are restricted to a dense subspace of X where a certain continuity of the ratios holds.

Proposition 3.6. Let x̂ ∈ X be a generalized mode of a Borel probability measure µ and let E be a dense subset of X. Then for every {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 there exists a sequence {w̃_n}_{n∈N} ⊂ E with w̃_n → x̂ in X such that

lim_{n→∞} µ(B_{δ_n}(w̃_n)) / sup_{x∈X} µ(B_{δ_n}(x)) = 1.

Proof. We first show that for every δ > 0, the mapping x ↦ µ(B_δ(x)) is lower semi-continuous. To this end, let {x_n}_{n∈N} ⊂ X be a sequence converging to x ∈ X and let

χ_A(x̄) := 1 if x̄ ∈ A and χ_A(x̄) := 0 otherwise

denote the characteristic function of a set A ⊂ X. Then for every x̄ ∈ X, we have

χ_{B_δ(x)}(x̄) ≤ liminf_{n→∞} χ_{B_δ(x_n)}(x̄),

and Fatou's Lemma yields

µ(B_δ(x)) = ∫_X χ_{B_δ(x)}(x̄) µ(dx̄) ≤ ∫_X liminf_{n→∞} χ_{B_δ(x_n)}(x̄) µ(dx̄) ≤ liminf_{n→∞} ∫_X χ_{B_δ(x_n)}(x̄) µ(dx̄) = liminf_{n→∞} µ(B_δ(x_n)).

Now let {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 and let {w_n}_{n∈N} ⊂ X be a corresponding qualifying sequence. Consider a fixed n ∈ N. By the lower semi-continuity of x ↦ µ(B_{δ_n}(x)), we can choose an R > 0 such that

µ(B_{δ_n}(v)) ≥ µ(B_{δ_n}(w_n)) − (1/n) M_{δ_n}   for all v ∈ B_R(w_n).

Set r = min{R, 1/n}. Because E is dense in X, we can choose a w̃_n ∈ E ∩ B_r(w_n), which therefore satisfies both

µ(B_{δ_n}(w̃_n)) / M_{δ_n} ≥ µ(B_{δ_n}(w_n)) / M_{δ_n} − 1/n

and

‖w̃_n − x̂‖_X ≤ ‖w̃_n − w_n‖_X + ‖w_n − x̂‖_X ≤ 1/n + ‖w_n − x̂‖_X.

As n ∈ N was arbitrary, we obtain the desired sequence {w̃_n}_{n∈N} ⊂ E with w̃_n → x̂ in X and

lim_{n→∞} µ(B_{δ_n}(w̃_n)) / M_{δ_n} ≥ lim_{n→∞} [µ(B_{δ_n}(w_n)) / M_{δ_n} − 1/n] = 1.

In order to give a sufficient condition for the coincidence of generalized and strong modes, we consider a more specific class of probability measures. Let µ be a Borel probability measure that possesses a dense continuously embedded subspace (E, ‖·‖_E) ⊂ X of admissible shifts, i.e., the shifted measure µ_h := µ(· − h) is equivalent to µ for every h ∈ E, and assume that for every h ∈ E the density of µ_h with respect to µ has a continuous representative dµ_h/dµ ∈ C(X).

Theorem 3.7. Let x̂ ∈ X be a generalized mode of µ. If

(i) for every {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 there is a qualifying sequence {w_n}_{n∈N} ⊂ x̂ + E with ‖w_n − x̂‖_E → 0,

(ii) there is an R > 0 such that

f_R : (E, ‖·‖_E) → R,   f_R(w) := sup_{x ∈ B_R(x̂)} |dµ_{x̂−w}/dµ (x) − 1|

is continuous in x̂,

then x̂ is a strong mode.

Proof. Choosing N large enough such that δ_n ≤ R for all n ≥ N, we have for all n ≥ N that

|µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(x̂)) − 1| = |(1/µ(B_{δ_n}(x̂))) ∫_{B_{δ_n}(x̂)} (dµ_{x̂−w_n}/dµ (x) − 1) µ(dx)|
  ≤ sup_{x ∈ B_{δ_n}(x̂)} |dµ_{x̂−w_n}/dµ (x) − 1| · (1/µ(B_{δ_n}(x̂))) ∫_{B_{δ_n}(x̂)} µ(dx) ≤ f_R(w_n).

However, f_R(w_n) → f_R(x̂) = 0 by the continuity of f_R and the convergence w_n → x̂ in E, so that

lim_{n→∞} µ(B_{δ_n}(w_n)) / µ(B_{δ_n}(x̂)) = 1.

Hence, x̂ is a strong mode by Theorem 3.1.

For a nondegenerate Gaussian measure µ = N(a, Q) on a separable Hilbert space X with mean a ∈ X and covariance operator Q ∈ L(X), the assumptions of Theorem 3.7 are fulfilled for E = Q(X) and x̂ = w_n := a for all n ∈ N as well as any R > 0.

Corollary 3.8. Let x̂ ∈ X be a generalized mode of µ that satisfies condition (i) of Theorem 3.7. If additionally

lim_{‖w − x̂‖_E → 0} dµ_{x̂−w}/dµ (x̂) = 1

and there is an r > 0 such that the family

{ dµ_{x̂−w}/dµ : w ∈ B^E_r(x̂) },   B^E_r(x̂) := {x ∈ x̂ + E : ‖x − x̂‖_E < r} ⊂ x̂ + E,

is equicontinuous in x̂, then x̂ is a strong mode.

Proof. We show that condition (ii) of Theorem 3.7 is satisfied as well, from which the claim then follows. For a given ε > 0 we choose 0 < r_0 ≤ r small enough such that for all w ∈ B^E_{r_0}(x̂) we have that

|dµ_{x̂−w}/dµ (x̂) − 1| ≤ ε/2.

If we also choose 0 < R ≤ r_0 such that for all x ∈ B^X_R(x̂) and all w ∈ B^E_{r_0}(x̂) we have that

|dµ_{x̂−w}/dµ (x) − dµ_{x̂−w}/dµ (x̂)| ≤ ε/2,

then the triangle inequality yields that

|f_R(w) − f_R(x̂)| = |sup_{x ∈ B_R(x̂)} |dµ_{x̂−w}/dµ (x) − 1| − 0| ≤ ε

for all w ∈ B^E_{r_0}(x̂), and therefore the continuity of f_R at x̂.

4 modes of uniform priors

We now demonstrate the usefulness of the concept of generalized modes for a class of uniform probability measures on infinite-dimensional spaces that can serve as priors in Bayesian inverse problems. We first discuss the rigorous construction of the uniform probability measure and then show that such measures admit generalized but in general not strong modes.

4.1 construction of the probability measure

We proceed similarly as in [9], with the difference that we define a probability measure on a subspace of l^∞ rather than of L^∞. Let us first fix some notation. For x := {x_k}_{k∈N} ∈ l^∞, we write ‖x‖_∞ = sup_{k∈N} |x_k|. Furthermore, let e_j ∈ l^∞ for j ∈ N denote the standard unit vectors in l^∞, i.e., [e_k]_j = 1 for j = k and 0 else. We then define

(4.1)  X := cl(span{e_k : k ∈ N}) ⊂ l^∞

(the closed linear span) and note that X = {x ∈ l^∞ : lim_{k→∞} x_k = 0} =: c_0. We thus have that (X, ‖·‖_∞) is a separable Banach space.

We now construct a class of probability measures on X whose mass is concentrated on a set of sequences with strictly bounded components. First, we define a random variable ξ according to the random series

(4.2)  ξ := Σ_{k=1}^∞ γ_k ξ_k e_k,

where

(i) ξ_k ∼ U[−1, 1] (i.e., uniformly distributed on [−1, 1]) are independent and

(ii) γ_k ≥ 0 for all k ∈ N with γ_k → 0.

Note that the partial sums ξ^n := Σ_{k=1}^n γ_k ξ_k e_k almost surely form a Cauchy sequence in X, since for all N, m, n ∈ N with N ≤ m ≤ n we have that

‖ξ^n − ξ^m‖_∞ = ‖Σ_{k=m+1}^n γ_k ξ_k e_k‖_∞ = sup_{m+1≤k≤n} γ_k |ξ_k| ≤ sup_{k≥N} γ_k,

and the right-hand side tends to zero as N → ∞. Since X is complete, the series (4.2) therefore converges almost surely. We can thus define the probability measure µ_γ on X by

(4.3)  µ_γ(A) := P[ξ ∈ A]   for every A ∈ B(X).
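
The following is a minimal sketch, not from the paper, of drawing approximate samples from µ_γ by truncating the random series (4.2). The weight choice gamma_k = k**-2, the truncation level K, and the helper name sample_uniform_prior are illustrative assumptions; the construction only requires γ_k ≥ 0 with γ_k → 0.

```python
import numpy as np

def sample_uniform_prior(gamma, rng):
    """One realization of the truncated series sum_k gamma_k * xi_k * e_k from (4.2)."""
    xi = rng.uniform(-1.0, 1.0, size=len(gamma))   # xi_k ~ U[-1, 1], independent
    return gamma * xi                              # coefficients with respect to the e_k

rng = np.random.default_rng(0)
K = 50
gamma = np.array([k**-2.0 for k in range(1, K + 1)])   # illustrative weights with gamma_k -> 0
draws = np.array([sample_uniform_prior(gamma, rng) for _ in range(1000)])

# Every draw lies in the box E_gamma = {x : |x_k| <= gamma_k for all k} introduced below:
assert np.all(np.abs(draws) <= gamma)
print(draws.shape, np.abs(draws).max(axis=0)[:5])
```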

The following sets will be important for our study of the generalized and strong modes of µ_γ. We define

E_γ := {x ∈ X : |x_k| ≤ γ_k for all k ∈ N}

and, for every δ > 0,

E^δ_γ := {x ∈ X : |x_k| ≤ max{γ_k − δ, 0} for all k ∈ N}

as well as

E^0_γ := ∪_{δ>0} E^δ_γ ⊂ E_γ.

We first collect some basic properties of E_γ and E^δ_γ.

Proposition 4.1. The sets E_γ and E^δ_γ are convex, compact, and have empty interior.

Proof. We only consider the case of E_γ; the case of E^δ_γ follows analogously. First, convexity follows directly from the definition since for every x, y ∈ E_γ and λ ∈ [0, 1] we have that

|[λx + (1 − λ)y]_k| = |λ x_k + (1 − λ) y_k| ≤ λ|x_k| + (1 − λ)|y_k| ≤ γ_k   for all k ∈ N,

and hence that λx + (1 − λ)y ∈ E_γ.

For the compactness of E_γ, we first show that X \ E_γ is open. Let x ∈ X \ E_γ be arbitrary. Then there exists an m ∈ N with |x_m| > γ_m, so that for ε := |x_m| − γ_m we have B_ε(x) ⊂ X \ E_γ as claimed. Hence, as a closed subset of a complete space, E_γ is itself complete, which allows us to show compactness by constructing a finite covering of E_γ by balls of radius ε for arbitrary ε > 0. Let ε > 0. As γ_k → 0, there is an N ∈ N such that for every x ∈ E_γ and all k ≥ N + 1, |x_k| ≤ γ_k < ε. Now choose M ∈ N with Mε ≥ max{γ_k : k = 1, ..., N}. For every x ∈ E_γ and k ∈ {1, ..., N}, there is a λ_k ∈ {−M, ..., M} such that |x_k − λ_k ε| < ε. Hence, ‖x − z_λ‖_∞ < ε for z_λ := Σ_{k=1}^N λ_k ε e_k, which implies that the finitely many balls B_ε(z_λ), λ ∈ {−M, ..., M}^N, form the desired finite covering.

Finally, to show that E_γ has empty interior, let x ∈ E_γ and ε > 0 be arbitrary. We choose m ∈ N such that γ_m < ε/3. Then y := x + 3γ_m e_m ∈ B_ε(x), because ‖y − x‖_∞ = 3γ_m < ε, but y ∉ E_γ. Consequently, x is not an interior point of E_γ, and hence the interior of E_γ is empty.

Of particular use for finding both generalized and strong modes will be the metric projections onto E^δ_γ. For any δ > 0, let

P^δ : X → E^δ_γ,   P^δ(x) := x̄  with  ‖x̄ − x‖_∞ = inf_{z ∈ E^δ_γ} ‖z − x‖_∞,

which is well-defined because E^δ_γ is closed and convex by Proposition 4.1. It is straightforward to show by case distinction that this projection can be characterized componentwise for every k ∈ N via

(4.4)  [P^δ(x)]_k = 0            if γ_k ≤ δ,
       [P^δ(x)]_k = γ_k − δ      if x_k > γ_k − δ > 0,
       [P^δ(x)]_k = x_k          if x_k ∈ [−γ_k + δ, γ_k − δ],
       [P^δ(x)]_k = −γ_k + δ     if x_k < −γ_k + δ < 0.

The projection satisfies the following properties.

Lemma 4.2. Let x ∈ E_γ. Then lim_{δ→0} P^δ x = x in X.

Proof. This follows directly from the definition since |[P^δ x]_k − x_k| ≤ δ for all k ∈ N.

Finally, we can give a more explicit characterization of E^0_γ.

Lemma 4.3. We have that

E^0_γ = {x ∈ X : |x_k| < γ_k for all k ∈ N, x_k ≠ 0 for only finitely many k ∈ N}.

Proof. By definition, for every x ∈ E^0_γ there exists a δ > 0 with x ∈ E^δ_γ. This implies that |x_k| ≤ max{γ_k − δ, 0} < γ_k for all k ∈ N. Moreover, since γ_k → 0, there exists an N ∈ N with γ_k ≤ δ and hence |x_k| ≤ max{γ_k − δ, 0} = 0 for all k ≥ N.

Now let x ∈ X be an element of the set on the right-hand side. As |x_k| < γ_k for all k ∈ N and because there are only finitely many k ∈ N with x_k ≠ 0, we can choose δ_0 := min{γ_k − |x_k| : k ∈ N with x_k ≠ 0} > 0 such that x ∈ E^{δ_0}_γ ⊂ E^0_γ.

4.2 small ball probabilities

We next study the behavior of small balls under the probability measure µ_γ, which is crucial for determining modes. For the sake of conciseness, let

(4.5)  J^δ_γ(x) := µ_γ(B_δ(x))   for all x ∈ X,

for which we have the following straightforward characterization.

Lemma 4.4. For every x ∈ X and δ > 0, it holds that

µ_γ(B̄_δ(x)) = J^δ_γ(x) = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)],

where Ā denotes the closure of a set A ⊂ X.

Proof. First of all, by definition

J^δ_γ(x) = µ_γ(B_δ(x)) = P[ξ ∈ B_δ(x)] = P[|γ_k ξ_k − x_k| < δ for all k ∈ N].

As both γ_k → 0 and x_k → 0, there exists an N ∈ N such that γ_k + |x_k| ≤ δ/2 for all k ≥ N + 1. This implies that

P[|γ_k ξ_k − x_k| < δ for all k ≥ N + 1] = 1,

since |ξ_k| ≤ 1 almost surely. Consequently,

µ_γ(B_δ(x)) = P[|γ_k ξ_k − x_k| < δ for k = 1, ..., N] · P[|γ_k ξ_k − x_k| < δ for all k ≥ N + 1] = ∏_{k=1}^N P[|γ_k ξ_k − x_k| < δ]

by the independence of the ξ_k. This yields the second identity. The first identity now follows from

P[γ_k ξ_k ∈ [x_k − δ, x_k + δ]] = P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)]   for all k ∈ N.

We can use this characterization to show that for every δ > 0, the origin maximizes J^δ_γ.

Proposition 4.5. Let δ > 0. Then J^δ_γ(0) = max_{x∈X} J^δ_γ(x) > 0.

Proof. From the second equality in Lemma 4.4, we have that

µ_γ(B_δ(x)) = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] ≤ ∏_{k=1}^∞ P[γ_k ξ_k ∈ (−δ, δ)] = µ_γ(B_δ(0))

for every x = {x_k}_{k∈N} ∈ X. On the other hand, µ_γ(B_δ(0)) ≤ sup_{x∈X} µ_γ(B_δ(x)), which shows that x = 0 is a maximizer. For the positivity, note that as γ_k → 0, there exists an N ∈ N such that

µ_γ(B_δ(0)) = ∏_{k=1}^N P[γ_k ξ_k ∈ (−δ, δ)] > 0

since P[γ_k ξ_k ∈ (−δ, δ)] > 0 for all k ∈ N.

Crucially, J^δ_γ is constant on E^δ_γ.

Proposition 4.6. For every δ > 0, J^δ_γ(x_1) = J^δ_γ(x_2) for all x_1, x_2 ∈ E^δ_γ.

Proof. Let x_1, x_2 ∈ E^δ_γ. For every k ∈ N, we distinguish between the following two cases:

(i) δ ≤ γ_k: In this case, (x_{1,k} − δ, x_{1,k} + δ) ⊂ (−γ_k, γ_k) and (x_{2,k} − δ, x_{2,k} + δ) ⊂ (−γ_k, γ_k), so that

P[γ_k ξ_k ∈ (x_{1,k} − δ, x_{1,k} + δ)] = δ/γ_k = P[γ_k ξ_k ∈ (x_{2,k} − δ, x_{2,k} + δ)].

(ii) δ > γ_k: In this case, x_{1,k} = x_{2,k} = 0 by definition of E^δ_γ, and hence

P[γ_k ξ_k ∈ (x_{1,k} − δ, x_{1,k} + δ)] = 1 = P[γ_k ξ_k ∈ (x_{2,k} − δ, x_{2,k} + δ)].

Together we obtain that

J^δ_γ(x_1) = P[ξ ∈ B_δ(x_1)] = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_{1,k} − δ, x_{1,k} + δ)] = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_{2,k} − δ, x_{2,k} + δ)] = P[ξ ∈ B_δ(x_2)] = J^δ_γ(x_2)

as claimed.

Since 0 ∈ E^δ_γ, combining Propositions 4.5 and 4.6 immediately yields that every x̄ ∈ E^δ_γ maximizes J^δ_γ.

Corollary 4.7. For δ > 0 and every x̄ ∈ E^δ_γ,

J^δ_γ(x̄) = max_{x∈X} J^δ_γ(x) > 0.

4.3 modes of the probability measure

Finally, we characterize both the strong and the generalized modes of the uniform probability measure and show that these do not coincide.

Theorem 4.8. Every point x̂ ∈ E^0_γ is a strong mode of µ_γ.

Proof. By definition of E^0_γ there is a δ̄ > 0 such that x̂ ∈ E^{δ̄}_γ. Then x̂ ∈ E^δ_γ for all δ ∈ (0, δ̄], and Corollary 4.7 yields

M_δ = max_{x∈X} µ_γ(B_δ(x)) = µ_γ(B_δ(x̂))   for all δ ∈ (0, δ̄],

and therefore

lim_{δ→0} µ_γ(B_δ(x̂)) / M_δ = lim_{δ→0} µ_γ(B_δ(x̂)) / µ_γ(B_δ(x̂)) = 1.
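
Below is a minimal numerical sketch, not from the paper, of the product formula in Lemma 4.4 and of Propositions 4.5 and 4.6. The weights gamma_k = k**-2, the truncation level K, the radius delta, and the helper name small_ball_prob are illustrative assumptions; all weights are taken strictly positive so that each factor is a simple interval length.

```python
import numpy as np

def small_ball_prob(x, delta, gamma):
    """J_gamma^delta(x) = prod_k P[gamma_k * xi_k in (x_k - delta, x_k + delta)], gamma_k > 0."""
    lo = np.maximum(x - delta, -gamma)      # intersect with the support [-gamma_k, gamma_k]
    hi = np.minimum(x + delta, gamma)
    return float(np.prod(np.clip(hi - lo, 0.0, None) / (2.0 * gamma)))

K, delta = 50, 0.05
gamma = np.array([k**-2.0 for k in range(1, K + 1)])
origin = np.zeros(K)
x_in = np.clip(0.3 * gamma, 0.0, np.maximum(gamma - delta, 0.0))   # a point of E_gamma^delta
x_out = gamma.copy()                                               # touches all the bounds

print(small_ball_prob(origin, delta, gamma))   # the maximal value M_delta (Proposition 4.5)
print(small_ball_prob(x_in, delta, gamma))     # agrees with the value at 0 up to rounding
                                               # (Proposition 4.6)
print(small_ball_prob(x_out, delta, gamma))    # strictly smaller for bound-touching points
                                               # (cf. Proposition 4.9 below)
```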

However, the following result shows that Definition 2.1 is too restrictive in this case and in fact can be unintuitive.

Proposition 4.9. The following claims hold:

(i) If for x ∈ E_γ there is an m ∈ N with |x_m| = γ_m > 0, then x is not a strong mode of µ_γ.

(ii) There are γ, x ∈ X with |x_k| < γ_k for all k ∈ N such that x is not a strong mode of µ_γ.

Proof. Ad (i): First, note that for δ > 0 small enough,

P[γ_m ξ_m ∈ (x_m − δ, x_m + δ)] = (1/2) P[γ_m ξ_m ∈ (−δ, δ)].

Moreover, for all k ∈ N and δ > 0,

P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] ≤ P[γ_k ξ_k ∈ (−δ, δ)],

so that

µ_γ(B_δ(x)) / µ_γ(B_δ(0)) = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] / P[γ_k ξ_k ∈ (−δ, δ)] ≤ 1/2

for δ > 0 small enough. But by Proposition 4.5, we also have that

limsup_{δ→0} µ_γ(B_δ(x)) / M_δ = limsup_{δ→0} µ_γ(B_δ(x)) / µ_γ(B_δ(0)) ≤ 1/2 < 1.

Hence, x cannot be a strong mode.

Ad (ii): We take γ_k = 1/k and x_k = γ_k/2 = 1/(2k) for all k ∈ N. For given δ ∈ (0, 1/4) we also choose n ∈ N such that 1/n ≥ δ > 1/(n+1), i.e., γ_n/2 < δ ≤ γ_n. Hence,

P[γ_n ξ_n ∈ (x_n − δ, x_n + δ)] = (γ_n − (x_n − δ)) / (2γ_n) ≤ (γ_n + γ_n/2) / (2γ_n) = 3/4,

as well as

P[γ_n ξ_n ∈ (−δ, δ)] = δ/γ_n > n/(n+1) ≥ 4/5   for n ≥ 4.

In addition,

P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] ≤ P[γ_k ξ_k ∈ (−δ, δ)]   for all k ∈ N.

Consequently,

µ_γ(B_δ(x)) / µ_γ(B_δ(0)) = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] / P[γ_k ξ_k ∈ (−δ, δ)] ≤ P[γ_n ξ_n ∈ (x_n − δ, x_n + δ)] / P[γ_n ξ_n ∈ (−δ, δ)] ≤ (3/4)/(4/5) = 15/16

for n ≥ 4. Proposition 4.5 now yields that

limsup_{δ→0} µ_γ(B_δ(x)) / M_δ = limsup_{δ→0} µ_γ(B_δ(x)) / µ_γ(B_δ(0)) < 1

and hence that x is not a strong mode.
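
The following is a minimal sketch, not from the paper, illustrating Proposition 4.9 (i) together with the projection P^δ from (4.4): a point touching a bound keeps at most half of the maximal small-ball mass, while its projection into E^δ_γ recovers all of it (this anticipates Theorem 4.10 below). The weights gamma_k = 1/k, the truncation level, and the helper names are illustrative assumptions; small_ball_prob is the helper from the previous sketch, repeated so the snippet runs on its own.

```python
import numpy as np

def small_ball_prob(x, delta, gamma):
    lo = np.maximum(x - delta, -gamma)
    hi = np.minimum(x + delta, gamma)
    return float(np.prod(np.clip(hi - lo, 0.0, None) / (2.0 * gamma)))

def project(x, delta, gamma):
    """Componentwise metric projection P^delta onto E_gamma^delta, cf. (4.4)."""
    bound = np.maximum(gamma - delta, 0.0)
    return np.clip(x, -bound, bound)

K = 100
gamma = 1.0 / np.arange(1, K + 1)
x = np.zeros(K)
x[0] = gamma[0]                          # x touches the bound in the first component
for delta in [0.1, 0.01, 0.001]:
    M_delta = small_ball_prob(np.zeros(K), delta, gamma)
    ratio_x = small_ball_prob(x, delta, gamma) / M_delta                         # about 1/2
    ratio_p = small_ball_prob(project(x, delta, gamma), delta, gamma) / M_delta  # about 1
    print(delta, ratio_x, ratio_p)
```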

On the other hand, every point in E_γ is a generalized mode and vice versa.

Theorem 4.10. A point x ∈ X is a generalized mode of µ_γ if and only if x ∈ E_γ.

Proof. Assume first that x ∈ E_γ and set w_δ := P^δ x ∈ E^δ_γ for all δ > 0. Then w_δ → x as δ → 0 by Lemma 4.2 and µ_γ(B_δ(w_δ)) = µ_γ(B_δ(0)) by Proposition 4.6, so that

µ_γ(B_δ(w_δ)) / µ_γ(B_δ(0)) = 1   for all δ > 0.

Since M_δ := max_{x∈X} µ_γ(B_δ(x)) = µ_γ(B_δ(0)) by Proposition 4.5, it follows from Lemma 2.4 that x is a generalized mode.

Conversely, assume that x ∈ X \ E_γ. Then there exists an m ∈ N with |x_m| > γ_m. Taking now δ_0 := |x_m| − γ_m > 0, we have that P[γ_m ξ_m ∈ (x_m − δ, x_m + δ)] = 0 for all δ ∈ (0, δ_0) and hence that

µ_γ(B_δ(x)) = ∏_{k=1}^∞ P[γ_k ξ_k ∈ (x_k − δ, x_k + δ)] = 0   for all δ ∈ (0, δ_0).

This implies that

µ_γ(B_δ(w)) / M_δ = 0   for all δ ∈ (0, δ_0/2) and w ∈ B_{δ_0/2}(x),

and hence x cannot be a generalized mode.

5 variational characterization of generalized map estimates

In this section, we consider Bayesian inverse problems with uniform priors and characterize the corresponding generalized MAP estimates, i.e., the generalized modes of the posterior distribution, as minimizers of an appropriate objective functional. Let X be the separable Banach space defined by (4.1) and choose as prior µ_0 the uniform probability distribution µ_γ defined by (4.3) for some non-negative sequence of weights γ_k with γ_k → 0. We assume that for given data y from a Banach space Y the posterior distribution µ^y satisfies µ^y ≪ µ_0 and can be expressed as

(5.1)  µ^y(A) = (1/Z(y)) ∫_A exp(−Φ(x; y)) µ_0(dx)   for all A ∈ B(X),

where

Z(y) := ∫_X exp(−Φ(x; y)) µ_0(dx)

is a positive and finite normalization constant and Φ : X × Y → R is the negative log-likelihood. This way, µ^y constitutes a probability measure on X. Throughout this section, we fix y ∈ Y and abbreviate Φ(x; y) by Φ(x). We make the following assumption on the likelihood.

Assumption 5.1. The function Φ : X → R is Lipschitz continuous on bounded sets, i.e., for every r > 0, there exists L = L_r > 0 such that for all x_1, x_2 ∈ X with ‖x_1‖_X, ‖x_2‖_X ≤ r we have

|Φ(x_1) − Φ(x_2)| ≤ L ‖x_1 − x_2‖_X.

In Theorem 4.10, we have seen that the indicator function ι_{E_γ} of E_γ in the sense of convex analysis, i.e.,

ι_{E_γ} : X → R̄ := R ∪ {∞},   ι_{E_γ}(x) := 0 if x ∈ E_γ and ι_{E_γ}(x) := ∞ otherwise,

is the suitable functional to minimize in order to find the generalized modes of the prior measure µ_0. In contrast, the common Onsager-Machlup functional is not defined for µ_0, as µ_0 is not quasi-invariant with respect to shifts along any direction x ∈ X \ {0}. The goal of this section is to relate a similar functional that characterizes generalized MAP estimates to a suitably generalized limit of small ball probabilities. Specifically, we define the functional

(5.2)  I : X → R̄,   I(x) := Φ(x) + ι_{E_γ}(x),

i.e., I(x) = Φ(x) if x ∈ E_γ and I(x) = ∞ otherwise. Furthermore, for δ > 0 we denote by

J^δ(x) := µ^y(B_δ(x))   for all x ∈ X

the small ball probabilities of the posterior measure. Let x^δ denote a (not necessarily unique) maximizer of J^δ in E^δ_γ, i.e.,

J^δ(x^δ) = max_{x ∈ E^δ_γ} J^δ(x).

What we prove below (in Theorem 5.9) is the following:

(i) {x^δ}_{δ>0} contains a convergent subsequence (because E_γ is compact), and the limit of the convergent subsequence minimizes I.

(ii) Any minimizer of I is a generalized MAP estimate and vice versa.

We first demonstrate the necessity to work with subsequences by an example which shows that the family {x^δ}_{δ>0} may not have a unique limit.

Example 5.2. Define g(0) := 0 and

g(x) := 1/2^n   if 1/2^{n+1} < |x| ≤ 1/2^n,  n ∈ Z,

and consider a probability measure µ on R with density f with respect to the Lebesgue measure, defined via

f̃(x) := max{1 − g(x − 1), 1 − 2 g((x + 1)/2), 0},   f(x) := f̃(x) / ∫_R f̃(x) dx.

Figure 1: probability density f constructed in Example 5.2, for which the family of maximizers {x^δ}_{δ>0} has the two cluster points x̂_1 = −1 and x̂_2 = 1.

To find the set of points x for which the probability of B_δ(x) is maximal under µ for a given δ ∈ (0, 1), first let n ∈ N be such that 1/2^{n+1} < δ ≤ 1/2^n. By a case distinction, one can then verify that arg max_{x∈R} µ(B_δ(x)) is an interval around 1 for δ in one part of (1/2^{n+1}, 1/2^n] and an interval around −1 for δ in the remaining part. Hence, any family {x^δ}_{δ>0} of maximizers x^δ ∈ arg max_{x∈R} µ(B_δ(x)) has the two cluster points −1 and 1; see Figure 1.

We begin our analysis by collecting some auxiliary results on Φ and I.

Lemma 5.3. If Assumption 5.1 holds, then there exists a constant M > 0 such that

(5.3)  |Φ(x)| ≤ M   for all x ∈ E_γ.

Proof. Since E_γ is compact and therefore bounded by Proposition 4.1, there exists some R ≥ 0 such that ‖x‖_∞ ≤ R for all x ∈ E_γ. The Lipschitz continuity of Φ on bounded sets then implies that for any x ∈ E_γ,

|Φ(x)| ≤ |Φ(x) − Φ(0)| + |Φ(0)| ≤ L ‖x‖_∞ + |Φ(0)| ≤ L R + |Φ(0)| =: M.

Lemma 5.4. If Assumption 5.1 holds, then there exists an x̄ ∈ E_γ such that I(x̄) = min_{x∈X} I(x).

Proof. As I(x) = ∞ for all x ∈ X \ E_γ, we only need to consider I|_{E_γ} = Φ|_{E_γ}, which is Lipschitz continuous on the bounded set E_γ by Assumption 5.1. Furthermore, the set E_γ is nonempty and by Proposition 4.1 closed and compact. The Weierstrass theorem therefore implies that I attains its minimum in a point x̄ ∈ E_γ.

We also need the following results on the small ball probabilities of the posterior.

Lemma 5.5. Assume that Assumption 5.1 holds. Then for every δ > 0 there exists a constant c(δ) > 0 such that

J^δ(x) ≥ c(δ) > 0   for all x ∈ E^δ_γ.

Proof. First, µ_0(B_δ(x)) = µ_0(B_δ(0)) > 0 for all x ∈ E^δ_γ by Proposition 4.6 and Corollary 4.7. Hence we can estimate using Lemma 5.3 that

J^δ(x) = (1/Z(y)) ∫_{B_δ(x)} exp(−Φ(x̄)) µ_0(dx̄) ≥ (1/Z(y)) e^{−M} µ_0(B_δ(0)) =: c(δ) > 0,

because Z(y) is positive and finite by assumption.

We next show that J^δ is maximized within E^δ_γ by making use of the metric projection P^δ : X → E^δ_γ defined in (4.4).

Lemma 5.6. For all δ > 0, we have that

(5.4)  J^δ(P^δ x) ≥ J^δ(x)   for all x ∈ X.

Proof. We note that B_δ(x) \ B̄_δ(P^δ x) ⊂ X \ E_γ, so that µ_0(B_δ(x) \ B̄_δ(P^δ x)) = 0. Moreover, µ_0(B̄_δ(P^δ x) \ B_δ(P^δ x)) = 0 by Lemma 4.4, so that µ_0(B_δ(x) \ B_δ(P^δ x)) = 0 as well. With this knowledge we estimate

J^δ(x) = (1/Z(y)) ∫_{B_δ(x)} exp(−Φ(x̄)) µ_0(dx̄) = (1/Z(y)) ∫_{B_δ(x) ∩ B_δ(P^δ x)} exp(−Φ(x̄)) µ_0(dx̄) ≤ (1/Z(y)) ∫_{B_δ(P^δ x)} exp(−Φ(x̄)) µ_0(dx̄) = J^δ(P^δ x).

Corollary 5.7. For all δ > 0,

J^δ(x^δ) = max_{x ∈ E^δ_γ} J^δ(x) = max_{x ∈ X} J^δ(x).

Proof. If x ∈ X is a maximizer of J^δ, then P^δ x ∈ E^δ_γ is a maximizer as well.

The following proposition indicates that the functional I plays the role of a generalized Onsager-Machlup functional for the posterior distribution µ^y associated with the uniform prior µ_0 defined by (4.3).

Proposition 5.8. Suppose that Assumption 5.1 holds. Let x_1, x_2 ∈ E_γ and {w^δ_1}_{δ>0}, {w^δ_2}_{δ>0} ⊂ E_γ with

(i) w^δ_1, w^δ_2 ∈ E^δ_γ for all δ > 0,

(ii) w^δ_1 → x_1 and w^δ_2 → x_2 as δ → 0.

Then,

(5.5)  lim_{δ→0} J^δ(w^δ_1) / J^δ(w^δ_2) = exp(I(x_2) − I(x_1)).

Proof. Let δ > 0. We consider

J^δ(w^δ_1) / J^δ(w^δ_2) = [∫_{B_δ(w^δ_1)} exp(−Φ(x)) µ_0(dx)] / [∫_{B_δ(w^δ_2)} exp(−Φ(x)) µ_0(dx)]
  = exp(Φ(x_2) − Φ(x_1)) · [∫_{B_δ(w^δ_1)} exp(Φ(x_1) − Φ(x)) µ_0(dx)] / [∫_{B_δ(w^δ_2)} exp(Φ(x_2) − Φ(x)) µ_0(dx)],

which is well-defined by Lemma 5.5. By Assumption 5.1,

|Φ(x_1) − Φ(x)| ≤ L ‖x_1 − x‖_∞ ≤ L (‖x_1 − w^δ_1‖_∞ + ‖w^δ_1 − x‖_∞) ≤ L (‖x_1 − w^δ_1‖_∞ + δ)   for all x ∈ B_δ(w^δ_1)

and

|Φ(x_2) − Φ(x)| ≤ L ‖x_2 − x‖_∞ ≤ L (‖x_2 − w^δ_2‖_∞ + δ)   for all x ∈ B_δ(w^δ_2),

where L is the Lipschitz constant of Φ on the bounded set ∪_{x∈E_γ} B_δ(x). It follows that

exp(Φ(x_2) − Φ(x_1)) e^{−L(‖x_1 − w^δ_1‖_∞ + ‖x_2 − w^δ_2‖_∞ + 2δ)} µ_0(B_δ(w^δ_1)) / µ_0(B_δ(w^δ_2))
  ≤ J^δ(w^δ_1) / J^δ(w^δ_2)
  ≤ exp(Φ(x_2) − Φ(x_1)) e^{L(‖x_1 − w^δ_1‖_∞ + ‖x_2 − w^δ_2‖_∞ + 2δ)} µ_0(B_δ(w^δ_1)) / µ_0(B_δ(w^δ_2)).

Now Proposition 4.6 implies that µ_0(B_δ(w^δ_1)) = µ_0(B_δ(w^δ_2)) for all δ > 0, and hence taking the limit δ → 0 and using (ii) together with the fact that I(x) = Φ(x) for all x ∈ E_γ yields the claim.

Note that Lemma 4.2 immediately implies that (5.5) also holds true for the prior distribution µ_0 if I is replaced by ι_{E_γ}. However, it is an open question how Proposition 5.8 can be generalized to cover a wider class of distributions while sustaining the connection to generalized MAP estimates described in the following theorem, which is the main result of this section.

Theorem 5.9. Suppose Assumption 5.1 holds. Then the following hold:

(i) For every {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0, the sequence {x^{δ_n}}_{n∈N} contains a subsequence that converges strongly in X to some x̄ ∈ E_γ.

(ii) Any cluster point x̄ ∈ E_γ of {x^{δ_n}}_{n∈N} is a minimizer of I.

(iii) A point x̂ ∈ X is a minimizer of I if and only if it is a generalized MAP estimate for µ^y.

Proof. Ad (i): By definition, x^δ ∈ E^δ_γ ⊂ E_γ for all δ > 0. Since E_γ is compact and closed by Proposition 4.1, there exists a convergent subsequence, again denoted by {x^{δ_n}}_{n∈N}, with limit x̄ ∈ E_γ.

Ad (ii): Let now x̄ ∈ E_γ be the limit of an arbitrary convergent subsequence, still denoted by {x^{δ_n}}_{n∈N}, and assume that x̄ is not a minimizer of I. By Lemma 5.4, a minimizer x* ∈ E_γ of I exists and by assumption satisfies I(x̄) > I(x*). Moreover, P^δ x* → x* as δ → 0 by Lemma 4.2. Now the definition of x^δ and Proposition 5.8 yield

1 ≤ lim_{n→∞} µ^y(B_{δ_n}(x^{δ_n})) / µ^y(B_{δ_n}(P^{δ_n} x*)) = exp(I(x*) − I(x̄)) < 1,

a contradiction. So x̄ is in fact a minimizer of I.

Ad (iii): Now let x* ∈ X be a minimizer of I, which by definition satisfies x* ∈ E_γ. To see that x* is a generalized MAP estimate, we choose w^δ := P^δ x* ∈ E^δ_γ for every δ > 0, which implies that w^δ → x* as δ → 0. Furthermore, by definition of x^δ and M_δ,

µ^y(B_δ(x^δ)) = max_{x∈X} µ^y(B_δ(x)) = M_δ   for all δ > 0.

By Assumption 5.1 and the boundedness of ∪_{x∈E_γ} B_δ(x), we can use the Lipschitz continuity of Φ to obtain the estimate

Φ(x) ≤ Φ(x*) + L (‖x − w^δ‖_∞ + ‖w^δ − x*‖_∞) ≤ Φ(x*) + 2Lδ   for all x ∈ B_δ(w^δ).

On the other hand, a similar argument shows that

Φ(x) ≥ Φ(x^δ) − L ‖x − x^δ‖_∞ ≥ Φ(x^δ) − Lδ   for all x ∈ B_δ(x^δ).

Consequently,

µ^y(B_δ(w^δ)) = (1/Z(y)) ∫_{B_δ(w^δ)} exp(−Φ(x)) µ_0(dx) ≥ exp(−Φ(x*) − 2Lδ) (1/Z(y)) µ_0(B_δ(w^δ))

and

M_δ = µ^y(B_δ(x^δ)) = (1/Z(y)) ∫_{B_δ(x^δ)} exp(−Φ(x)) µ_0(dx) ≤ exp(−Φ(x^δ) + Lδ) (1/Z(y)) µ_0(B_δ(x^δ)).

As µ_0(B_δ(w^δ)) = µ_0(B_δ(x^δ)) by Proposition 4.6, combining these two estimates yields

µ^y(B_δ(w^δ)) / µ^y(B_δ(x^δ)) ≥ exp(Φ(x^δ) − Φ(x*) − 3Lδ).

Now the minimizing property of x* implies that

lim_{δ→0} µ^y(B_δ(w^δ)) / µ^y(B_δ(x^δ)) ≥ lim_{δ→0} exp(−3Lδ) = 1

and hence that x* is a generalized MAP estimate.

Conversely, let x̂ be a generalized MAP estimate and {δ_n}_{n∈N} ⊂ (0, ∞) such that δ_n → 0 with corresponding qualifying sequence {w_n}_{n∈N} ⊂ X, i.e., w_n → x̂ and

lim_{n→∞} µ^y(B_{δ_n}(w_n)) / M_{δ_n} = 1.

From Lemma 4.2, it follows that P^{δ_n} w_n → x̂ as well. We then have x̂ ∈ E_γ, because otherwise the closedness of E_γ by Proposition 4.1 would imply that µ^y(B_{δ_n}(w_n)) = 0 for n large enough. Also, by definition of x^δ,

µ^y(B_δ(x^δ)) = max_{x∈X} µ^y(B_δ(x)) = M_δ   for all δ > 0.

Now by (i) and (ii), we may extract a subsequence, again denoted by {x^{δ_n}}_{n∈N}, such that x^{δ_n} → x̄ ∈ E_γ and x̄ is a minimizer of I. Proposition 5.8 and Lemma 5.6 then yield that

exp(I(x̄) − I(x̂)) = lim_{n→∞} µ^y(B_{δ_n}(P^{δ_n} w_n)) / µ^y(B_{δ_n}(x^{δ_n})) ≥ lim_{n→∞} µ^y(B_{δ_n}(w_n)) / µ^y(B_{δ_n}(x^{δ_n})) = 1.

Hence, I(x̄) ≥ I(x̂), i.e., x̂ is a minimizer of I.

Remark 5.10. Theorem 5.9 shows that for inverse problems subject to Gaussian noise as in the following section, the generalized MAP estimate coincides with Ivanov regularization with the specific choice of the compact set E_γ; compare (6.1) below with, e.g., [14, 23, 28]. Hence, Ivanov regularization can be considered as a non-parametric MAP estimate for a suitable choice of the compact set the solution is restricted to. Furthermore, we point out that for linear inverse problems where the Ivanov functional is convex, the minimizers generically lie on the boundary of the compact set; see, e.g., [28, Prop. 2.2 (iii)] and [6, Cor. 2.6]. However, this is not possible for strong modes due to Proposition 4.9 (i), which further indicates the need to consider generalized modes in this setting.

6 consistency in nonlinear inverse problems with gaussian noise

Finally, we show consistency in the small noise limit of the generalized MAP estimate from Section 5 for the special case of finite-dimensional data and additive Gaussian noise. Let Y = R^K for a fixed K ∈ N, endowed with the Euclidean norm, and F : X → Y be a closed nonlinear operator. Assuming x ∼ µ_0 for the prior measure µ_0 defined via (4.3) and that y ∈ Y is a corresponding measurement of F(x) corrupted by additive Gaussian noise with mean 0 and positive definite covariance operator Σ ∈ R^{K×K} scaled by δ > 0, the corresponding posterior measure µ^y is given by

dµ^y/dµ_0 (x) = (1/Z(y)) exp(−Φ(x; y))   for µ_0-almost all x ∈ X,

where

Φ(x; y) := (1/(2δ²)) ‖Σ^{−1/2}(F(x) − y)‖²_Y

for all x ∈ X and y ∈ Y. Furthermore, we assume that for every R > 0, the restriction of F to B_R(0) is Lipschitz continuous, so that Assumption 5.1 is satisfied. Then Theorem 5.9 yields that the generalized MAP estimates for µ^y are given by the minimizers of the functional

(6.1)  I(x) = (1/(2δ²)) ‖Σ^{−1/2}(F(x) − y)‖²_Y + ι_{E_γ}(x).

Now let {δ_n}_{n∈N} ⊂ (0, ∞) with δ_n → 0 and consider a frequentist setup where we have a true solution x† ∈ E_γ and a sequence {y_n}_{n∈N} ⊂ Y of measurements given by

y_n = F(x†) + δ_n η_n,

where η_n ∼ N(0, Σ) are independently and identically distributed. Moreover, define

(6.2)  I_n(x) := (1/(2δ_n²)) ‖Σ^{−1/2}(F(x) − y_n)‖²_Y + ι_{E_γ}(x)

for all n ∈ N and x ∈ X. Let x_n ∈ E_γ denote a minimizer of I_n for all n ∈ N. Then we have the following consistency result.

Theorem 6.1. Suppose that F : X → Y is closed and x† ∈ E_γ. Then any sequence {x_n}_{n∈N} of minimizers of (6.2) contains a convergent subsequence whose limit x̄ ∈ E_γ satisfies F(x̄) = F(x†) almost surely.

Proof. Let e_k(x) := x_k for all x ∈ X and k ∈ N. As x_n ∈ E_γ for all n ∈ N, the sequence {x_n}_{n∈N} is almost surely bounded with |⟨e_k, x_n⟩_X| = |[x_n]_k| ≤ γ_k almost surely for all k ∈ N. We can thus extract for every k ∈ N a subsequence with E[⟨e_k, x_n⟩_X] → x̄_k and |x̄_k| ≤ γ_k. Here and in the following, we pass to the subsequence without adapting the notation. By a diagonal sequence argument, we can thus construct a sequence, again denoted by {x_n}_{n∈N}, satisfying E[⟨e_k, x_n⟩_X] → x̄_k for all k ∈ N and x̄ := Σ_{k∈N} x̄_k e_k ∈ E_γ.

We now write for every v ∈ X* ≅ l^1 and N ∈ N

E[|⟨v, x_n − x̄⟩_X|] = E[|Σ_{k=1}^∞ ⟨v, e_k⟩_X ⟨e_k, x_n − x̄⟩_X|]
  ≤ Σ_{k=1}^N |⟨v, e_k⟩_X| E[|⟨e_k, x_n − x̄⟩_X|] + Σ_{k=N+1}^∞ |⟨v, e_k⟩_X| · 2γ_k
  ≤ ‖v‖_{X*} sup_{k∈{1,...,N}} E[|⟨e_k, x_n − x̄⟩_X|] + 2 ‖v‖_{X*} sup_{k≥N+1} γ_k.

Since γ_k → 0, we can for every ε > 0 choose N large enough such that

2 ‖v‖_{X*} sup_{k≥N+1} γ_k ≤ ε/2,

and then choose n_0 large enough such that also

‖v‖_{X*} E[|⟨e_k, x_n − x̄⟩_X|] ≤ ε/2   for all n ≥ n_0 and k ∈ {1, ..., N},

since E[⟨e_k, x_n⟩_X] → ⟨e_k, x̄⟩_X. Consequently,

E[|⟨v, x_n − x̄⟩_X|] ≤ ε   for all n ≥ n_0,

i.e., E[|⟨v, x_n − x̄⟩_X|] → 0. From this we conclude that ⟨v, x_n − x̄⟩_X → 0 in probability as n → ∞ for all v ∈ X*. This implies the existence of a subsequence that converges weakly to x̄ almost surely. Therefore, by compactness of E_γ, a further subsequence converges strongly to x̄ almost surely.

Furthermore, inserting the definition of y_n yields that

I_n(x) = (1/(2δ_n²)) ‖Σ^{−1/2}(F(x†) − F(x) + δ_n η_n)‖²_Y
  = (1/(2δ_n²)) ‖Σ^{−1/2}(F(x†) − F(x))‖²_Y + (1/δ_n) ⟨Σ^{−1/2}(F(x†) − F(x)), Σ^{−1/2} η_n⟩_Y + (1/2) ‖Σ^{−1/2} η_n‖²_Y

for all x ∈ E_γ. Note that the first two terms vanish for x = x† ∈ E_γ and that I_n(x_n) ≤ I_n(x†) by definition of x_n. Rearranging this inequality and using Cauchy's and Young's inequalities, we thus obtain that

‖Σ^{−1/2}(F(x†) − F(x_n))‖²_Y ≤ −2δ_n ⟨Σ^{−1/2}(F(x†) − F(x_n)), Σ^{−1/2} η_n⟩_Y
  ≤ 2δ_n ‖Σ^{−1/2}(F(x†) − F(x_n))‖_Y ‖Σ^{−1/2} η_n‖_Y
  ≤ (1/2) ‖Σ^{−1/2}(F(x†) − F(x_n))‖²_Y + 2δ_n² ‖Σ^{−1/2} η_n‖²_Y.

Consequently,

‖Σ^{−1/2}(F(x_n) − F(x†))‖²_Y ≤ 4δ_n² ‖Σ^{−1/2} η_n‖²_Y,

and hence

E[‖Σ^{−1/2}(F(x_n) − F(x†))‖²_Y] ≤ 4δ_n² E[‖Σ^{−1/2} η_n‖²_Y] = 4Kδ_n² → 0,

which also implies that ‖Σ^{−1/2}(F(x_n) − F(x†))‖_Y → 0 in probability. After passing to a further subsequence, we thus obtain that F(x_n) → F(x†) almost surely. The closedness of F then yields that F(x̄) = F(x†) almost surely.

Note that although Theorem 6.1 does not require the Lipschitz continuity of F on bounded sets, in general without it we do not know if the minimizers of I_n exist or if they are generalized MAP estimates for µ^y. By a standard subsequence-subsequence argument, we can obtain convergence of the full sequence under the usual assumption on F.

Corollary 6.2. If F is injective, then x_n → x† in probability as n → ∞.

Proof. We can apply the proof of Theorem 6.1 to every subsequence of {x_n}_{n∈N} to obtain a further subsequence that converges to x̄ = x† almost surely. This in turn is equivalent to the convergence of the whole sequence to x† in probability by [18, Cor. 6.13].

Remark 6.3. Consistency in the large sample size limit can be shown analogously. Specifically, assuming a fixed noise level δ > 0 (which without loss of generality we can fix at δ = 1) and n independent measurements y_1, ..., y_n ∈ Y, the posterior measure µ^n on X has the density

dµ^n/dµ_0 (x) := (∏_{j=1}^n 1/Z(y_j)) exp(−(1/2) Σ_{j=1}^n ‖Σ^{−1/2}(F(x) − y_j)‖²_Y)   for µ_0-almost all x ∈ X.

Again we consider a frequentist setup with a true solution x† ∈ E_γ and

y_j = F(x†) + η_j   for all j ∈ N.

Then it can be shown as in Theorem 6.1 that any sequence of minimizers x_n ∈ E_γ of

I_n(x) := (1/2) Σ_{j=1}^n ‖Σ^{−1/2}(F(x) − y_j)‖²_Y + ι_{E_γ}(x)

contains a convergent subsequence whose limit x̄ ∈ E_γ satisfies F(x̄) = F(x†) almost surely, and that the full sequence converges to x† in probability if F is injective.

7 conclusion

We have proposed a novel definition of generalized modes and corresponding MAP estimates for non-parametric Bayesian statistics. Our approach extends the construction of [8] by replacing the fixed base point with a qualifying sequence in the convergence of small ball probabilities. This allows covering cases where the previous approach fails, such as that of priors having discontinuous densities. Our definition coincides with the definition from [8] for Gaussian priors as well as for priors which admit a qualifying sequence satisfying additional convergence properties. For uniform priors defined via a random series with bounded coefficients, such generalized modes can be shown to exist. Furthermore, the corresponding generalized MAP estimates in Bayesian inverse problems (i.e., generalized modes of the posterior distribution) can be characterized as minimizers of a functional that can be seen as a generalization of the Onsager-Machlup functional in the Gaussian case. For inverse problems with finite-dimensional Gaussian noise, the generalized MAP estimates are consistent in the small noise as well as in the large sample limit.

This work can be extended in several directions. The first question is whether qualifying sequences can be constructed from MAP estimates for a sequence of finite-dimensional problems, i.e., whether our generalized MAP estimates arise as Γ-limits of finite-dimensional MAP estimates. Also, it would be of interest to study Gaussian priors with non-negativity constraints, where


Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

3 (Due ). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due ). Show that the open disk x 2 + y 2 < 1 is a countable union of planar elementary sets. Show that the closed disk x 2 + y 2 1 is a countable

More information

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy

Notions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.

More information

Continuous Functions on Metric Spaces

Continuous Functions on Metric Spaces Continuous Functions on Metric Spaces Math 201A, Fall 2016 1 Continuous functions Definition 1. Let (X, d X ) and (Y, d Y ) be metric spaces. A function f : X Y is continuous at a X if for every ɛ > 0

More information

CONTENTS. 4 Hausdorff Measure Introduction The Cantor Set Rectifiable Curves Cantor Set-Like Objects...

CONTENTS. 4 Hausdorff Measure Introduction The Cantor Set Rectifiable Curves Cantor Set-Like Objects... Contents 1 Functional Analysis 1 1.1 Hilbert Spaces................................... 1 1.1.1 Spectral Theorem............................. 4 1.2 Normed Vector Spaces.............................. 7 1.2.1

More information

1 Stochastic Dynamic Programming

1 Stochastic Dynamic Programming 1 Stochastic Dynamic Programming Formally, a stochastic dynamic program has the same components as a deterministic one; the only modification is to the state transition equation. When events in the future

More information

Topological properties

Topological properties CHAPTER 4 Topological properties 1. Connectedness Definitions and examples Basic properties Connected components Connected versus path connected, again 2. Compactness Definition and first examples Topological

More information

The Arzelà-Ascoli Theorem

The Arzelà-Ascoli Theorem John Nachbar Washington University March 27, 2016 The Arzelà-Ascoli Theorem The Arzelà-Ascoli Theorem gives sufficient conditions for compactness in certain function spaces. Among other things, it helps

More information

Introduction to Functional Analysis

Introduction to Functional Analysis Introduction to Functional Analysis Carnegie Mellon University, 21-640, Spring 2014 Acknowledgements These notes are based on the lecture course given by Irene Fonseca but may differ from the exact lecture

More information

McGill University Math 354: Honors Analysis 3

McGill University Math 354: Honors Analysis 3 Practice problems McGill University Math 354: Honors Analysis 3 not for credit Problem 1. Determine whether the family of F = {f n } functions f n (x) = x n is uniformly equicontinuous. 1st Solution: The

More information

MAA6617 COURSE NOTES SPRING 2014

MAA6617 COURSE NOTES SPRING 2014 MAA6617 COURSE NOTES SPRING 2014 19. Normed vector spaces Let X be a vector space over a field K (in this course we always have either K = R or K = C). Definition 19.1. A norm on X is a function : X K

More information

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32

Functional Analysis. Martin Brokate. 1 Normed Spaces 2. 2 Hilbert Spaces The Principle of Uniform Boundedness 32 Functional Analysis Martin Brokate Contents 1 Normed Spaces 2 2 Hilbert Spaces 2 3 The Principle of Uniform Boundedness 32 4 Extension, Reflexivity, Separation 37 5 Compact subsets of C and L p 46 6 Weak

More information

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping.

Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. Minimization Contents: 1. Minimization. 2. The theorem of Lions-Stampacchia for variational inequalities. 3. Γ -Convergence. 4. Duality mapping. 1 Minimization A Topological Result. Let S be a topological

More information

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε

(convex combination!). Use convexity of f and multiply by the common denominator to get. Interchanging the role of x and y, we obtain that f is ( 2M ε 1. Continuity of convex functions in normed spaces In this chapter, we consider continuity properties of real-valued convex functions defined on open convex sets in normed spaces. Recall that every infinitedimensional

More information

Robust error estimates for regularization and discretization of bang-bang control problems

Robust error estimates for regularization and discretization of bang-bang control problems Robust error estimates for regularization and discretization of bang-bang control problems Daniel Wachsmuth September 2, 205 Abstract We investigate the simultaneous regularization and discretization of

More information

arxiv: v1 [math.st] 26 Mar 2012

arxiv: v1 [math.st] 26 Mar 2012 POSTERIOR CONSISTENCY OF THE BAYESIAN APPROACH TO LINEAR ILL-POSED INVERSE PROBLEMS arxiv:103.5753v1 [math.st] 6 Mar 01 By Sergios Agapiou Stig Larsson and Andrew M. Stuart University of Warwick and Chalmers

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

Convex Optimization Notes

Convex Optimization Notes Convex Optimization Notes Jonathan Siegel January 2017 1 Convex Analysis This section is devoted to the study of convex functions f : B R {+ } and convex sets U B, for B a Banach space. The case of B =

More information

Problem Set 2: Solutions Math 201A: Fall 2016

Problem Set 2: Solutions Math 201A: Fall 2016 Problem Set 2: s Math 201A: Fall 2016 Problem 1. (a) Prove that a closed subset of a complete metric space is complete. (b) Prove that a closed subset of a compact metric space is compact. (c) Prove that

More information

Your first day at work MATH 806 (Fall 2015)

Your first day at work MATH 806 (Fall 2015) Your first day at work MATH 806 (Fall 2015) 1. Let X be a set (with no particular algebraic structure). A function d : X X R is called a metric on X (and then X is called a metric space) when d satisfies

More information

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure?

2 (Bonus). Let A X consist of points (x, y) such that either x or y is a rational number. Is A measurable? What is its Lebesgue measure? MA 645-4A (Real Analysis), Dr. Chernov Homework assignment 1 (Due 9/5). Prove that every countable set A is measurable and µ(a) = 0. 2 (Bonus). Let A consist of points (x, y) such that either x or y is

More information

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets

FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES. 1. Compact Sets FUNCTIONAL ANALYSIS LECTURE NOTES: COMPACT SETS AND FINITE-DIMENSIONAL SPACES CHRISTOPHER HEIL 1. Compact Sets Definition 1.1 (Compact and Totally Bounded Sets). Let X be a metric space, and let E X be

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Real Analysis Problems

Real Analysis Problems Real Analysis Problems Cristian E. Gutiérrez September 14, 29 1 1 CONTINUITY 1 Continuity Problem 1.1 Let r n be the sequence of rational numbers and Prove that f(x) = 1. f is continuous on the irrationals.

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction Functional analysis can be seen as a natural extension of the real analysis to more general spaces. As an example we can think at the Heine - Borel theorem (closed and bounded is

More information

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3

Contents Ordered Fields... 2 Ordered sets and fields... 2 Construction of the Reals 1: Dedekind Cuts... 2 Metric Spaces... 3 Analysis Math Notes Study Guide Real Analysis Contents Ordered Fields 2 Ordered sets and fields 2 Construction of the Reals 1: Dedekind Cuts 2 Metric Spaces 3 Metric Spaces 3 Definitions 4 Separability

More information

Measurable functions are approximately nice, even if look terrible.

Measurable functions are approximately nice, even if look terrible. Tel Aviv University, 2015 Functions of real variables 74 7 Approximation 7a A terrible integrable function........... 74 7b Approximation of sets................ 76 7c Approximation of functions............

More information

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond

Measure Theory on Topological Spaces. Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond Measure Theory on Topological Spaces Course: Prof. Tony Dorlas 2010 Typset: Cathal Ormond May 22, 2011 Contents 1 Introduction 2 1.1 The Riemann Integral........................................ 2 1.2 Measurable..............................................

More information

16 1 Basic Facts from Functional Analysis and Banach Lattices

16 1 Basic Facts from Functional Analysis and Banach Lattices 16 1 Basic Facts from Functional Analysis and Banach Lattices 1.2.3 Banach Steinhaus Theorem Another fundamental theorem of functional analysis is the Banach Steinhaus theorem, or the Uniform Boundedness

More information

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space.

2) Let X be a compact space. Prove that the space C(X) of continuous real-valued functions is a complete metric space. University of Bergen General Functional Analysis Problems with solutions 6 ) Prove that is unique in any normed space. Solution of ) Let us suppose that there are 2 zeros and 2. Then = + 2 = 2 + = 2. 2)

More information

Functional Analysis F3/F4/NVP (2005) Homework assignment 3

Functional Analysis F3/F4/NVP (2005) Homework assignment 3 Functional Analysis F3/F4/NVP (005 Homework assignment 3 All students should solve the following problems: 1. Section 4.8: Problem 8.. Section 4.9: Problem 4. 3. Let T : l l be the operator defined by

More information

Course 212: Academic Year Section 1: Metric Spaces

Course 212: Academic Year Section 1: Metric Spaces Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........

More information

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers.

2. Dual space is essential for the concept of gradient which, in turn, leads to the variational analysis of Lagrange multipliers. Chapter 3 Duality in Banach Space Modern optimization theory largely centers around the interplay of a normed vector space and its corresponding dual. The notion of duality is important for the following

More information

3 Integration and Expectation

3 Integration and Expectation 3 Integration and Expectation 3.1 Construction of the Lebesgue Integral Let (, F, µ) be a measure space (not necessarily a probability space). Our objective will be to define the Lebesgue integral R fdµ

More information

Measure and Integration: Solutions of CW2

Measure and Integration: Solutions of CW2 Measure and Integration: s of CW2 Fall 206 [G. Holzegel] December 9, 206 Problem of Sheet 5 a) Left (f n ) and (g n ) be sequences of integrable functions with f n (x) f (x) and g n (x) g (x) for almost

More information

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS

ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS Bendikov, A. and Saloff-Coste, L. Osaka J. Math. 4 (5), 677 7 ON THE REGULARITY OF SAMPLE PATHS OF SUB-ELLIPTIC DIFFUSIONS ON MANIFOLDS ALEXANDER BENDIKOV and LAURENT SALOFF-COSTE (Received March 4, 4)

More information

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan

4 Hilbert spaces. The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan The proof of the Hilbert basis theorem is not mathematics, it is theology. Camille Jordan Wir müssen wissen, wir werden wissen. David Hilbert We now continue to study a special class of Banach spaces,

More information

Topology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA address:

Topology. Xiaolong Han. Department of Mathematics, California State University, Northridge, CA 91330, USA  address: Topology Xiaolong Han Department of Mathematics, California State University, Northridge, CA 91330, USA E-mail address: Xiaolong.Han@csun.edu Remark. You are entitled to a reward of 1 point toward a homework

More information

Analysis Comprehensive Exam Questions Fall 2008

Analysis Comprehensive Exam Questions Fall 2008 Analysis Comprehensive xam Questions Fall 28. (a) Let R be measurable with finite Lebesgue measure. Suppose that {f n } n N is a bounded sequence in L 2 () and there exists a function f such that f n (x)

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989),

1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer 11(2) (1989), Real Analysis 2, Math 651, Spring 2005 April 26, 2005 1 Real Analysis 2, Math 651, Spring 2005 Krzysztof Chris Ciesielski 1/12/05: sec 3.1 and my article: How good is the Lebesgue measure?, Math. Intelligencer

More information

Immerse Metric Space Homework

Immerse Metric Space Homework Immerse Metric Space Homework (Exercises -2). In R n, define d(x, y) = x y +... + x n y n. Show that d is a metric that induces the usual topology. Sketch the basis elements when n = 2. Solution: Steps

More information

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011

Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Economics 204 Summer/Fall 2011 Lecture 5 Friday July 29, 2011 Section 2.6 (cont.) Properties of Real Functions Here we first study properties of functions from R to R, making use of the additional structure

More information

Problem 1: Compactness (12 points, 2 points each)

Problem 1: Compactness (12 points, 2 points each) Final exam Selected Solutions APPM 5440 Fall 2014 Applied Analysis Date: Tuesday, Dec. 15 2014, 10:30 AM to 1 PM You may assume all vector spaces are over the real field unless otherwise specified. Your

More information

ANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2.

ANALYSIS QUALIFYING EXAM FALL 2017: SOLUTIONS. 1 cos(nx) lim. n 2 x 2. g n (x) = 1 cos(nx) n 2 x 2. x 2. ANALYSIS QUALIFYING EXAM FALL 27: SOLUTIONS Problem. Determine, with justification, the it cos(nx) n 2 x 2 dx. Solution. For an integer n >, define g n : (, ) R by Also define g : (, ) R by g(x) = g n

More information

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem

Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem 56 Chapter 7 Locally convex spaces, the hyperplane separation theorem, and the Krein-Milman theorem Recall that C(X) is not a normed linear space when X is not compact. On the other hand we could use semi

More information

Chapter 3: Baire category and open mapping theorems

Chapter 3: Baire category and open mapping theorems MA3421 2016 17 Chapter 3: Baire category and open mapping theorems A number of the major results rely on completeness via the Baire category theorem. 3.1 The Baire category theorem 3.1.1 Definition. A

More information

7 Complete metric spaces and function spaces

7 Complete metric spaces and function spaces 7 Complete metric spaces and function spaces 7.1 Completeness Let (X, d) be a metric space. Definition 7.1. A sequence (x n ) n N in X is a Cauchy sequence if for any ɛ > 0, there is N N such that n, m

More information

( f ^ M _ M 0 )dµ (5.1)

( f ^ M _ M 0 )dµ (5.1) 47 5. LEBESGUE INTEGRAL: GENERAL CASE Although the Lebesgue integral defined in the previous chapter is in many ways much better behaved than the Riemann integral, it shares its restriction to bounded

More information

If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s.

If Y and Y 0 satisfy (1-2), then Y = Y 0 a.s. 20 6. CONDITIONAL EXPECTATION Having discussed at length the limit theory for sums of independent random variables we will now move on to deal with dependent random variables. An important tool in this

More information

Week 5 Lectures 13-15

Week 5 Lectures 13-15 Week 5 Lectures 13-15 Lecture 13 Definition 29 Let Y be a subset X. A subset A Y is open in Y if there exists an open set U in X such that A = U Y. It is not difficult to show that the collection of all

More information

Examples of Dual Spaces from Measure Theory

Examples of Dual Spaces from Measure Theory Chapter 9 Examples of Dual Spaces from Measure Theory We have seen that L (, A, µ) is a Banach space for any measure space (, A, µ). We will extend that concept in the following section to identify an

More information

L p Spaces and Convexity

L p Spaces and Convexity L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function

More information

Normed and Banach Spaces

Normed and Banach Spaces (August 30, 2005) Normed and Banach Spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ We have seen that many interesting spaces of functions have natural structures of Banach spaces:

More information

ANALYSIS QUALIFYING EXAM FALL 2016: SOLUTIONS. = lim. F n

ANALYSIS QUALIFYING EXAM FALL 2016: SOLUTIONS. = lim. F n ANALYSIS QUALIFYING EXAM FALL 206: SOLUTIONS Problem. Let m be Lebesgue measure on R. For a subset E R and r (0, ), define E r = { x R: dist(x, E) < r}. Let E R be compact. Prove that m(e) = lim m(e /n).

More information

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1.

From now on, we will represent a metric space with (X, d). Here are some examples: i=1 (x i y i ) p ) 1 p, p 1. Chapter 1 Metric spaces 1.1 Metric and convergence We will begin with some basic concepts. Definition 1.1. (Metric space) Metric space is a set X, with a metric satisfying: 1. d(x, y) 0, d(x, y) = 0 x

More information

FUNDAMENTALS OF REAL ANALYSIS by. II.1. Prelude. Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as

FUNDAMENTALS OF REAL ANALYSIS by. II.1. Prelude. Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as FUNDAMENTALS OF REAL ANALYSIS by Doğan Çömez II. MEASURES AND MEASURE SPACES II.1. Prelude Recall that the Riemann integral of a real-valued function f on an interval [a, b] is defined as b n f(xdx :=

More information

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define

II - REAL ANALYSIS. This property gives us a way to extend the notion of content to finite unions of rectangles: we define 1 Measures 1.1 Jordan content in R N II - REAL ANALYSIS Let I be an interval in R. Then its 1-content is defined as c 1 (I) := b a if I is bounded with endpoints a, b. If I is unbounded, we define c 1

More information

The small ball property in Banach spaces (quantitative results)

The small ball property in Banach spaces (quantitative results) The small ball property in Banach spaces (quantitative results) Ehrhard Behrends Abstract A metric space (M, d) is said to have the small ball property (sbp) if for every ε 0 > 0 there exists a sequence

More information

A LITTLE REAL ANALYSIS AND TOPOLOGY

A LITTLE REAL ANALYSIS AND TOPOLOGY A LITTLE REAL ANALYSIS AND TOPOLOGY 1. NOTATION Before we begin some notational definitions are useful. (1) Z = {, 3, 2, 1, 0, 1, 2, 3, }is the set of integers. (2) Q = { a b : aεz, bεz {0}} is the set

More information

Measures. Chapter Some prerequisites. 1.2 Introduction

Measures. Chapter Some prerequisites. 1.2 Introduction Lecture notes Course Analysis for PhD students Uppsala University, Spring 2018 Rostyslav Kozhan Chapter 1 Measures 1.1 Some prerequisites I will follow closely the textbook Real analysis: Modern Techniques

More information

Exercise 1. Let f be a nonnegative measurable function. Show that. where ϕ is taken over all simple functions with ϕ f. k 1.

Exercise 1. Let f be a nonnegative measurable function. Show that. where ϕ is taken over all simple functions with ϕ f. k 1. Real Variables, Fall 2014 Problem set 3 Solution suggestions xercise 1. Let f be a nonnegative measurable function. Show that f = sup ϕ, where ϕ is taken over all simple functions with ϕ f. For each n

More information

Tools from Lebesgue integration

Tools from Lebesgue integration Tools from Lebesgue integration E.P. van den Ban Fall 2005 Introduction In these notes we describe some of the basic tools from the theory of Lebesgue integration. Definitions and results will be given

More information

B. Appendix B. Topological vector spaces

B. Appendix B. Topological vector spaces B.1 B. Appendix B. Topological vector spaces B.1. Fréchet spaces. In this appendix we go through the definition of Fréchet spaces and their inductive limits, such as they are used for definitions of function

More information

Mathematics for Economists

Mathematics for Economists Mathematics for Economists Victor Filipe Sao Paulo School of Economics FGV Metric Spaces: Basic Definitions Victor Filipe (EESP/FGV) Mathematics for Economists Jan.-Feb. 2017 1 / 34 Definitions and Examples

More information

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION

PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION PARTIAL REGULARITY OF BRENIER SOLUTIONS OF THE MONGE-AMPÈRE EQUATION ALESSIO FIGALLI AND YOUNG-HEON KIM Abstract. Given Ω, Λ R n two bounded open sets, and f and g two probability densities concentrated

More information

The Heine-Borel and Arzela-Ascoli Theorems

The Heine-Borel and Arzela-Ascoli Theorems The Heine-Borel and Arzela-Ascoli Theorems David Jekel October 29, 2016 This paper explains two important results about compactness, the Heine- Borel theorem and the Arzela-Ascoli theorem. We prove them

More information

01. Review of metric spaces and point-set topology. 1. Euclidean spaces

01. Review of metric spaces and point-set topology. 1. Euclidean spaces (October 3, 017) 01. Review of metric spaces and point-set topology Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 017-18/01

More information

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Continuity. Matt Rosenzweig

Continuity. Matt Rosenzweig Continuity Matt Rosenzweig Contents 1 Continuity 1 1.1 Rudin Chapter 4 Exercises........................................ 1 1.1.1 Exercise 1............................................. 1 1.1.2 Exercise

More information

Extreme points of compact convex sets

Extreme points of compact convex sets Extreme points of compact convex sets In this chapter, we are going to show that compact convex sets are determined by a proper subset, the set of its extreme points. Let us start with the main definition.

More information

Spectral theory for compact operators on Banach spaces

Spectral theory for compact operators on Banach spaces 68 Chapter 9 Spectral theory for compact operators on Banach spaces Recall that a subset S of a metric space X is precompact if its closure is compact, or equivalently every sequence contains a Cauchy

More information

2 Sequences, Continuity, and Limits

2 Sequences, Continuity, and Limits 2 Sequences, Continuity, and Limits In this chapter, we introduce the fundamental notions of continuity and limit of a real-valued function of two variables. As in ACICARA, the definitions as well as proofs

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information