Functional Bregman Divergence and Bayesian Estimation of Distributions
|
|
- Chad Baker
- 5 years ago
- Views:
Transcription
1 FUNCTIONAL BRGAN DIVRGNC Functional Breman Diverence and Bayesian stimation of Distributions B. A. Friyik, S. Srivastava, and. R. Gupta Abstract A class of distortions termed functional Breman diverences is defined, which includes squared error and relative entropy. A functional Breman diverence acts on functions or distributions, and eneralizes the standard Breman diverence for vectors and a previous pointwise Breman diverence that was defined for functions. A recently published result showed that the mean minimizes the expected Breman diverence. The new functional definition enables the extension of this result to the continuous case to show that the mean minimizes the expected functional Breman diverence over a set of functions or distributions. It is shown how this theorem applies to the Bayesian estimation of distributions. stimation of the uniform distribution from independent and identically drawn samples is used as a case study. Index Terms Breman diverence, Bayesian estimation, uniform distribution, learnin BRGAN diverences are a useful set of distortion functions that include squared error, relative entropy, loistic loss, ahalanobis distance, and the Itakura-Saito function. Breman diverences are popular in statistical estimation and information theory. Analysis usin the concept of Breman diverences has played a key role in recent advances in statistical learnin [] [0], clusterin [], [2], inverse problems [3], maximum entropy estimation [4], and the applicability of the data processin theorem [5]. Recently, it was discovered that the mean is the minimizer of the expected Breman diverence for a set of d-dimensional points [], [6]. In this paper we define a functional Breman diverence that applies to functions and distributions, and we show that this new definition is equivalent to Breman diverence applied to vectors. The functional definition eneralizes a pointwise Breman diverence that has been previously defined for measurable functions [7], [7], and thus extends the class of distortion functions that are Breman diverences; see Section I-A.2 for an example. ost importantly, the functional definition enables one to solve functional minimization problems usin standard methods from the calculus of variations; we extend the recent result on the expectation of vector Breman diverence [], [6] to show that the mean minimizes the expected Breman diverence for a set of functions or distributions. We show how this theorem links to Bayesian estimation of distributions. For distributions from the exponential family distributions, many popular diverences, such B. A. Friyik is with the Department of athematics, Purdue University, West Lafayette, IN ( bfriyik@math.purdue.edu). S. Srivastava is with the Department of Applied athematics, University of Washinton, Seattle, WA 9895 ( santosh@amath.washinton.edu).. R. Gupta is with the Department of lectrical nineerin, University of Washinton, Seattle, WA 9895 ( upta@ee.washinton.edu). This author s work was supported in part by the United States Office of Naval Research. as relative entropy, can be expressed as a (different) Breman diverence on the exponential distribution parameters. The functional Breman definition enables stroner results and a more eneral application. In Section we state a functional definition of the Breman diverence and ive examples for total squared difference, relative entropy, and squared bias. In later subsections, the relationship between the functional definition and previous Breman definitions is established, and properties are noted. Then in Section 2 we present the main theorem: that the expectation of a set of functions minimizes the expected Breman diverence. We discuss the application of this theorem to Bayesian estimation, and as a case study compare different estimates for the uniform distribution iven independent and identically drawn samples. Proofs are in the appendix. Readers who are not familiar with functional derivatives may find helpful our short introduction to functional derivatives [8] or the text by Gelfand and Fomin [9]. I. FUNCTIONAL BRGAN DIVRGNC Let ( R d, Ω, ν ) be a measure space, where ν is a Borel measure and d is a positive inteer. Let φ be a real functional over the normed space L p (ν) for p. Recall that the bounded linear functional δφ[f; ] is the Fréchet derivative of φ at f L p (ν) if φ[f + a] φ[f] φ[f; a] δφ[f; a] + ɛ[f, a] a L p (ν) () for all a L p (ν), with ɛ[f, a] 0 as a L p (ν) 0 [9]. Then iven an appropriate functional φ, a functional Breman diverence can be defined: Definition I. (Functional Definition of Breman Diverence). Let φ : L p (ν) R be a strictly convex, twice-continuously Fréchet-differentiable functional. The Breman diverence d φ : L p (ν) L p (ν) [0, ) is defined for all admissible f, L p (ν) as d φ [f, ] φ[f] φ[] δφ[; f ], (2) where δφ[; ] is the Fréchet derivative of φ at. Here, we have used the Fréchet derivative, but the definition (and results in this paper) can be easily extended usin other definitions of functional derivatives; a sample extension is iven in Section I-A.3.
2 Report Documentation Pae Form Approved OB No Public reportin burden for the collection of information is estimated to averae hour per response, includin the time for reviewin instructions, searchin existin data sources, atherin and maintainin the data needed, and completin and reviewin the collection of information. Send comments reardin this burden estimate or any other aspect of this collection of information, includin suestions for reducin this burden, to Washinton Headquarters Services, Directorate for Information Operations and Reports, 25 Jefferson Davis Hihway, Suite 204, Arlinton VA Respondents should be aware that notwithstandin any other provision of law, no person shall be subject to a penalty for failin to comply with a collection of information if it does not display a currently valid OB control number.. RPORT DAT RPORT TYP 3. DATS COVRD to TITL AND SUBTITL Functional Breman Diverence and Bayesian stimation of Distributions 5a. CONTRACT NUBR 5b. GRANT NUBR 5c. PROGRA LNT NUBR 6. AUTHOR(S) 5d. PROJCT NUBR 5e. TASK NUBR 5f. WORK UNIT NUBR 7. PRFORING ORGANIZATION NA(S) AND ADDRSS(S) University of Washinton,Department of lectrical nineerin,seattle,wa, PRFORING ORGANIZATION RPORT NUBR 9. SPONSORING/ONITORING AGNCY NA(S) AND ADDRSS(S) 0. SPONSOR/ONITOR S ACRONY(S) 2. DISTRIBUTION/AVAILABILITY STATNT Approved for public release; distribution unlimited 3. SUPPLNTARY NOTS. SPONSOR/ONITOR S RPORT NUBR(S) 4. ABSTRACT A class of distortions termed functional Breman diverences is defined, which includes squared error and relative entropy. A functional Breman diverence acts on functions or distributions, and eneralizes the standard Breman diverence for vectors and a previous pointwise Breman diverence that was defined for functions. A recently published result showed that the mean minimizes the expected Breman diverence. The new functional definition enables the extension of this result to the continuous case to show that the mean minimizes the expected functional Breman diverence over a set of functions or distributions. It is shown how this theorem applies to the Bayesian estimation of distributions. stimation of the uniform distribution from independent and identically drawn samples is used as a case study. 5. SUBJCT TRS 6. SCURITY CLASSIFICATION OF: 7. LIITATION OF ABSTRACT a. RPORT unclassified b. ABSTRACT unclassified c. THIS PAG unclassified Same as Report (SAR) 8. NUBR OF PAGS 0 9a. NA OF RSPONSIBL PRSON Standard Form 298 (Rev. 8-98) Prescribed by ANSI Std Z39-8
3 2 FUNCTIONAL BRGAN DIVRGNC A. xamples Different choices of the functional φ lead to different Breman diverences. Illustrative examples are iven for squared error, squared bias, and relative entropy. Functionals for other Breman diverences can be derived based on these examples, from the example functions for the discrete case iven in Table of [6], and from the fact that φ is a strictly convex functional if it has the form φ() φ((t))dt where φ : R R, φ is strictly convex and is in some well-defined vector space of functions [20]. ) Total Squared Difference: Let φ[] 2 dν, where φ : L 2 (ν) R, and let, f, a L 2 (ν). Then φ[ + a] φ[] ( + a) 2 dν 2 dν 2 adν + a 2 dν. Because a 2 dν a L 2 (ν) as a 0 in L 2 (ν), a 2 L 2 (ν) a L a 2 (ν) 0 L 2 (ν) δφ[; a] 2 adν, which is a continuous linear functional in a. To show that φ is strictly convex we show that φ is stronly positive. When the second variation δ 2 φ and the third variation δ 3 φ exist, they are described by φ[f; a] δφ[f; a] + 2 δ2 φ[f; a, a] + ɛ[f, a] a 2 L p (ν) (3) δφ[f; a] + 2 δ2 φ[f; a, a] + 6 δ3 φ[f; a, a, a] + ɛ[f, a] a 3 L p (ν), where ɛ[f, a] 0 as a L p (ν) 0. The quadratic functional δ 2 φ[f; a, a] defined on normed linear space L p (ν) is stronly positive if there exists a constant k > 0 such that δ 2 φ[f; a, a] k a 2 L p (ν) for all a L2 (ν). By definition of the second Fréchet derivative, δ 2 φ[; b, a] δφ[ + b; a] δφ[; a] 2 ( + b)adν 2 adν 2 badν. Thus δ 2 φ[; b, a] is a quadratic form, where δ 2 φ is actually independent of and stronly positive since δ 2 φ[; a, a] 2 a 2 dν 2 a 2 L 2 (ν) for all a L 2 (ν), which implies that φ is strictly convex and d φ [f, ] f 2 dν 2 dν 2 (f )dν (f ) 2 dν f 2 L 2 (ν). 2) Squared Bias: Under definition (2), squared bias is a Breman diverence, this we have not previously seen noted in the literature despite the importance of minimizin bias in estimation [2]. Let φ[] ( dν ) 2, where φ : L (ν) R. In this case φ[ + a] φ[] ( 2 dν + dν 2 ( adν) ( adν + ) 2 dν adν) 2. (4) Note that 2 dν adν is a continuous linear functional on L (ν) and ( adν ) 2 a 2 L (ν), so that 0 ( adν ) 2 a L (ν) a 2 L (ν) a L a (ν). L (ν) Thus from (4) and the definition of the Fréchet derivative, δφ[; a] 2 dν adν. By the definition of the second Fréchet derivative, δ 2 φ[; b, a] δφ[ + b; a] δφ[; a] 2 ( + b)dν adν 2 2 bdν adν dν is another quadratic form, and δ 2 φ is independent of. Then δ 2 φ[; a, a] is stronly positive because, ( δ 2 φ[; a, a] 2 ) 2 adν 2 a 2 L (ν) 0 adν for a L (ν), and thus φ is in fact strictly convex. The Breman diverence is thus d φ [f, ] ( ( ( 2 ( fdν) 2 ( fdν) + ) 2 (f )dν f 2 L (ν). 2 dν) 2 2 dν) 2 dν (f )dν dν fdν
4 FRIGYIK, SRIVASTAVA, GUPTA 3 3) Relative ntropy of Simple Functions: Denote by S the collection of all interable simple functions on the measure space (R d, Ω, ν), that is, the set of functions which can be written as a finite linear combination of indicator functions such that if S, can be expressed, (x) t α i I Ti ; α 0 0, i0 where I Ti is the indicator function of the set T i, {T i } t i is a collection of mutually disjoint sets of finite measure and T 0 R d \ t i T i. We adopt the convention that T 0 is the set on which is zero and therefore α i 0 if i 0. Consider the normed vector space (S, L (ν)) and let W be the subset (not necessarily a vector subspace) of nonneative functions in this normed space: If W then R d ln dν W { S subject to 0}. t i T i α i ln α i dν t α i ln α i ν(t i ), (5) i since 0 ln 0 0. Define the functional φ on W, φ[] ln dν, W. (6) R d The functional φ is not Fréchet-differentiable at because in eneral it cannot be uaranteed that + h is non-neative on the set where 0 for all perturbin functions h in the underlyin normed vector space ( S, L (ν)) with norm smaller than any prescribed ɛ > 0. However, a eneralized Gâteaux derivative can be defined if we limit the perturbin function h to a vector subspace. To that end, let G be the subspace of ( S, L (ν)) defined by G {f S subject to f dν dν}. It is straihtforward to show that G is a vector space. We define the eneralized Gâteaux derivative of φ at W to be the linear operator δ G φ[; ] if φ[ + h] φ[] δ G φ[; h] lim 0. (7) h L (ν) 0 h L (ν) h G Note, that δ G φ[; ] is not linear in eneral, but it is on the vector space G. In eneral, if G is the entire underlyin vector space then (7) is the Fréchet derivative, and if G is the span of only one element from the underlyin vector space then (7) is the Gâteaux derivative. Here, we have eneralized the Gâteaux derivative for the present case that G is a subspace of the underlyin vector space. It remains to be shown that iven the functional (6), the derivative (7) exists and yields a Breman diverence correspondin to the usual notion of relative entropy. Consider the possible solution δ G φ[; h] ( + ln )hdν, (8) R d which coupled with (6) does yield relative entropy. It remains to be shown only that (8) satisfies (7). Note that φ[ + h] φ[] δ G φ[; h] (h + ) ln h + hdν R d (h + ) ln h + hdν, (9) where is the set on which is not zero. Because W, there are m, > 0 such that m on. Let h G be such that h L (ν) m, then + h 0. Our oal is to show that the expression φ[ + h] φ[] δ G φ[; h] h L (ν) (0) is non-neative and that it is bounded above by a bound that oes to 0 as h L (ν) 0. We start by boundin the interand from above usin the inequality ln x x : (h + ) ln h + Then since h 2 / ( h L (ν)) 2 /m, φ[ + h] φ[] δ G φ[; h] h L (ν) h (h + ) h h h2. h 2 h L (ν) dν ν() m h L (ν). Since is interable ν() < and the riht hand side oes to 0 as h L (ν) 0. Next, in order to show that (0) is non-neative we have to prove that the interal (9) is not neative. To do so, we normalize the measure and apply Jensen s inequality. Take the first term of the interand of (9), (h + ) ln h + dν ( h + ln h + ) dν, h + L (ν) ln h + ( ) h + L (ν) λ d ν, L (ν) where the normalized measure d ν dν is a probability measure and λ(x) x ln x is a convex function on (0, ). L (ν) By Jensen s inequality and then chanin the measure back to dν
5 4 FUNCTIONAL BRGAN DIVRGNC dν, ( ) h + λ d ν ( ) h + L (ν)λ d ν ( ) + h L L (ν)λ (ν) L (ν) ( ) + h L + h L (ν) ln (ν) L (ν) ( + h L (ν) ) L (ν) + h L (ν) + h L (ν) L (ν) hdν, L (ν) where we used the fact that ln x x for all x > 0. By combinin these two latest results we find that (h + ) ln h + dν hdν, so equivalently (9) is always non-neative. This fact also confirms that the resultin relative entropy d φ [f, ] is always non-neative, because (9) is d φ [f, ] if one sets h f. Lastly, one must show that the functional defined in (6) is strictly convex. Aain we will show this by showin that the second variation of φ[] is stronly positive. Let f G and f L (ν) m. Usin the Taylor expansion of ln one can express, ( δ G φ[ + f; h] δ G φ[; h] h ln + f ) dν h f dν + ɛ[f, ] f L (ν), where ɛ[f, ] oes to 0 as f L (ν) 0 because Therefore and ɛ[f, ] L (ν) ν() h L (ν) 2 2 f L (ν). δgφ[; 2 h, f] δgφ[; 2 h, h] h f dν, h 2 dν h L (ν). Thus δg 2 φ[; h, h] is stronly positive. B. Relationship to Other Breman Diverence Definitions Two propositions establish the relationship of the functional Breman diverence to other Breman diverence definitions. Proposition I.2 (Functional Breman Diverence Generalizes Vector Breman Diverence). The functional definition (2) is a eneralization of the standard vector Breman diverence d φ(x, y) φ(x) φ(y) φ(y) T (x y), () where x, y R n, and φ : R n R is strictly convex and twice differentiable. Jones and Byrne describe a eneral class of diverences between functions usin a pointwise formulation [7]. Csiszár specialized the pointwise formulation to a class of diverences he termed Breman distances B s,ν [7], where iven a σ- finite measure space (, Ω, ν), and non-neative measurable functions f(x) and (x), B s,ν (f, ) equals s(f(x)) s((x)) s ((x))(f(x) (x))dν(x). (2) The function s : (0, ) R is constrained to be differentiable and strictly convex, and the limit lim x 0 s(x) and lim x 0 s (x) must exist, but not necessarily be finite. The function s plays a role similar to the function φ in the functional Breman diverence; however, s acts on the rane of the functions f,, whereas φ acts on the functions f,. Proposition I.3 (Functional Definition Generalizes Pointwise Definition). Given a pointwise Breman diverence as per (2), an equivalent functional Breman diverence can be defined as per (2) if the measure ν is finite. However, iven a functional Breman diverence d φ [f, ], there is not necessarily an equivalent pointwise Breman diverence. C. Properties of the Functional Breman Diverence The Breman diverence for random variables has some well-known properties, as reviewed in [, Appendix A]. Here, we note that the same properties hold for the functional Breman diverence (2). We ive complete proofs in [8].. Non-neativity: The functional Breman diverence is non-neative: d φ [f, ] 0 for all admissible inputs. 2. Convexity: The Breman diverence d φ [f, ] is always convex with respect to f. 3. Linearity: The functional Breman diverence is linear such that, d (c φ +c 2 φ 2 )[f, ] c d φ [f, ] + c 2 d φ2 [f, ]. 4. quivalence Classes: Partition the set of strictly convex, differentiable functionals {φ} into classes such that φ and φ 2 belon to the same class if d φ [f, ] d φ2 [f, ] for all f, A. For brevity we will denote d φ [f, ] simply by d φ. Let φ φ 2 denote that φ and φ 2 belon to the same class, then is an equivalence relation in that is satisfies the properties of reflexivity (d φ d φ ), symmetry (if d φ d φ2, then d φ2 d φ ), and transitivity (if d φ d φ2 and d φ2 d φ3, then d φ d φ3 ). Further, if φ φ 2, then they differ only by an affine transformation. 5. Linear Separation: The locus of admissible functions f L p (ν) that are equidistant from two fixed functions, 2 L p (ν) in terms of functional Breman diverence form a hyperplane.
6 FRIGYIK, SRIVASTAVA, GUPTA 5 6. Dual Diverence: Given a pair (, φ) where L p (ν) and φ is a strictly convex twice-continuously Fréchet-differentiable functional, then the function-functional pair (G, ψ) is the Leendre transform of (, φ) [9], if φ[] ψ[g] + (x)g(x)dν(x), (3) δφ[; a] G(x)a(x)dν(x), (4) where ψ is a strictly convex twice-continuously Fréchetdifferentiable functional, and G L q (ν), where p + q. Given Leendre transformation pairs f, L p (ν) and F, G L q (ν), d φ [f, ] d ψ [G, F ]. 7. Generalized Pythaorean Inequality: For any admissible f,, h L p (ν), d φ [f, h] d φ [f, ] + d φ [, h] + δφ[; f ] δφ[h; f ]. II. INIU PCTD BRGAN DIVRGNC Consider two sets of functions (or distributions), and A. Let F be a random function with realization f. Suppose there exists a probability distribution P F over the set, such that P F (f) is the probability of f. For example, consider the set of Gaussian distributions, and iven samples drawn independently and identically from a randomly selected Gaussian distribution N, the data imply a posterior probability P N (N ) for each possible eneratin realization of a Gaussian distribution N. The oal is to find the function A that minimizes the expected Breman diverence between the random function F and any function A. The followin theorem shows that if the set of possible minimizers A includes PF [F ], then PF [F ] minimizes the expectation of any Breman diverence. Note the theorem requires slihtly stroner conditions on φ than the definition of the Breman diverence (2) requires. Theorem II. (inimizer of the xpected Breman Diverence). Let δ 2 φ[f; a, a] be stronly positive and let φ C 3 (L (ν); R) be a three-times continuously Fréchetdifferentiable functional on L (ν). Let be a set of functions that lie on a manifold, and have associated measure d such that interation is well-defined. Suppose there is a probability distribution P F defined over the set. Let A be a set of functions that includes PF [F ] if it exists. Suppose the function minimizes the expected Breman diverence between the random function F and any function A such that ar inf A P F [d φ (F, )]. Then, if exists, it is iven by PF [F ]. (5) A. Bayesian stimation Theorem II. can be applied to a set of distributions to find the Bayesian estimate of a distribution iven a posterior or likelihood. For parametric distributions parameterized by θ R n, a probability measure Λ(θ), and some risk function R(θ, ψ), ψ R n, the Bayes estimator is defined [22] as ˆθ ar inf ψ R n R(θ, ψ)dλ(θ). (6) That is, the Bayes estimator minimizes some expected risk in terms of the parameters. It follows from recent results [6] that ˆθ [Θ] if the risk R is a Breman diverence, where Θ is the random variable whose realization is θ; this property has been previously noted [8], [0]. The principle of Bayesian estimation can be applied to the distributions themselves rather than to the parameters: ĝ ar inf R(f, )P F (f)d, (7) A where P F (f) is a probability measure on the distributions f, d is a measure for the manifold, and A is either the space of all distributions or a subset of the space of all distributions, such as the set. When the set A includes the distribution PF [F ] and the risk function R in (7) is a functional Breman diverence, then Theorem II. establishes that ĝ PF [F ]. For example, in recent work, two of the authors derived the mean class posterior distribution for each class for a Bayesian quadratic discriminant analysis classifier, and showed that the classification results were superior to parameter-based Bayesian quadratic discriminant analysis [23]. Of particular interest for estimation problems are the Breman diverence examples iven in Section I-A: total squared difference (mean squared error) is a popular risk function in reression [2]; minimizin relative entropy leads to useful theorems for lare deviations and other statistical subfields [24]; and analyzin bias is a common approach to characterizin and understandin statistical learnin alorithms [2]. B. Case Study: stimatin a Scaled Uniform Distribution As an illustration of the theorem, we present and compare different estimates of a scaled uniform distribution iven independent and identically drawn samples. Let the set of uniform distributions over [0, θ] for θ R + be denoted by U. Given independent and identically distributed samples, 2,..., n drawn from an unknown uniform distribution f U, the eneratin distribution is to be estimated. The risk function R is taken to be squared error or total squared error dependin on context. ) Bayesian Parameter stimate: Dependin on the choice of the probability measure Λ(θ), the interal (6) may not be finite; for example, usin the likelihood of θ with Lebesue measure the interal is not finite. A standard solution is to use a amma prior on θ and Lebesue measure. Let Θ be a random parameter with realization θ, let the amma distribution have parameters t and t 2, and denote the maximum of the data as
7 6 FUNCTIONAL BRGAN DIVRGNC max max{, 2,..., n }. Then a Bayesian estimate is formulated [22, p. 240, 285]: [Θ {, 2,..., n }, t, t 2 ] θ max e θ n+t + θt 2 dθ. (8) e θ n+t + θt 2 dθ max The interals can be expressed in terms of the chi-squared random variable I 2 v with v derees of freedom: [Θ {, 2,..., n }, t, t 2 ] t 2 (n + t a) P (χ 2 2(n+t ) < 2 t 2 max ) P (χ 2 2(n+t ) < 2 t 2 max ). (9) Note that (6) presupposes that the best solution is also a uniform distribution. 2) Bayesian Uniform Distribution stimate: If one restricts the minimizer of (7) to be a uniform distribution, then (7) is solved with A U. Because the set of uniform distributions does not enerally include its mean, Theorem II. does not apply, and thus different Breman diverences may ive different minimizers for (7). Let P F be the likelihood of the data (no prior is assumed over the set U), and use the Fisher information metric ( [25] [27]) for d. Then the solution to (7) is the uniform distribution on [0, 2 /n max ]. Usin Lebesue measure instead ives a similar result: [0, 2 /(n+/2) max ]. We were unable to find these estimates in the literature, and so their derivations are presented in the appendix. 3) Unrestricted Bayesian Distribution stimate: When the only restriction placed on the minimizer in (7) is that be a distribution, then one can apply Theorem II. and solve directly for the expected distribution PF [F ]. Let P F be the likelihood of the data (no prior is assumed over the set U), and use the Fisher information metric for d. Solvin (5), notin that the uniform probability of x is f(x) /a if x a and zero otherwise, and the likelihood of the n drawn points is (/ max ) n if a max and zero otherwise, ( ) ( ) ( da ) max(x, max) a a n a (x) max da a n a n ( max ) n. (20) (n + )[max(x, max )] n+ III. FURTHR DISCUSSION AND OPN QUSTIONS We have defined a eneral Breman diverence for functions and distributions that can provide a foundation for results in statistics, information theory and sinal processin. Theorem II. is important for these fields because it ties Breman diverences to expectation. As shown in Section II-A, Theorem II. can be directly applied to distributions to show that Bayesian distribution estimation simplifies to expectation when the risk function is a Breman diverence and the minimizin distribution is unrestricted. It is common in Bayesian estimation to interpret the prior as representin some actual prior knowlede, but in fact prior knowlede often is not available or is difficult to quantify. Another approach is to use a prior to capture coarse information from the data that may be used to stabilize the estimation [9], [23]. In practice, priors are sometimes chosen in Bayesian estimation to tame the tail of likelihood distributions so that expectations will exist when they miht otherwise be infinite [22]. This mathematically convenient use of priors adds estimation bias that may be unwarranted by prior knowlede. An alternative to mathematically convenient priors is to formulate the estimation problem as a minimization of an expected Breman diverence between the unknown distribution and the estimated distribution, and restrict the set of distributions that can be the minimizer to be a set for which the Bayesian interal exist. Open questions are how such restrictions tradeoff bias for reduced variance, and how to find or define an optimal restricted set of distributions for this estimation approach. Finally, there are some results for the standard vector Breman diverence that have not been extended here. It has been shown that a standard vector Breman diverence must be the risk function in order for the mean to be the minimizer of an expected risk [6, Theorems 3 and 4]. The proof of that result relies heavily on the discrete nature of the underlyin vectors, and it remains an open question as to whether a similar result holds for the functional Breman diverence. Another result that has been shown for the vector case but remains an open question in the functional case is converence in probability [6, Theorem 2]. ACKNOWLDGNTS This work was funded in part by the Office of Naval Research, Code 32, Grant # N The authors thank Inderjit Dhillon, Castedo llerman, and Galen Shorack for helpful discussions. A. Proof of Proposition I.2 APPNDI: PROOFS We ive a constructive proof that there is a correspondin functional Breman diverence d φ [f, ] for a specific choice of φ : L (ν) R, where ν n i δ c i and f, L (ν). Here, δ x denotes the Dirac measure such that all mass is concentrated at x, and {c, c 2,..., c n } is a collection of n distinct points in R d. For any x R n, define φ[f] φ(x, x 2,..., x n ), where f(c ) x, f(c 2 ) x 2,..., f(c n ) x n. Then the difference is φ[f; a] φ[f + a] φ[f] φ ((f + a)(c ),..., (f + a)(c n )) φ (x,..., x n ) φ (x + a(c ),..., x n + a(c n )) φ (x,..., x n ). Let a i be short hand for a(c i ), and use the Taylor expansion for functions of several variables to yield φ[f; a] φ(x,..., x n ) T (a,..., a n ) + ɛ[f, a] a L. Therefore, δφ[f; a] φ(x,..., x n ) T (a,..., a n ) φ(x) T a,
8 FRIGYIK, SRIVASTAVA, GUPTA 7 where x (x, x 2,..., x n ) and a (a,..., a n ). Thus, from (3), the functional Breman diverence definition (2) for φ is equivalent to the standard vector Breman diverence: d φ[f, ] φ[f] φ[] δφ[; f ] B. Proof of Proposition I.3 φ(x) φ(y) φ(y) T (x y). (2) First, we ive a constructive proof of the first part of the proposition by showin that iven a B s,ν, there is an equivalent functional diverence d φ. Then, the second part of the proposition is shown by example: we prove that the squared bias functional Breman diverence iven in Section I-A.2 is a functional Breman diverence that cannot be defined as a pointwise Breman diverence. Note that the interal to calculate B s,ν is not always finite. To ensure finite B s,ν, we explicitly constrain lim x 0 s (x) and lim x 0 s(x) to be finite. From the assumption that s is strictly convex, s must be continuous on (0, ). Recall from the assumptions that the measure ν is finite, and that the function s is differentiable on (0, ). Given a B s,ν, define the continuously differentiable function { s(x) x 0 s(x) s( x) + 2s(0) x < 0. Specify φ : L (ν) R as φ[f] Note that if f 0, φ[f] s(f(x))dν. s(f(x))dν. Because s is continuous on R, s(f) L (ν) whenever f L (ν), so the above interals always make sense. It remains to be shown that δφ[f; ] completes the equivalence when f 0. For h L (ν), φ[f + h] φ[f] s(f(x) + h(x))dν s(f(x))dν s(f(x) + h(x)) s(f(x))dν s (f(x))h(x) + ɛ (f(x), h(x)) h(x)dν s (f(x))h(x) + ɛ (f(x), h(x)) h(x)dν, where we used the fact that s(f(x) + h(x)) s(f(x)) + ( s (f(x)) + ɛ(f(x), h(x))) h(x) s(f(x)) + (s (f(x)) + ɛ(f(x), h(x))) h(x), because f 0. On the other hand, if h(x) 0 then ɛ(f(x), h(x)) 0, and if h(x) 0 then ɛ(f(x), h(x)) s(f(x) + h(x)) s(f(x)) h(x) + s (f(x)). Suppose {h n } L (ν) such that h n 0. Then there is a measurable set such that its complement is of measure 0 and h n 0 uniformly on. There is some N > 0 such that for any n > N, h n (x) ɛ for all x. Without loss of enerality, assume that there is some > 0 such that for all x, f(x). Since s is continuously differentiable, there is a K > 0 such that max{ s (t) subject to t [ ɛ, + ɛ]} K, and by the mean value theorem for almost all x. Then s(f(x) + h(x)) s(f(x)) h(x) ɛ(f(x), h(x)) 2K, K, except on a set of measure 0. The fact that h(x) 0 almost everywhere implies that ɛ(f(x), h(x)) 0 almost everywhere, and by Lebesue s dominated converence theorem, the correspondin interal oes to 0. As a result, the Fréchet derivative of φ is δφ[f; h] s (f(x))h(x)dν. (22) Thus the functional Breman diverence is equivalent to the iven pointwise B s,ν. We additionally note that the assumptions that f L (ν) and that the measure ν is finite are necessary for this proof. Counterexamples can be constructed if f L p or ν() such that the Fréchet derivative of φ does not obey (22). This concludes the first part of the proof. To show that the squared bias functional Breman diverence iven in Section I-A.2 is an example of a functional Breman diverence that cannot be defined as a pointwise Breman diverence we prove that the converse statement leads to a contradiction. Suppose (, Σ, ν) and (, Σ, µ) are measure spaces where ν is a non-zero σ-finite measure and that there is a differentiable function f : (0, ) R such that ( 2 ξdν) f(ξ)dµ, (23) where ξ L (ν). Let f(0) lim x 0 f(x), which can be finite or infinite, and let α be any real number. Then ( ) 2 ( ) 2 f(αξ)dµ αξdν α 2 ξdν α 2 f(ξ)dµ. Because ν is σ-finite, there is a measurable set such that 0 < ν() <. Let \ denote the complement of in. Then ( ) 2 α 2 ν 2 () α 2 I dν α 2 f(i )dµ α 2 f(0)dµ + α 2 f()dµ \ α 2 f(0)µ(\) + α 2 f()µ().
9 8 FUNCTIONAL BRGAN DIVRGNC Also, However, ( 2 αi dν) so one can conclude that ( 2 α 2 ν 2 () αi dν). f(αi )dµ f(αi )dµ + \ f(0)µ(\) + f(α)µ(); α 2 f(0)µ(\) + α 2 f()µ() f(αi )dµ f(0)µ(\) + f(α)µ(). (24) Apply equation (23) for ξ 0 to yield ( 2 0 0dν) f(0)dµ f(0)µ(). Since ν() > 0, µ() 0, so it must be that f(0) 0, and (24) becomes α 2 ν 2 () α 2 f()µ() f(α)µ() α R. The first equation implies that µ() 0. The second equation determines the function f completely: Then (23) becomes ( f(α) f()α 2. 2 ξdν) f()ξ 2 dµ. Consider any two disjoint measurable sets, and 2, with finite nonzero measure. Define ξ I and ξ 2 I 2. Then ξ ξ + ξ 2 and ξ ξ 2 I I 2 0. quation (23) becomes ξ dν ξ 2 dν f() ξ ξ 2 dµ. (25) This implies the followin contradiction: ξ dν ξ 2 dν ν( )ν( 2 ) 0, (26) but f() C. Proof of Theorem II. ξ ξ 2 dµ 0. (27) Recall that for a functional J to have an extremum (minimum) at f ˆf, it is necessary that δj[f; a] 0 and δ 2 J[f; a, a] 0, for f ˆf and for all admissible functions a A. A sufficient condition for a functional J[f] to have a minimum for f ˆf is that the first variation δj[f; a] must vanish for f ˆf, and its second variation δ 2 J[f; a, a] must be stronly positive for f ˆf. Let J[] PF [d φ (F, )] d φ [f, ]P F (f)d (φ[f] φ[] δφ[; f ])P F (f)d,(28) where (28) follows by substitutin the definition of Breman diverence (2). Consider the increment J[; a] J[ + a] J[] (29) (φ[ + a] φ[]) P F (f)d (δφ[ + a; f a] δφ[; f ]) P F (f)d, (30) where (30) follows from substitutin (28) into (29). Usin the definition of the differential of a functional iven in (), the first interand in (30) can be written as φ[ + a] φ[] δφ[; a] + ɛ[, a] a L (ν). (3) Take the second interand of (30), and subtract and add δφ[; f a], δφ[ + a; f a] δφ[; f ] δφ[ + a; f a] δφ[; f a] + δφ[; f a] δφ[; f ] (a) δ 2 φ[; f a, a] + ɛ[, a] a L (ν) + δφ[; f ] δφ[; a] δφ[; f ] (b) δ 2 φ[; f, a] δ 2 φ[; a, a] + ɛ[, a] a L (ν) δφ[; a] (32) where (a) follows from the linearity of the third term, and (b) follows from the linearity of the first term. Substitute (3) and (32) into (30), ( J[; a] δ 2 φ[; f, a] δ 2 φ[; a, a] ) + ɛ[, a] a L (ν) P F (f)d. Note that the term δ 2 φ[; a, a] is of order a 2 L (ν), that is, δ 2 φ[; a, a] m L (ν) a 2 L (ν) for some constant m. Therefore, where, J[ + a] J[] δj[; a] L lim (ν) 0, a L (ν) 0 a L (ν) δj[; a] δ 2 φ[; f, a]p F (f)d. (33) For fixed a, δ 2 φ[;, a] is a bounded linear functional in the second arument, so the interation and the functional can be interchaned in (33), which becomes [ ] δj[; a] δ 2 φ ; (f ) P F (f)d, a.
10 FRIGYIK, SRIVASTAVA, GUPTA 9 Usin the functional optimality conditions, J[] has an extremum for ĝ if [ ] δ 2 φ ĝ; (f ĝ) P F (f)d, a 0. (34) Set a (f ĝ) P (f)d in (34) and use the assumption that the quadratic functional δ 2 φ[; a, a] is stronly positive, which implies that the above functional can be zero if and only if a 0, that is, 0 (f ĝ)p F (f)d, (35) ĝ PF [F ], (36) where the last line holds if the expectation exists (i.e. if the measure is well-defined and the expectation is finite). Because a Breman diverence is not necessarily convex in its second arument, it is not yet established that the above unique extremum is a minimum. To see that (36) is in fact a minimum of J[], from the functional optimality conditions it is enouh to show that δ 2 J[ĝ; a, a] is stronly positive. To show this, for b L (ν), consider δj[ + b; a] δj[; a] (c) ( δ 2 φ[ + b; f b, a] δ 2 φ[; f, a] ) P F (f)d (d) ( δ 2 φ[ + b; f b, a] δ 2 φ[; f b, a] + δ 2 φ[; f b, a] δ 2 φ[; f, a] ) P F (f)d (e) ( δ 3 φ[; f b, a, b] + ɛ[, a, b] b L (ν) + δ 2 φ[; f, a] δ 2 φ[; b, a] δ 2 φ[; f, a] ) P F (f)d (f) ( δ 3 φ[; f, a, b] δ 3 φ[; b, a, b] + ɛ[, a, b] b L (ν) δ2 φ[; b, a] ) P F (f)d, (37) where (c) follows from usin interal (33); (d) from subtractin and addin δ 2 φ[; f b, a]; (e) from the fact that the variation of the second variation of φ is the third variation of φ [28]; and (f) from the linearity of the first term and cancellation of the third and fifth terms. Note that in (37) for fixed a, the term δ 3 φ[; b, a, b] is of order b 2 L (ν), while the first and the last terms are of order b L (ν). Therefore, δj[ + b; a] δj[; a] δ 2 J[; a, b] L lim (ν) b L (ν) 0 b L (ν) 0, where δ 2 J[; a, b] δ 3 φ[; f, a, b]p F (f)d + δ 2 φ[; a, b]p F (f)d. (38) Substitute b a, ĝ and interchane interation and the continuous functional δ 3 φ in the first interal of (38), then [ ] δ 2 J[ĝ; a, a] δ 3 φ ĝ; (f ĝ)p F (f)d, a, a + δ 2 φ[ĝ; a, a]p F (f)d δ 2 φ[ĝ; a, a]p F (f)d (39) k a 2 L (ν) P F (f)d k a 2 L (ν) > 0, (40) where (39) follows from (35), and (40) follows from the stron positivity of δ 2 φ[ĝ; a, a]. Therefore, from (40) and the functional optimality conditions, ĝ is the minimum. D. Derivation of the Bayesian Distribution-based Uniform stimate Restricted to a Uniform inimizer Let f(x) /a for all 0 x a and (x) /b for all 0 x b. Assume at first that b > a; then the total squared difference between f and is x (f(x) (x)) 2 dx a ( a ) 2 + (b a) b b a ab b a, ab ( ) 2 b where the last line does not require the assumption that b > a. In this case, the interal (7) is over the one-dimensional manifold of uniform distributions U; a Riemannian metric can be formed by usin the differential arc element to convert Lebesue measure on the set U to a measure on the set of parameters a such that (7) is re-formulated in terms of the parameters for ease of calculation: b ar min b R + b a a max ab a n df da da, (4) 2 where /a n is the likelihood of the n data points bein drawn from a uniform distribution [0, a], and the estimated distribution is uniform on [0, b ]. The differential arc element can be calculated by expandin df/da in terms of df da 2 the Haar orthonormal basis { a, φ jk (x)}, which forms a complete orthonormal basis for the interval 0 x a, and then the required norm is equivalent to the norm of the basis coefficients of the orthonormal expansion: df da. (42) 2 a3/2 For estimation problems, the measure determined by the Fisher information metric may be more appropriate than Lebesue measure [25] [27]. Then d I(a) 2 da, (43) where I is the Fisher information matrix. For the onedimensional manifold formed by the set of scaled uniform
11 0 FUNCTIONAL BRGAN DIVRGNC distributions U, the Fisher information matrix is ( d I(a) [ 2 lo )] a da 2 a 0 a 2 a dx a 2, so that the differential element is d da a. We solve (7) usin the Lebesue measure (42); the solution with the Fisher differential element follows the same loic. Then (4) is equivalent to ar min b J(b) b b a a max ab a n+3/2 da b a da a max ab a + a b da n+3/2 b ab a n+3/2 2 (n + /2)(n + 3/2)b n+3/2 b(n + 2 )n+ 2 max + (n + 3/2) n+3/2 max The minimum is found by settin the first derivative to zero: J 2 (n + 3/2) (ˆb) (n + /2)(n + 3/2) + 0 ˆb2 (n + /2)max n+/2 ˆb 2 n+/2 max. To establish that ˆb is in fact a minimum, note that J (ˆb) > 0. n+/2 ˆb max 2 3 n+7/2 n+/6 max Thus, the restricted Bayesian estimate is the uniform distribution over [0, 2 n+/2 max ]. RFRNCS [] B. Taskar, S. Lacoste-Julien, and. I. Jordan, Structured prediction, dual extraradient and Breman projections, Journal of achine Learnin Research, vol. 7, pp , [2] N. urata, T. Takenouchi, T. Kanamori, and S. uchi, Information eometry of U-Boost and Breman diverence, Neural Computation, vol. 6, pp , [3]. Collins, R.. Schapire, and Y. Siner, Loistic reression, AdaBoost and Breman distances, achine Learnin, vol. 48, pp , [4] J. Kivinen and. Warmuth, Relative loss bounds for multidimensional reression problems, achine Learnin, vol. 45, no. 3, pp , 200. [5] J. Lafferty, Additive models, boostin, and inference for eneralized diverences, Proc. of Conf. on Learnin Theory (COLT), 999. [6] S. Della Pietra, V. Della Pietra, and J. Lafferty, Duality and auxiliary functions for breman distances, Carneie ellon University Technical Report CU-CS-0-09R, 200. [7] L. K. Jones and C. L. Byrne, General entropy criteria for inverse problems, with applications to data compression, pattern classification, and cluster analysis, I Trans. on Information Theory, vol. 36, pp , 990. [8]. R. Gupta, S. Srivastava, and L. Cazzanti, Optimal estimation for nearest neihbor classifiers, Univ. of Washinton Dept. of lectrical nineerin Technical Report , 2006, available at idl.ee.washinton.edu. [9] S. Srivastava,. R. Gupta, and B. A. Friyik, Bayesian quadratic discriminant analysis, Journal of achine Learnin Research, vol. 8, pp , [0] A. Banerjee, An analysis of loistic models: xponential family connections and online performance, SIA Intl. Conf. on Data inin, [] A. Banerjee, S. eruu, I. S. Dhillon, and J. Ghosh, Clusterin with Breman diverences, Journal of achine Learnin Research, vol. 6, pp , [2] R. Nock and F. Nielsen, On weihtin clusterin, I Trans. on Pattern Analysis and achine Intellience, vol. 28, no. 8, pp , [3] G. LeBesenerais, J. Bercher, and G. Demoment, A new look at entropy for solvin linear inverse problems, I Trans. on Information Theory, vol. 45, pp , 999. [4] Y. Altun and A. Smola, Unifyin diverence minimization and statistical inference via convex duality, Proc. of Conf. on Learnin Theory (COLT), [5]. C. Pardo and I. Vajda, About distances of discrete distributions satisfyin the data processin theorem of information theory, I Trans. on Information Theory, vol. 43, no. 4, pp , 997. [6] A. Banerjee,. Guo, and H. Wan, On the optimality of conditional expectation as a Breman predictor, I Trans. on Information Theory, vol. 5, no. 7, pp , [7] I. Csiszár, Generalized projections for non-neative functions, Acta athematica Hunarica, vol. 68, pp. 6 85, 995. [8] B. A. Friyik, S. Srivastava, and. R. Gupta, An introduction to functional derivatives, Univ. of Washinton Dept. of lectrical ninerin Technical Report , Available at idl.ee.washinton.edu/publications.php. [9] I.. Gelfand and S. V. Fomin, Calculus of Variations. USA: Dover, [20] T. Rockafellar, Interals which are convex functionals, Pacific Journal of athematics, vol. 24, no. 3, pp , 968. [2] T. Hastie, R. Tibshirani, and J. Friedman, The lements of Statistical Learnin. New York: Spriner, 200. [22]. L. Lehmann and G. Casella, Theory of Point stimation. New York: Spriner, 998. [23] S. Srivastava and. R. Gupta, Distribution-based Bayesian minimum expected risk for discriminant analysis, Proc. of the I Intl. Symposium on Information Theory, [24] T. Cover and J. Thomas, lements of Information Theory. United States of America: John Wiley and Sons, 99. [25] R.. Kass, The eometry of asymptotic inference, Statistical Science, vol. 4, no. 3, pp , 989. [26] S. Amari and H. Naaoka, ethods of Information Geometry. New York: Oxford University Press, [27] G. Lebanon, Axiomatic eometry of conditional models, I Trans. on Information Theory, vol. 5, no. 4, pp , [28] C. H. dwards, Advanced Calculus of Several Variables. New York: Dover, 995.
Functional Bregman Divergence and Bayesian Estimation of Distributions Béla A. Frigyik, Santosh Srivastava, and Maya R. Gupta
5130 IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 54, NO. 11, NOVEMBER 2008 Functional Bregman Divergence and Bayesian Estimation of Distributions Béla A. Frigyik, Santosh Srivastava, and Maya R. Gupta
More informationAn Introduction to Functional Derivatives
An Introduction to Functional Derivatives Béla A. Frigyik, Santosh Srivastava, Maya R. Gupta Dept of EE, University of Washington Seattle WA, 98195-2500 UWEE Technical Report Number UWEETR-2008-0001 January
More informationUse of Wijsman's Theorem for the Ratio of Maximal Invariant Densities in Signal Detection Applications
Use of Wijsman's Theorem for the Ratio of Maximal Invariant Densities in Signal Detection Applications Joseph R. Gabriel Naval Undersea Warfare Center Newport, Rl 02841 Steven M. Kay University of Rhode
More informationBregman Divergences for Data Mining Meta-Algorithms
p.1/?? Bregman Divergences for Data Mining Meta-Algorithms Joydeep Ghosh University of Texas at Austin ghosh@ece.utexas.edu Reflects joint work with Arindam Banerjee, Srujana Merugu, Inderjit Dhillon,
More informationJournal of Inequalities in Pure and Applied Mathematics
Journal of Inequalities in Pure and Applied Mathematics L HOSPITAL TYPE RULES FOR OSCILLATION, WITH APPLICATIONS IOSIF PINELIS Department of Mathematical Sciences, Michian Technoloical University, Houhton,
More informationTHE EULER FUNCTION OF FIBONACCI AND LUCAS NUMBERS AND FACTORIALS
Annales Univ. Sci. Budapest., Sect. Comp. 40 (2013) nn nnn THE EULER FUNCTION OF FIBONACCI AND LUCAS NUMBERS AND FACTORIALS Florian Luca (Morelia, Mexico) Pantelimon Stănică (Monterey, USA) Dedicated to
More informationReport Documentation Page
Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationSums of the Thue Morse sequence over arithmetic progressions
Sums of the Thue Morse sequence over arithmetic progressions T.W. Cusick 1, P. Stănică 2 1 State University of New York, Department of Mathematics Buffalo, NY 14260; Email: cusick@buffalo.edu 2 Naval Postgraduate
More informationAn Invariance Property of the Generalized Likelihood Ratio Test
352 IEEE SIGNAL PROCESSING LETTERS, VOL. 10, NO. 12, DECEMBER 2003 An Invariance Property of the Generalized Likelihood Ratio Test Steven M. Kay, Fellow, IEEE, and Joseph R. Gabriel, Member, IEEE Abstract
More informationuniform distribution theory
Uniform Distribution Theory 9 (2014), no.1, 21 25 uniform distribution theory ON THE FIRST DIGITS OF THE FIBONACCI NUMBERS AND THEIR EULER FUNCTION Florian Luca Pantelimon Stănică ABSTRACT. Here, we show
More informationOptimizing Robotic Team Performance with Probabilistic Model Checking
Optimizing Robotic Team Performance with Probabilistic Model Checking Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213 Sagar Chaki, Joseph Giampapa, David Kyle (presenting),
More informationMixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains. Barry Cobb. Alan Johnson
Mixture Distributions for Modeling Lead Time Demand in Coordinated Supply Chains Barry Cobb Virginia Military Institute Alan Johnson Air Force Institute of Technology AFCEA Acquisition Research Symposium
More informationAIR FORCE RESEARCH LABORATORY Directed Energy Directorate 3550 Aberdeen Ave SE AIR FORCE MATERIEL COMMAND KIRTLAND AIR FORCE BASE, NM
AFRL-DE-PS-JA-2007-1004 AFRL-DE-PS-JA-2007-1004 Noise Reduction in support-constrained multi-frame blind-deconvolution restorations as a function of the number of data frames and the support constraint
More informationStat260: Bayesian Modeling and Inference Lecture Date: March 10, 2010
Stat60: Bayesian Modelin and Inference Lecture Date: March 10, 010 Bayes Factors, -priors, and Model Selection for Reression Lecturer: Michael I. Jordan Scribe: Tamara Broderick The readin for this lecture
More informationP. Kestener and A. Arneodo. Laboratoire de Physique Ecole Normale Supérieure de Lyon 46, allée d Italie Lyon cedex 07, FRANCE
A wavelet-based generalization of the multifractal formalism from scalar to vector valued d- dimensional random fields : from theoretical concepts to experimental applications P. Kestener and A. Arneodo
More informationAttribution Concepts for Sub-meter Resolution Ground Physics Models
Attribution Concepts for Sub-meter Resolution Ground Physics Models 76 th MORS Symposium US Coast Guard Academy Approved for public release distribution. 2 Report Documentation Page Form Approved OMB No.
More informationNormic continued fractions in totally and tamely ramified extensions of local fields
Normic continued fractions in totally and tamely ramified extensions of local fields Pantelimon Stănică Naval Postgraduate School Applied Mathematics Department, Monterey, CA 93943 5216, USA; email: pstanica@nps.edu
More informationExact Solution of a Constrained. Optimization Problem in Thermoelectric Cooling
Applied Mathematical Sciences, Vol., 8, no. 4, 77-86 Exact Solution of a Constrained Optimization Problem in Thermoelectric Cooling Hongyun Wang Department of Applied Mathematics and Statistics University
More informationCHAPTER I THE RIESZ REPRESENTATION THEOREM
CHAPTER I THE RIESZ REPRESENTATION THEOREM We begin our study by identifying certain special kinds of linear functionals on certain special vector spaces of functions. We describe these linear functionals
More informationGuide to the Development of a Deterioration Rate Curve Using Condition State Inspection Data
Guide to the Development of a Deterioration Rate Curve Using Condition State Inspection Data by Guillermo A. Riveros and Elias Arredondo PURPOSE: The deterioration of elements of steel hydraulic structures
More informationMATH MEASURE THEORY AND FOURIER ANALYSIS. Contents
MATH 3969 - MEASURE THEORY AND FOURIER ANALYSIS ANDREW TULLOCH Contents 1. Measure Theory 2 1.1. Properties of Measures 3 1.2. Constructing σ-algebras and measures 3 1.3. Properties of the Lebesgue measure
More informationSensitivity of West Florida Shelf Simulations to Initial and Boundary Conditions Provided by HYCOM Data-Assimilative Ocean Hindcasts
Sensitivity of West Florida Shelf Simulations to Initial and Boundary Conditions Provided by HYCOM Data-Assimilative Ocean Hindcasts George Halliwell, MPO/RSMAS, University of Miami Alexander Barth, University
More informationImprovements in Modeling Radiant Emission from the Interaction Between Spacecraft Emanations and the Residual Atmosphere in LEO
Improvements in Modeling Radiant Emission from the Interaction Between Spacecraft Emanations and the Residual Atmosphere in LEO William L. Dimpfl Space Science Applications Laboratory The Aerospace Corporation
More informationDiagonal Representation of Certain Matrices
Diagonal Representation of Certain Matrices Mark Tygert Research Report YALEU/DCS/RR-33 December 2, 2004 Abstract An explicit expression is provided for the characteristic polynomial of a matrix M of the
More informationFlocculation, Optics and Turbulence in the Community Sediment Transport Model System: Application of OASIS Results
DISTRIBUTION STATEMENT A: Approved for public release; distribution is unlimited. Flocculation, Optics and Turbulence in the Community Sediment Transport Model System: Application of OASIS Results Emmanuel
More informationHigh-Fidelity Computational Simulation of Nonlinear Fluid- Structure Interaction Problems
Aerodynamic Issues of Unmanned Air Vehicles Fluid-Structure Interaction High-Fidelity Computational Simulation of Nonlinear Fluid- Structure Interaction Problems Raymond E. Gordnier Computational Sciences
More informationFleet Maintenance Simulation With Insufficient Data
Fleet Maintenance Simulation With Insufficient Data Zissimos P. Mourelatos Mechanical Engineering Department Oakland University mourelat@oakland.edu Ground Robotics Reliability Center (GRRC) Seminar 17
More informationCRS Report for Congress
CRS Report for Congress Received through the CRS Web Order Code RS21396 Updated May 26, 2006 Summary Iraq: Map Sources Hannah Fischer Information Research Specialist Knowledge Services Group This report
More informationNUMERICAL SOLUTIONS FOR OPTIMAL CONTROL PROBLEMS UNDER SPDE CONSTRAINTS
NUMERICAL SOLUTIONS FOR OPTIMAL CONTROL PROBLEMS UNDER SPDE CONSTRAINTS AFOSR grant number: FA9550-06-1-0234 Yanzhao Cao Department of Mathematics Florida A & M University Abstract The primary source of
More informationA report (dated September 20, 2011) on. scientific research carried out under Grant: FA
A report (dated September 20, 2011) on scientific research carried out under Grant: FA2386-10-1-4150 First-principles determination of thermal properties in nano-structured hexagonal solids with doping
More informationWEAK AND STRONG CONVERGENCE OF AN ITERATIVE METHOD FOR NONEXPANSIVE MAPPINGS IN HILBERT SPACES
Applicable Analysis and Discrete Mathematics available online at http://pemath.et.b.ac.yu Appl. Anal. Discrete Math. 2 (2008), 197 204. doi:10.2298/aadm0802197m WEAK AND STRONG CONVERGENCE OF AN ITERATIVE
More informationSystem Reliability Simulation and Optimization by Component Reliability Allocation
System Reliability Simulation and Optimization by Component Reliability Allocation Zissimos P. Mourelatos Professor and Head Mechanical Engineering Department Oakland University Rochester MI 48309 Report
More informationVolume 6 Water Surface Profiles
A United States Contribution to the International Hydrological Decade HEC-IHD-0600 Hydrologic Engineering Methods For Water Resources Development Volume 6 Water Surface Profiles July 1975 Approved for
More informationPattern Search for Mixed Variable Optimization Problems p.1/35
Pattern Search for Mixed Variable Optimization Problems Mark A. Abramson (Air Force Institute of Technology) Charles Audet (École Polytechnique de Montréal) John Dennis, Jr. (Rice University) The views
More informationBabylonian resistor networks
IOP PUBLISHING Eur. J. Phys. 33 (2012) 531 537 EUROPEAN JOURNAL OF PHYSICS doi:10.1088/0143-0807/33/3/531 Babylonian resistor networks Carl E Mungan 1 and Trevor C Lipscombe 2 1 Physics Department, US
More informationZ-scan Measurement of Upconversion in Er:YAG
Proceedings of the 13 th Annual Directed Energy Symposium (Bethesda, MD) Dec. 2010 Z-scan Measurement of Upconversion in Er:YAG Jeffrey O. White, Thomas A. Mercier, Jr., John E. McElhenny Army Research
More informationReport Documentation Page
Report Documentation Page Form Approved OMB No. 0704-0188 Public reporting burden for the collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationParametric Techniques Lecture 3
Parametric Techniques Lecture 3 Jason Corso SUNY at Buffalo 22 January 2009 J. Corso (SUNY at Buffalo) Parametric Techniques Lecture 3 22 January 2009 1 / 39 Introduction In Lecture 2, we learned how to
More informationWorst-Case Bounds for Gaussian Process Models
Worst-Case Bounds for Gaussian Process Models Sham M. Kakade University of Pennsylvania Matthias W. Seeger UC Berkeley Abstract Dean P. Foster University of Pennsylvania We present a competitive analysis
More informationMINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS. Maya Gupta, Luca Cazzanti, and Santosh Srivastava
MINIMUM EXPECTED RISK PROBABILITY ESTIMATES FOR NONPARAMETRIC NEIGHBORHOOD CLASSIFIERS Maya Gupta, Luca Cazzanti, and Santosh Srivastava University of Washington Dept. of Electrical Engineering Seattle,
More information5 Measure theory II. (or. lim. Prove the proposition. 5. For fixed F A and φ M define the restriction of φ on F by writing.
5 Measure theory II 1. Charges (signed measures). Let (Ω, A) be a σ -algebra. A map φ: A R is called a charge, (or signed measure or σ -additive set function) if φ = φ(a j ) (5.1) A j for any disjoint
More informationTHEOREMS, ETC., FOR MATH 515
THEOREMS, ETC., FOR MATH 515 Proposition 1 (=comment on page 17). If A is an algebra, then any finite union or finite intersection of sets in A is also in A. Proposition 2 (=Proposition 1.1). For every
More informationDIRECTIONAL WAVE SPECTRA USING NORMAL SPREADING FUNCTION
CETN-I-6 3185 DIRECTIONAL WAVE SPECTRA USING NORMAL SPREADING FUNCTION PURPOSE : To present a parameterized model of a directional spectrum of the sea surface using an energy spectrum and a value for the
More informationIsotropic diffeomorphisms: solutions to a differential system for a deformed random fields study
Isotropic diffeomorphisms: solutions to a differential system for a deformed random fields study Marc Briant, Julie Fournier To cite this version: Marc Briant, Julie Fournier. Isotropic diffeomorphisms:
More informationClosed-form and Numerical Reverberation and Propagation: Inclusion of Convergence Effects
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. Closed-form and Numerical Reverberation and Propagation: Inclusion of Convergence Effects Chris Harrison Centre for Marine
More informationVLBA IMAGING OF SOURCES AT 24 AND 43 GHZ
VLBA IMAGING OF SOURCES AT 24 AND 43 GHZ D.A. BOBOLTZ 1, A.L. FEY 1, P. CHARLOT 2,3 & THE K-Q VLBI SURVEY COLLABORATION 1 U.S. Naval Observatory 3450 Massachusetts Ave., NW, Washington, DC, 20392-5420,
More informationLinearized optimal power flow
Linearized optimal power flow. Some introductory comments The advantae of the economic dispatch formulation to obtain minimum cost allocation of demand to the eneration units is that it is computationally
More informationParametric Techniques
Parametric Techniques Jason J. Corso SUNY at Buffalo J. Corso (SUNY at Buffalo) Parametric Techniques 1 / 39 Introduction When covering Bayesian Decision Theory, we assumed the full probabilistic structure
More informationOcean Acoustics Turbulence Study
Ocean Acoustics Turbulence Study PI John Oeschger Coastal Systems Station 6703 West Highway 98 Panama City, FL 32407 phone: (850) 230-7054 fax: (850) 234-4886 email: OeschgerJW@ncsc.navy.mil CO-PI Louis
More informationEstimation of Vertical Distributions of Water Vapor and Aerosols from Spaceborne Observations of Scattered Sunlight
Estimation of Vertical Distributions of Water Vapor and Aerosols from Spaceborne Observations of Scattered Sunlight Dale P. Winebrenner Polar Science Center/Applied Physics Laboratory University of Washington
More informationREPORT DOCUMENTATION PAGE
REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationIntegration on Measure Spaces
Chapter 3 Integration on Measure Spaces In this chapter we introduce the general notion of a measure on a space X, define the class of measurable functions, and define the integral, first on a class of
More informationTIME SERIES ANALYSIS OF VLBI ASTROMETRIC SOURCE POSITIONS AT 24-GHZ
TIME SERIES ANALYSIS OF VLBI ASTROMETRIC SOURCE POSITIONS AT 24-GHZ D.A. BOBOLTZ 1, A.L. FEY 1 & The K-Q VLBI Survey Collaboration 1 U.S. Naval Observatory 3450 Massachusetts Ave., NW, Washington, DC,
More informationNAVGEM Platform Support
DISTRIBUTION STATEMENT A. Approved for public release; distribution is unlimited. NAVGEM Platform Support Mr. Timothy Whitcomb Naval Research Laboratory 7 Grace Hopper Ave, MS2 Monterey, CA 93943 phone:
More informationThe local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance
The local equivalence of two distances between clusterings: the Misclassification Error metric and the χ 2 distance Marina Meilă University of Washington Department of Statistics Box 354322 Seattle, WA
More informationREPORT DOCUMENTATION PAGE. Theoretical Study on Nano-Catalyst Burn Rate. Yoshiyuki Kawazoe (Tohoku Univ) N/A AOARD UNIT APO AP
REPORT DOCUMENTATION PAGE Form Approved OMB No. 0704-0188 The public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions,
More informationCrowd Behavior Modeling in COMBAT XXI
Crowd Behavior Modeling in COMBAT XXI Imre Balogh MOVES Research Associate Professor ilbalogh@nps.edu July 2010 831-656-7582 http://movesinstitute.org Report Documentation Page Form Approved OMB No. 0704-0188
More informationStatistical Approaches to Learning and Discovery. Week 4: Decision Theory and Risk Minimization. February 3, 2003
Statistical Approaches to Learning and Discovery Week 4: Decision Theory and Risk Minimization February 3, 2003 Recall From Last Time Bayesian expected loss is ρ(π, a) = E π [L(θ, a)] = L(θ, a) df π (θ)
More informationSTATISTICAL MACHINE LEARNING FOR STRUCTURED AND HIGH DIMENSIONAL DATA
AFRL-OSR-VA-TR-2014-0234 STATISTICAL MACHINE LEARNING FOR STRUCTURED AND HIGH DIMENSIONAL DATA Larry Wasserman CARNEGIE MELLON UNIVERSITY 0 Final Report DISTRIBUTION A: Distribution approved for public
More informationL p Spaces and Convexity
L p Spaces and Convexity These notes largely follow the treatments in Royden, Real Analysis, and Rudin, Real & Complex Analysis. 1. Convex functions Let I R be an interval. For I open, we say a function
More informationAnalysis Finite and Infinite Sets The Real Numbers The Cantor Set
Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered
More informationDesign for Lifecycle Cost using Time-Dependent Reliability
Design for Lifecycle Cost using Time-Dependent Reliability Amandeep Singh Zissimos P. Mourelatos Jing Li Mechanical Engineering Department Oakland University Rochester, MI 48309, USA : Distribution Statement
More informationCHAPTER V DUAL SPACES
CHAPTER V DUAL SPACES DEFINITION Let (X, T ) be a (real) locally convex topological vector space. By the dual space X, or (X, T ), of X we mean the set of all continuous linear functionals on X. By the
More informationNotions such as convergent sequence and Cauchy sequence make sense for any metric space. Convergent Sequences are Cauchy
Banach Spaces These notes provide an introduction to Banach spaces, which are complete normed vector spaces. For the purposes of these notes, all vector spaces are assumed to be over the real numbers.
More informationReport Documentation Page
Inhibition of blood cholinesterase activity is a poor predictor of acetylcholinesterase inhibition in brain regions of guinea pigs exposed to repeated doses of low levels of soman. Sally M. Anderson Report
More informationDevelopment and Application of Acoustic Metamaterials with Locally Resonant Microstructures
Development and Application of Acoustic Metamaterials with Locally Resonant Microstructures AFOSR grant #FA9550-10-1-0061 Program manager: Dr. Les Lee PI: C.T. Sun Purdue University West Lafayette, Indiana
More informationUSER S GUIDE. ESTCP Project ER
USER S GUIDE Demonstration of a Fractured Rock Geophysical Toolbox (FRGT) for Characterization and Monitoring of DNAPL Biodegradation in Fractured Rock Aquifers ESTCP Project ER-201118 JANUARY 2016 F.D.
More informationProbability and Measure
Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability
More informationThe Bernoulli equation in a moving reference frame
IOP PUBLISHING Eur. J. Phys. 3 (011) 517 50 EUROPEAN JOURNAL OF PHYSICS doi:10.1088/0143-0807/3//0 The Bernoulli equation in a moving reference frame Carl E Mungan Physics Department, US Naval Academy
More informationLecture 7 Introduction to Statistical Decision Theory
Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7
More informationTheory and Practice of Data Assimilation in Ocean Modeling
Theory and Practice of Data Assimilation in Ocean Modeling Robert N. Miller College of Oceanic and Atmospheric Sciences Oregon State University Oceanography Admin. Bldg. 104 Corvallis, OR 97331-5503 Phone:
More informationWavelet Spectral Finite Elements for Wave Propagation in Composite Plates
Wavelet Spectral Finite Elements for Wave Propagation in Composite Plates Award no: AOARD-0904022 Submitted to Dr Kumar JATA Program Manager, Thermal Sciences Directorate of Aerospace, Chemistry and Materials
More informationReal-Time Environmental Information Network and Analysis System (REINAS)
Calhoun: The NPS Institutional Archive Faculty and Researcher Publications Faculty and Researcher Publications 1998-09 Real-Time Environmental Information Network and Analysis System (REINAS) Nuss, Wendell
More informationECE 275B Homework # 1 Solutions Winter 2018
ECE 275B Homework # 1 Solutions Winter 2018 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2 < < x n Thus,
More informationMetrology Experiment for Engineering Students: Platinum Resistance Temperature Detector
Session 1359 Metrology Experiment for Engineering Students: Platinum Resistance Temperature Detector Svetlana Avramov-Zamurovic, Carl Wick, Robert DeMoyer United States Naval Academy Abstract This paper
More informationCoastal Engineering Technical Note
Coastal Engineering Technical Note CETN I-48 (12/91) Evaluation and Application of the Wave Information Study for the Gulf of Mexico INTRODUCTION The Wave Information Study (WIS) for the Gulf of Mexico
More informationAppendix B Convex analysis
This version: 28/02/2014 Appendix B Convex analysis In this appendix we review a few basic notions of convexity and related notions that will be important for us at various times. B.1 The Hausdorff distance
More informationOn Applying Point-Interval Logic to Criminal Forensics
On Applying Point-Interval Logic to Criminal Forensics (Student Paper) Mashhood Ishaque Abbas K. Zaidi Alexander H. Levis Command and Control Research and Technology Symposium 1 Report Documentation Page
More informationWavelets and Affine Distributions A Time-Frequency Perspective
Wavelets and Affine Distributions A Time-Frequency Perspective Franz Hlawatsch Institute of Communications and Radio-Frequency Engineering Vienna University of Technology INSTITUT FÜR NACHRICHTENTECHNIK
More informationGeometry of U-Boost Algorithms
Geometry of U-Boost Algorithms Noboru Murata 1, Takashi Takenouchi 2, Takafumi Kanamori 3, Shinto Eguchi 2,4 1 School of Science and Engineering, Waseda University 2 Department of Statistical Science,
More informationPIPS 3.0. Pamela G. Posey NRL Code 7322 Stennis Space Center, MS Phone: Fax:
PIPS 3.0 Ruth H. Preller Naval Research Laboratory, Code 7322 Stennis Space Center, MS 39529 phone: (228) 688-5444 fax: (228)688-4759 email: preller@nrlssc.navy.mil Pamela G. Posey NRL Code 7322 Stennis
More informationECE 275B Homework # 1 Solutions Version Winter 2015
ECE 275B Homework # 1 Solutions Version Winter 2015 1. (a) Because x i are assumed to be independent realizations of a continuous random variable, it is almost surely (a.s.) 1 the case that x 1 < x 2
More informationConvergence of DFT eigenvalues with cell volume and vacuum level
Converence of DFT eienvalues with cell volume and vacuum level Sohrab Ismail-Beii October 4, 2013 Computin work functions or absolute DFT eienvalues (e.. ionization potentials) requires some care. Obviously,
More informationAnalysis Comparison between CFD and FEA of an Idealized Concept V- Hull Floor Configuration in Two Dimensions. Dr. Bijan Khatib-Shahidi & Rob E.
Concept V- Hull Floor Configuration in Two Dimensions Dr. Bijan Khatib-Shahidi & Rob E. Smith 10 November 2010 : Dist A. Approved for public release Report Documentation Page Form Approved OMB No. 0704-0188
More informationInformation geometry of Bayesian statistics
Information geometry of Bayesian statistics Hiroshi Matsuzoe Department of Computer Science and Engineering, Graduate School of Engineering, Nagoya Institute of Technology, Nagoya 466-8555, Japan Abstract.
More information+ 2x sin x. f(b i ) f(a i ) < ɛ. i=1. i=1
Appendix To understand weak derivatives and distributional derivatives in the simplest context of functions of a single variable, we describe without proof some results from real analysis (see [7] and
More informationThermo-Kinetic Model of Burning for Polymeric Materials
Thermo-Kinetic Model of Burning for Polymeric Materials Stanislav I. Stoliarov a, Sean Crowley b, Richard Lyon b a University of Maryland, Fire Protection Engineering, College Park, MD 20742 b FAA W. J.
More informationarxiv: v2 [math.oc] 11 Jun 2018
: Tiht Automated Converence Guarantees Adrien Taylor * Bryan Van Scoy * 2 Laurent Lessard * 2 3 arxiv:83.673v2 [math.oc] Jun 28 Abstract We present a novel way of eneratin Lyapunov functions for provin
More informationPROBLEMS. (b) (Polarization Identity) Show that in any inner product space
1 Professor Carl Cowen Math 54600 Fall 09 PROBLEMS 1. (Geometry in Inner Product Spaces) (a) (Parallelogram Law) Show that in any inner product space x + y 2 + x y 2 = 2( x 2 + y 2 ). (b) (Polarization
More informationQuantitation and Ratio Determination of Uranium Isotopes in Water and Soil Using Inductively Coupled Plasma Mass Spectrometry (ICP-MS)
Quantitation and Ratio Determination of Uranium Isotopes in Water and Soil Using Inductively Coupled Plasma Mass Spectrometry (ICP-MS) D.N. Kurk, T.E. Beegle, S.C. Spence and R.J. Swatski Report Documentation
More informationBroadband matched-field source localization in the East China Sea*
Broadband matched-field source localization in the East China Sea* Renhe Zhang Zhenglin Li Jin Yan Zhaohui Peng Fenghua Li National Laboratory of Acoustics, Institute of Acoustics, Chinese Academy of Sciences,
More information2008 Monitoring Research Review: Ground-Based Nuclear Explosion Monitoring Technologies SPECTRAL ANALYSIS OF RADIOXENON
SPECTRAL ANALYSIS OF RADIOXENON Matthew W. Cooper, Ted W. Bowyer, James C. Hayes, Tom R. Heimbigner, Charles W. Hubbard, Justin I. McIntyre, and Brian T. Schrom Pacific Northwest National Laboratory Sponsored
More informationUnderstanding Near-Surface and In-cloud Turbulent Fluxes in the Coastal Stratocumulus-topped Boundary Layers
Understanding Near-Surface and In-cloud Turbulent Fluxes in the Coastal Stratocumulus-topped Boundary Layers Qing Wang Meteorology Department, Naval Postgraduate School Monterey, CA 93943 Phone: (831)
More information6.1 Variational representation of f-divergences
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016
More informationHYCOM Caspian Sea Modeling. Part I: An Overview of the Model and Coastal Upwelling. Naval Research Laboratory, Stennis Space Center, USA
HYCOM Caspian Sea Modeling. Part I: An Overview of the Model and Coastal Upwelling By BIROL KARA, ALAN WALLCRAFT AND JOE METZGER Naval Research Laboratory, Stennis Space Center, USA MURAT GUNDUZ Institute
More informationImproved Parameterizations Of Nonlinear Four Wave Interactions For Application In Operational Wave Prediction Models
Improved Parameterizations Of Nonlinear Four Wave Interactions For Application In Operational Wave Prediction Models Gerbrant Ph. van Vledder ALKYON Hydraulic Consultancy & Research P.O.Box 248, 8300 AD
More informationREGENERATION OF SPENT ADSORBENTS USING ADVANCED OXIDATION (PREPRINT)
AL/EQ-TP-1993-0307 REGENERATION OF SPENT ADSORBENTS USING ADVANCED OXIDATION (PREPRINT) John T. Mourand, John C. Crittenden, David W. Hand, David L. Perram, Sawang Notthakun Department of Chemical Engineering
More informationA Bound on Mean-Square Estimation Error Accounting for System Model Mismatch
A Bound on Mean-Square Estimation Error Accounting for System Model Mismatch Wen Xu RD Instruments phone: 858-689-8682 email: wxu@rdinstruments.com Christ D. Richmond MIT Lincoln Laboratory email: christ@ll.mit.edu
More informationGrupo demecanica Celeste, Facultad de Ciencias, Valladolid, Spain. Dept. Applied Mathematics, University of Alicante, Alicante, Spain
Advances in the Unied Theory of the Rotation of the Nonrigid Earth Juan Getino Grupo demecanica Celeste, Facultad de Ciencias, Valladolid, Spain Jose M.Ferrandiz Dept. Applied Mathematics, University of
More informationarxiv: v1 [math.ds] 31 Jul 2018
arxiv:1807.11801v1 [math.ds] 31 Jul 2018 On the interior of projections of planar self-similar sets YUKI TAKAHASHI Abstract. We consider projections of planar self-similar sets, and show that one can create
More information