Information geometry for bivariate distribution control
1 Information geometry for bivariate distribution control

C.T.J. Dodson and Hong Wang
Mathematics and Control Systems Centre, University of Manchester Institute of Science and Technology

Optimal control of stochastic processes through sensor estimation of probability density functions has a geometric setting via information theory and the information metric. We consider gamma models with positive correlation, for which the information-theoretic 3-manifold geometry has recently been formulated. For comparison we also summarize the case of bivariate Gaussian processes with arbitrary correlation.
2 Poisson Models

Model for random events: the probability of exactly $m$ events in time $t$ is
\[ P_m = \frac{(t/\tau)^m}{m!}\, e^{-t/\tau} \qquad \text{for } m = 0, 1, 2, \ldots \]
The mean and variance of the number of events in such intervals is $t/\tau$. From $P_0 = e^{-t/\tau}$, the pdf for intervals $t$ between successive events is
\[ f_{\mathrm{random}}(t;\tau) = \frac{1}{\tau}\, e^{-t/\tau} \qquad \text{for } t \in [0,\infty), \]
with variance $\mathrm{Var}(t) = \tau^2$. Departures from randomness involve either clustering of events or evening out of event spacings.
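These Poisson facts can be checked numerically; the sketch below (function names and step sizes are our choices) sums the count probabilities and integrates the interval density by crude quadrature:

```python
import math

def poisson_pm(m, t, tau):
    """P_m = (t/tau)^m / m! * exp(-t/tau): probability of exactly m events in time t."""
    return (t / tau) ** m / math.factorial(m) * math.exp(-t / tau)

t, tau = 3.0, 1.5
probs = [poisson_pm(m, t, tau) for m in range(60)]
total = sum(probs)                                # unit measure, ~ 1
mean = sum(m * p for m, p in enumerate(probs))    # ~ t/tau = 2.0

# interval density f_random(s) = exp(-s/tau)/tau: variance ~ tau^2 by quadrature
h, upper = 1e-3, 60.0
ts = [i * h for i in range(int(upper / h))]
f = [math.exp(-s / tau) / tau for s in ts]
m1 = sum(s * w for s, w in zip(ts, f)) * h
m2 = sum(s * s * w for s, w in zip(ts, f)) * h
var = m2 - m1 ** 2                                # ~ tau^2 = 2.25
```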
3 Gamma pdfs

Parameters: $\tau, \nu \in \mathbb{R}^+$,
\[ f(t;\tau,\nu) = \left(\frac{\nu}{\tau}\right)^{\nu} \frac{t^{\nu-1}}{\Gamma(\nu)}\, e^{-t\nu/\tau}, \qquad t \in \mathbb{R}^+. \]
Mean $\langle t \rangle = \tau$ and variance $\mathrm{Var}(t) = \tau^2/\nu$.
$\nu = 1$: Poisson random process, with mean interval $\tau$.
$\nu = 1, 2, \ldots \in \mathbb{Z}$: Poisson process with events removed to leave only every $\nu$th.
$\nu < 1$: clustering of events.
[Figure: plots of $f(t;1,\nu)$ against the inter-event interval $t$ for $\nu = 0.5$ (clustered), $\nu = 1$ (random), $\nu = 2$ (smoothed) and $\nu = 5$ (smoothed).]
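A quick numerical check of the stated mean and variance (a minimal sketch; the quadrature step and cutoff are arbitrary choices of ours):

```python
import math

def gamma_pdf(t, tau, nu):
    """f(t; tau, nu) = (nu/tau)^nu * t^(nu-1) / Gamma(nu) * exp(-t*nu/tau)."""
    return (nu / tau) ** nu * t ** (nu - 1) / math.gamma(nu) * math.exp(-t * nu / tau)

tau, nu = 2.0, 5.0
h, upper = 1e-3, 40.0
ts = [i * h for i in range(1, int(upper / h))]
ws = [gamma_pdf(t, tau, nu) * h for t in ts]
mass = sum(ws)                                            # ~ 1
mean = sum(t * w for t, w in zip(ts, ws))                 # ~ tau = 2.0
var = sum((t - mean) ** 2 * w for t, w in zip(ts, ws))    # ~ tau^2/nu = 0.8
```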
4 Fisher Information Metric on pdf family

Example of a univariate pdf family: $\{ f(t;(\theta^i)) : (\theta^i) \in S \}$, with random variable $t \in \mathbb{R}^+$ and $n$ parameters. Log-likelihood function $l = \log f$. For all points $(\theta^i) \in S$, a subset of $\mathbb{R}^n$, the covariance of partial derivatives
\[ g_{ij} = \int_{\mathbb{R}^+} f(t;(\theta^i))\, \frac{\partial l}{\partial \theta^i}\, \frac{\partial l}{\partial \theta^j}\, dt = -\int_{\mathbb{R}^+} f(t;(\theta^i))\, \frac{\partial^2 l}{\partial \theta^i \partial \theta^j}\, dt \]
is a positive definite $n \times n$ matrix and induces a Riemannian metric $g$ on the $n$-dimensional parameter space $S$.
General case: integrate over all random variables for multivariate pdfs.
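The second integral expression can be checked on the gamma family of the previous slide, where the metric in coordinates $(\tau,\nu)$ is known to be diagonal with $g_{11} = \nu/\tau^2$ and $g_{22} = \psi'(\nu) - 1/\nu$. A sketch using crude quadrature and central finite differences (all names ours, tolerances loose):

```python
import math

def logf(t, tau, nu):
    """Log-likelihood l = log f for the gamma pdf of the previous slide."""
    return nu * math.log(nu / tau) + (nu - 1) * math.log(t) - math.lgamma(nu) - t * nu / tau

def fisher_metric(tau, nu, h=1e-4, dt=1e-2, upper=60.0):
    """g_ij = -int f * d^2 l / d theta_i d theta_j dt with theta = (tau, nu),
    by left-Riemann quadrature and central finite differences."""
    def d2l(i, j, t):
        ei = [(h, 0.0), (0.0, h)][i]
        ej = [(h, 0.0), (0.0, h)][j]
        def L(s1, s2):
            return logf(t, tau + s1 * ei[0] + s2 * ej[0], nu + s1 * ei[1] + s2 * ej[1])
        return (L(1, 1) - L(1, -1) - L(-1, 1) + L(-1, -1)) / (4.0 * h * h)
    g = [[0.0, 0.0], [0.0, 0.0]]
    t = dt
    while t < upper:
        w = math.exp(logf(t, tau, nu)) * dt
        for i in range(2):
            for j in range(2):
                g[i][j] -= d2l(i, j, t) * w
        t += dt
    return g

g = fisher_metric(2.0, 5.0)
# expected: g[0][0] = nu/tau^2 = 1.25 and g[0][1] = 0 (orthogonal coordinates)
```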
5 McKay bivariate gamma pdf

The McKay bivariate gamma distribution, defined on $0 < x < y < \infty$ with parameters $\alpha_1, \sigma_{12}, \alpha_2 > 0$, is given by
\[ f(x,y;\alpha_1,\sigma_{12},\alpha_2) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1+\alpha_2}{2}} x^{\alpha_1-1} (y-x)^{\alpha_2-1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, y}}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)}, \tag{1} \]
where $\sigma_{12}$ is the covariance of $X, Y$. The correlation coefficient and the marginal gamma distributions of $X$ and $Y$ are given respectively, for positive $x, y$, by
\[ \rho(X,Y) = \sqrt{\frac{\alpha_1}{\alpha_1+\alpha_2}}, \]
\[ f_X(x) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1}{2}} x^{\alpha_1-1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, x}}{\Gamma(\alpha_1)}, \qquad f_Y(y) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1+\alpha_2}{2}} y^{(\alpha_1+\alpha_2)-1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, y}}{\Gamma(\alpha_1+\alpha_2)}. \]
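A McKay pair can be realised as $X \sim \Gamma(\alpha_1, c)$ and $Y = X + Z$ with independent $Z \sim \Gamma(\alpha_2, c)$, where $c = \sqrt{\alpha_1/\sigma_{12}}$ is the rate in the exponent of (1). The stated covariance and correlation can then be checked by Monte Carlo (sketch; sample size and seed are arbitrary):

```python
import math
import random

random.seed(0)
a1, s12, a2 = 2.0, 1.5, 3.0
c = math.sqrt(a1 / s12)   # rate appearing in the exponent of the McKay pdf

# McKay pair: X ~ Gamma(a1, rate c), Y = X + Z, Z ~ Gamma(a2, rate c) independent
N = 200000
xs, ys = [], []
for _ in range(N):
    x = random.gammavariate(a1, 1.0 / c)          # gammavariate takes (shape, scale)
    y = x + random.gammavariate(a2, 1.0 / c)
    xs.append(x)
    ys.append(y)

mx, my = sum(xs) / N, sum(ys) / N
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / N
vx = sum((x - mx) ** 2 for x in xs) / N
vy = sum((y - my) ** 2 for y in ys) / N
rho = cov / math.sqrt(vx * vy)
# cov ~ sigma_12 = 1.5 and rho ~ sqrt(a1/(a1+a2)) = sqrt(0.4)
```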
6 Applicability of McKay distributions

Explicitly, $f_X(x)$ is a gamma density with mean $\sqrt{\alpha_1 \sigma_{12}}$ and dispersion parameter $\alpha_1$. Similarly, $f_Y(y)$ is a gamma density with mean $(\alpha_1+\alpha_2)\sqrt{\sigma_{12}/\alpha_1}$ and dispersion parameter $(\alpha_1+\alpha_2)$. So, for the McKay distribution to be applicable to a given joint distribution for $(x,y)$, we need to have:
$0 < x < y < \infty$
Covariance $> 0$
(Dispersion parameter for $y$) $>$ (Dispersion parameter for $x$)
And we expect, roughly, $(\text{Mean } y)/(\text{Mean } x) = (\alpha_1+\alpha_2)/\alpha_1$.
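These conditions suggest a simple method-of-moments fit. The helper below is our illustration, not part of the paper: it inverts the three moment relations to recover the parameters from sample means and covariance.

```python
import math

def mckay_from_moments(mean_x, mean_y, cov):
    """Method-of-moments fit (our illustration): invert mean_x = sqrt(a1*s12),
    mean_y = (a1+a2)*sqrt(s12/a1) and cov = s12."""
    if not (0.0 < mean_x < mean_y and cov > 0.0):
        raise ValueError("McKay needs 0 < mean_x < mean_y and positive covariance")
    s12 = cov
    a1 = mean_x ** 2 / s12
    a2 = a1 * (mean_y - mean_x) / mean_x
    return a1, s12, a2

a1, s12, a2 = mckay_from_moments(2.0, 5.0, 1.6)
c = math.sqrt(a1 / s12)
# round trip: a1/c recovers mean_x = 2.0 and (a1+a2)/c recovers mean_y = 5.0
```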
7 McKay 3-manifold

Denote by $M$ the set of McKay bivariate gamma distributions, that is
\[ M = \left\{ f \;\middle|\; f(x,y;\alpha_1,\sigma_{12},\alpha_2) = \frac{\left(\frac{\alpha_1}{\sigma_{12}}\right)^{\frac{\alpha_1+\alpha_2}{2}} x^{\alpha_1-1} (y-x)^{\alpha_2-1}\, e^{-\sqrt{\frac{\alpha_1}{\sigma_{12}}}\, y}}{\Gamma(\alpha_1)\,\Gamma(\alpha_2)},\; y > x > 0,\; \alpha_1, \sigma_{12}, \alpha_2 > 0 \right\}. \tag{2} \]
Then we have from Arwini and Dodson: global coordinates $(\alpha_1, \sigma_{12}, \alpha_2)$ make $M$ a 3-manifold with Fisher information metric $[g_{ij}]$ given by
\[ [g_{ij}] = \begin{pmatrix} \frac{-3\alpha_1+\alpha_2}{4\alpha_1^2} + \psi'(\alpha_1) & \frac{\alpha_1-\alpha_2}{4\alpha_1\sigma_{12}} & -\frac{1}{2\alpha_1} \\ \frac{\alpha_1-\alpha_2}{4\alpha_1\sigma_{12}} & \frac{\alpha_1+\alpha_2}{4\sigma_{12}^2} & \frac{1}{2\sigma_{12}} \\ -\frac{1}{2\alpha_1} & \frac{1}{2\sigma_{12}} & \psi'(\alpha_2) \end{pmatrix} \tag{3} \]
where $\psi' = (\log\Gamma)''$ is the trigamma function.
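The signs in (3) are difficult to read in the transcription, so the sketch below implements the matrix as reconstructed above and checks it numerically: positive definiteness at sample points via Sylvester's criterion, and the score-variance identity $g_{11} = \psi'(\alpha_1) + \frac{\alpha_1+\alpha_2}{4\alpha_1^2} - \frac{1}{\alpha_1}$ implied by the McKay marginals. All function names are ours, and $\psi'$ is approximated by the standard recurrence plus asymptotic series.

```python
import math

def trigamma(x):
    """psi'(x) via the recurrence psi'(x) = psi'(x+1) + 1/x^2 and an asymptotic tail."""
    s = 0.0
    while x < 10.0:
        s += 1.0 / (x * x)
        x += 1.0
    i = 1.0 / x
    i2 = i * i
    return s + i + 0.5 * i2 + i2 * i * (1.0 / 6 - i2 * (1.0 / 30 - i2 / 42))

def mckay_metric(a1, s12, a2):
    """Fisher metric (3) on the McKay 3-manifold, coordinates (alpha1, sigma12, alpha2)."""
    g11 = (-3.0 * a1 + a2) / (4.0 * a1 * a1) + trigamma(a1)
    g12 = (a1 - a2) / (4.0 * a1 * s12)
    g13 = -1.0 / (2.0 * a1)
    g22 = (a1 + a2) / (4.0 * s12 * s12)
    g23 = 1.0 / (2.0 * s12)
    return [[g11, g12, g13], [g12, g22, g23], [g13, g23, trigamma(a2)]]

def is_positive_definite(g):
    """Sylvester's criterion: all leading principal minors positive."""
    m1 = g[0][0]
    m2 = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    m3 = (g[0][0] * (g[1][1] * g[2][2] - g[1][2] * g[2][1])
          - g[0][1] * (g[1][0] * g[2][2] - g[1][2] * g[2][0])
          + g[0][2] * (g[1][0] * g[2][1] - g[1][1] * g[2][0]))
    return m1 > 0 and m2 > 0 and m3 > 0
```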
8 Information arclength through McKay distributions

For small changes $d\alpha_1, d\sigma_{12}, d\alpha_2$ the element of arclength $ds$ is given by
\[ ds^2 = \sum_{i,j=1}^{3} g_{ij}\, dx^i\, dx^j \tag{4} \]
with $(x^1, x^2, x^3) = (\alpha_1, \sigma_{12}, \alpha_2)$. For larger separations between two bivariate gamma distributions the arclength along a curve is obtained by integration of $ds$; geodesic curves can give minimizing trajectories.
\[ c : [a,b] \to M : t \mapsto (c_1(t), c_2(t), c_3(t)) \tag{5} \]
The tangent vector $\dot c(t) = (\dot c_1(t), \dot c_2(t), \dot c_3(t))$ has norm $\|\dot c\|$ given via (3) by
\[ \|\dot c(t)\|^2 = \sum_{i,j=1}^{3} g_{ij}\, \dot c_i(t)\, \dot c_j(t). \tag{6} \]
The information length of the curve is
\[ L_c(a,b) = \int_a^b \|\dot c(t)\|\, dt \qquad \text{for } a \le b. \tag{7} \]
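Equation (7) can be evaluated numerically for any parameter curve. The sketch below (our code, reusing the metric (3) as reconstructed above) computes the information length of a straight line in $(\alpha_1,\sigma_{12},\alpha_2)$-space; since that line is not a geodesic, its length upper-bounds the information distance between the two distributions.

```python
import math

def trigamma(x):
    """psi'(x) via psi'(x) = psi'(x+1) + 1/x^2 and an asymptotic tail."""
    s = 0.0
    while x < 10.0:
        s += 1.0 / (x * x)
        x += 1.0
    i = 1.0 / x
    i2 = i * i
    return s + i + 0.5 * i2 + i2 * i * (1.0 / 6 - i2 * (1.0 / 30 - i2 / 42))

def mckay_metric(a1, s12, a2):
    """Fisher metric (3), as reconstructed above."""
    return [[(-3 * a1 + a2) / (4 * a1 * a1) + trigamma(a1),
             (a1 - a2) / (4 * a1 * s12), -1.0 / (2 * a1)],
            [(a1 - a2) / (4 * a1 * s12),
             (a1 + a2) / (4 * s12 * s12), 1.0 / (2 * s12)],
            [-1.0 / (2 * a1), 1.0 / (2 * s12), trigamma(a2)]]

def info_length(curve, velocity, a=0.0, b=1.0, n=400):
    """L_c(a,b) = int_a^b ||c'(t)|| dt of eq. (7), by the trapezoidal rule."""
    def speed(t):
        g = mckay_metric(*curve(t))
        v = velocity(t)
        return math.sqrt(sum(g[i][j] * v[i] * v[j] for i in range(3) for j in range(3)))
    h = (b - a) / n
    return h * (0.5 * speed(a) + 0.5 * speed(b) + sum(speed(a + k * h) for k in range(1, n)))

# straight coordinate line from P to Q (not a geodesic)
P, Q = (1.0, 1.0, 1.0), (2.0, 1.5, 3.0)
curve = lambda t: tuple(p + t * (q - p) for p, q in zip(P, Q))
velocity = lambda t: tuple(q - p for p, q in zip(P, Q))
L = info_length(curve, velocity)
```

The length is invariant under reparametrization of the curve, which gives a convenient correctness check.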
9 Bivariate Gaussian

The probability density of the 2-dimensional normal distribution has the form
\[ f(x) = \frac{1}{2\pi\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)}, \tag{8} \]
where
\[ x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \sigma_{11} & \sigma_{12} \\ \sigma_{12} & \sigma_{22} \end{pmatrix}, \]
\[ -\infty < x_1, x_2 < \infty, \quad -\infty < \mu_1, \mu_2 < \infty, \quad 0 < \sigma_{11}, \sigma_{22} < \infty. \]
This contains the five parameters $\mu_1, \mu_2, \sigma_{11}, \sigma_{12}, \sigma_{22}$. So our global coordinate system consists of the 5-tuples $\theta = (\theta_1, \theta_2, \theta_3, \theta_4, \theta_5) = (\mu_1, \mu_2, \sigma_{11}, \sigma_{12}, \sigma_{22})$.
10 Bivariate Gaussian 5-manifold

$\theta_1 = \mu_1$, $\theta_2 = \mu_2$, $\theta_3 = \sigma_{11}$, $\theta_4 = \sigma_{12}$, $\theta_5 = \sigma_{22}$. The information metric tensor is known to be given by
\[ [g_{ij}] = \frac{1}{\Delta^2} \begin{pmatrix} \theta_5 \Delta & -\theta_4 \Delta & 0 & 0 & 0 \\ -\theta_4 \Delta & \theta_3 \Delta & 0 & 0 & 0 \\ 0 & 0 & \frac{\theta_5^2}{2} & -\theta_4 \theta_5 & \frac{\theta_4^2}{2} \\ 0 & 0 & -\theta_4 \theta_5 & \theta_3 \theta_5 + \theta_4^2 & -\theta_3 \theta_4 \\ 0 & 0 & \frac{\theta_4^2}{2} & -\theta_3 \theta_4 & \frac{\theta_3^2}{2} \end{pmatrix} \tag{9} \]
where $\Delta$ is the determinant $\Delta = |\Sigma| = \theta_3 \theta_5 - (\theta_4)^2$.
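The block structure of (9) follows from the general Gaussian formula $g_{ij} = \partial_i\mu^{T}\Sigma^{-1}\partial_j\mu + \tfrac12 \mathrm{tr}\!\left(\Sigma^{-1}\partial_i\Sigma\,\Sigma^{-1}\partial_j\Sigma\right)$. The sketch below checks (9) against that formula entry by entry, using pure-Python $2\times2$ algebra (function names are ours):

```python
def gaussian_metric(t3, t4, t5):
    """The 5x5 metric (9) for coordinates (mu1, mu2, s11, s12, s22); D = t3*t5 - t4^2."""
    D = t3 * t5 - t4 * t4
    g = [[0.0] * 5 for _ in range(5)]
    g[0][0], g[0][1], g[1][1] = t5 / D, -t4 / D, t3 / D
    g[1][0] = g[0][1]
    cov = [[t5 * t5 / 2, -t4 * t5, t4 * t4 / 2],
           [-t4 * t5, t3 * t5 + t4 * t4, -t3 * t4],
           [t4 * t4 / 2, -t3 * t4, t3 * t3 / 2]]
    for i in range(3):
        for j in range(3):
            g[2 + i][2 + j] = cov[i][j] / (D * D)
    return g

def gaussian_metric_generic(t3, t4, t5):
    """Same metric from g_ij = (d_i mu)^T S^-1 (d_j mu) + (1/2) tr(S^-1 d_i S S^-1 d_j S)."""
    D = t3 * t5 - t4 * t4
    Sinv = [[t5 / D, -t4 / D], [-t4 / D, t3 / D]]
    dmu = [(1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
    dS = [((0, 0), (0, 0)), ((0, 0), (0, 0)),
          ((1, 0), (0, 0)), ((0, 1), (1, 0)), ((0, 0), (0, 1))]
    def mm(A, B):
        return [[sum(A[r][k] * B[k][c] for k in range(2)) for c in range(2)]
                for r in range(2)]
    g = [[0.0] * 5 for _ in range(5)]
    for i in range(5):
        for j in range(5):
            mean = sum(dmu[i][r] * Sinv[r][c] * dmu[j][c]
                       for r in range(2) for c in range(2))
            M = mm(mm(Sinv, dS[i]), mm(Sinv, dS[j]))
            g[i][j] = mean + 0.5 * (M[0][0] + M[1][1])
    return g
```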
11 B-spline approximations to McKay

Since every pdf has unit measure,
\[ \lim_{x,y \to +\infty} f(x,y;\alpha_1,\sigma_{12},\alpha_2) = 0. \tag{10} \]
Hence, for all $\epsilon > 0$ there is a $b(\epsilon,\alpha_1,\sigma_{12},\alpha_2) > 0$ such that
\[ x, y > b(\epsilon,\alpha_1,\sigma_{12},\alpha_2) \;\Longrightarrow\; f(x,y;\alpha_1,\sigma_{12},\alpha_2) \le \epsilon. \tag{11} \]
So we can use B-spline functions to approximate the McKay pdf such that
\[ \left| f(x,y;\alpha_1,\sigma_{12},\alpha_2) - \sum_{i=1}^{n} w_i B_i(x,y) \right| \le \delta \tag{12} \]
where $B_i(x,y)$ are pre-specified bivariate basis functions defined on $\Omega = [0,b] \times [0,b]$, $w_i$ are the weights to be trained adaptively and $\delta$ is a small number, generally larger than $\epsilon$.
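A minimal version of (12): tensor-product linear (hat) B-splines fitted by least squares. The basis choice, grid and target density below are our stand-ins, not the paper's:

```python
import numpy as np

def hat(u, centers, h):
    """1-D linear B-spline (hat) basis, evaluated at the points u."""
    return np.maximum(0.0, 1.0 - np.abs(u[:, None] - centers[None, :]) / h)

b, K = 8.0, 17                        # domain [0, b]^2 with K knots per axis
centers = np.linspace(0.0, b, K)
h = centers[1] - centers[0]

# stand-in target density (product of exponentials, not the McKay pdf)
xs = np.linspace(0.0, b, 40)
X, Y = np.meshgrid(xs, xs, indexing="ij")
target = np.exp(-X - Y)

# tensor-product basis B_i(x,y) = hat_p(x) * hat_q(y), n = K*K weights
Bx = hat(xs, centers, h)              # shape (40, K)
A = np.einsum("ip,jq->ijpq", Bx, Bx).reshape(40 * 40, K * K)
w, *_ = np.linalg.lstsq(A, target.ravel(), rcond=None)
delta = np.abs(A @ w - target.ravel()).max()   # the delta of eq. (12)
```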
12 Square root approximation

This is numerically more robust than linear B-splines, and is used here to represent the coupled links between the three parameters and the McKay pdf $f$. Since the basis functions are pre-specified, different values of $\{\alpha_1, \sigma_{12}, \alpha_2\}$ will generate different sets of weights. Therefore, our approximation should be represented as
\[ \sqrt{f(x,y;\alpha_1,\sigma_{12},\alpha_2)} = \sum_{i=1}^{n} w_i B_i(x,y) + e \tag{13} \]
where $|e| \le \delta$.
13 Renormalization

Wang has used the following transformed representation
\[ f(x,y;\alpha_1,\sigma_{12},\alpha_2) = C(x,y)\, V_k + h(V_k)\, B_n(x,y) \tag{14} \]
to guarantee that
\[ \int_0^b \int_0^b f(x,y;\alpha_1,\sigma_{12},\alpha_2)\, dx\, dy = 1, \]
where $V_k = (w_1, w_2, \ldots, w_{n-1})^T$ constitutes a vector of independent weights, $C(x,y)$ collects the corresponding basis functions, and $h(\cdot)$ is a known nonlinear function of $V_k$. With this format, the relationship between $V_k$ and $\{\alpha_1, \sigma_{12}, \alpha_2\}$ can be formulated.
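For the linear model (12) the renormalising function is affine: the last weight is fixed from the basis-function integrals so that the expansion has unit mass. A sketch (hat basis and random weights are our stand-ins; Wang's $h$ for the square-root model (13) is genuinely nonlinear):

```python
import numpy as np

b, K = 4.0, 6
centers = np.linspace(0.0, b, K)
hk = centers[1] - centers[0]

# exact integrals of the 1-D hat basis over [0, b]: interior hats integrate
# to hk, the two boundary half-hats to hk/2
ints = np.full(K, hk)
ints[0] = ints[-1] = hk / 2.0

I = np.outer(ints, ints).ravel()      # integrals of the n = K*K tensor basis functions
V = np.random.default_rng(1).uniform(0.0, 0.1, K * K - 1)   # independent weights V_k

def h_of_V(V):
    """h(V_k) = (1 - sum_i V_i int(B_i)) / int(B_n): forces unit measure."""
    return (1.0 - I[:-1] @ V) / I[-1]

total = I[:-1] @ V + h_of_V(V) * I[-1]   # the integral in (14); equals 1 by construction
```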
14 Tuning procedure

The initial bivariate gamma distribution has parameters $\{\alpha_1(0), \sigma_{12}(0), \alpha_2(0)\}$ and the desired distribution has $\{\alpha_1(f), \sigma_{12}(f), \alpha_2(f)\}$. Let $g(x,y)$ represent the probability density of the bivariate gamma distribution with parameter set $\{\alpha_1(f), \sigma_{12}(f), \alpha_2(f)\}$. An effective trajectory for $V_k$ should be chosen to minimise the following performance function
\[ J = \frac{1}{2} \int_0^b \int_0^b g(x,y) \log\frac{g(x,y)}{K(x,y)}\, dx\, dy \tag{15} \]
with $K(x,y) = C(x,y)\, V_k + h(V_k)\, B_n(x,y)$.
15 Gradient Rule

For iteration,
\[ V_k = V_{k-1} - \eta \left. \frac{\partial J}{\partial V} \right|_{V = V_{k-1}} \tag{16} \]
where $k = 0, 1, 2, \ldots$ represents the sample number and $\eta > 0$ is a pre-specified learning rate. Using the relationship between the weight vector and the three parameters, the adaptive tuning of the actual parameters in the pdf of (14) can be readily formulated. The information arclength provides distance estimates between the current and target distributions.
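A discretised toy run of (15)-(16): gradient descent (numeric gradient, crude backtracking) on the weights of a renormalised basis expansion, driving $K$ toward a stand-in target $g$. Everything here (Gaussian-bump basis, target, rates) is our illustration, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# grid discretisation of [0, b]^2
ngrid, b = 20, 4.0
xs = np.linspace(0.05, b, ngrid)
X, Y = np.meshgrid(xs, xs, indexing="ij")
dA = (xs[1] - xs[0]) ** 2

g = np.exp(-X - 0.5 * Y)                      # stand-in target density g(x, y)
g /= g.sum() * dA

# basis: Gaussian bumps at fixed centres (our choice, not the paper's B-splines)
cx = np.linspace(0.5, b - 0.5, 4)
B = np.array([np.exp(-((X - u) ** 2 + (Y - v) ** 2)) for u in cx for v in cx])

def K_of(V):
    k = np.maximum(np.tensordot(V, B, axes=1), 1e-12)
    return k / (k.sum() * dA)                 # renormalisation as in slide 13

def J(V):
    """Discretised performance function (15), up to the constant factor 1/2."""
    return float((g * np.log(g / K_of(V))).sum() * dA)

V = np.abs(rng.normal(0.5, 0.1, B.shape[0]))
J0 = J(V)
h, eta = 1e-6, 0.5
for _ in range(100):                          # gradient rule (16), numeric gradient
    grad = np.array([(J(V + h * e) - J(V - h * e)) / (2 * h) for e in np.eye(len(V))])
    step = eta
    while step > 1e-10 and J(V - step * grad) > J(V):
        step /= 2.0                           # crude backtracking keeps J from rising
    V = V - step * grad
J1 = J(V)                                     # should end well below J0
```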
16 Kullback-Leibler Divergence

For two probability density functions $p$ and $p'$ on an event space $\Omega$, the function
\[ KL(p, p') = \int_\Omega p(x) \log\frac{p(x)}{p'(x)}\, dx \tag{17} \]
is called the Kullback-Leibler divergence or relative entropy. We could, instead of (15), consider (17) as the performance function; explicitly this would be
\[ W = \frac{1}{2} \int_0^b \int_0^b g(x,y) \log\frac{g(x,y)}{K(x,y)}\, dx\, dy \tag{18} \]
with $K(x,y) = C(x,y)\, V_k + h(V_k)\, B_n(x,y)$.
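KL behaves as a divergence rather than a metric: it is nonnegative, vanishes only at $p = p'$, and is asymmetric. A discretised check on two exponential densities (grid, rates and cutoff are our choices; for Exp(1) against Exp(1/2) the exact value is $\log 2 + \tfrac12 - 1 \approx 0.193$):

```python
import math

def kl(p, q, dx):
    """Discretised KL(p, q) = sum p * log(p/q) * dx, eq. (17)."""
    return sum(pi * math.log(pi / qi) * dx for pi, qi in zip(p, q) if pi > 0.0)

dx = 0.01
grid = [i * dx for i in range(1600)]          # [0, 16), renormalised on the grid
p = [math.exp(-t) for t in grid]              # ~ Exp(1)
q = [0.5 * math.exp(-0.5 * t) for t in grid]  # ~ Exp(1/2)
Zp, Zq = sum(p) * dx, sum(q) * dx
p = [v / Zp for v in p]
q = [v / Zq for v in q]

d_pq = kl(p, q, dx)    # ~ 0.193
d_qp = kl(q, p, dx)    # different: KL is asymmetric
d_pp = kl(p, p, dx)    # exactly 0
```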
More information