Mutual Information and Optimal Data Coding


1 Mutual Information and Optimal Data Coding
May 9th, 2012
Jules de Tibeiro, Université de Moncton à Shippagan
Bernard Colin, François Dubeau, Hussein Khreibani, Université de Sherbrooke

2 Outline
Abstract
Introduction and Motivation
    Example
Theoretical Framework
    φ-Divergence
    Mutual Information
Optimal Partition
    Mutual Information explained by a partition
    Existence of an Optimal Partition
Computational Aspects and Examples
Conclusions and Perspectives
References

3 Abstract
Based on the notion of mutual information between the components of a random vector
An optimal quantization of the support of its probability measure
A simultaneous discretization of the whole set of the components of the random vector
The stochastic dependence between them
Key words: divergence, mutual information, copula, optimal quantization

4 Introduction and Motivation
Goal: an optimal discretization of the support of a continuous multivariate distribution that retains the stochastic dependence between the variables.
$X = (X_1, X_2, \dots, X_k)$ is a random vector with values in $(\mathbb{R}^k, \mathcal{B}_{\mathbb{R}^k}, P_X)$, where $P_X$ is the probability measure of $X$ and $S_{P_X} \subseteq \mathbb{R}^k$ is the support of $P_X$.
$n = n_1 \times n_2 \times \dots \times n_k$ is a product of given integers.
$\mathcal{P}$ is a partition of $S_{P_X}$ into $n$ elements (classes).
$\mathcal{P}$ is a product partition, deduced from partitions $\mathcal{P}_1, \mathcal{P}_2, \dots, \mathcal{P}_k$ of the supports of the marginal probability measures into $n_1, n_2, \dots, n_k$ intervals respectively.
Using a mutual information criterion, the intervals are chosen so that the quantization of $S_{P_X}$ retains the stochastic dependence between the components of the random vector $X$.

5 Introduction and Motivation
Here is an example for which such an optimal discretization might be desirable.
Suppose that we have a sample of individuals on which we observe the following variables: X = age, Y = salary, Z = socioprofessional group.
If we want to take the variables into account simultaneously, as for example in multiple correspondence analysis, we have to put them in the same form by discretizing the first two.
Instead of the usual independent categorization of X and Y into a given number of classes (p for X and q for Y), it would be more relevant to use their stochastic dependence and categorize X and Y simultaneously into pq classes (sometimes referred to as a (p, q) partition), in order to preserve as much as possible of the dependence between them.
Moreover, depending on the values taken by the categorical variable Z, the (conditional) discretization of the random vector (X, Y) must differ from one class to another, to take into account the stochastic dependence between the continuous random variables and the categorical one.
Usually, this dependence is ignored when creating classes for continuous random variables. However, the dependence between X = age and Y = salary is certainly quite different from one socioprofessional group to another.

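6 φ-Divergence
Let $(\Omega, \mathcal{F}, \mu)$ be a measure space, and let $\mu_1$ and $\mu_2$ be two probability measures defined on $\mathcal{F}$ such that $\mu_i \ll \mu$ for $i = 1, 2$.
The φ-divergence, or generalized divergence (Csiszár [2]), between $\mu_1$ and $\mu_2$ is
\[ I_\varphi(\mu_1, \mu_2) = \int \varphi\!\left(\frac{d\mu_1}{d\mu_2}\right) d\mu_2 = \int \varphi\!\left(\frac{f_1}{f_2}\right) f_2 \, d\mu \]
where $\varphi(t)$ is a convex function from $\mathbb{R}_+ \setminus \{0\}$ to $\mathbb{R}$ and where $f_i = \dfrac{d\mu_i}{d\mu}$ for $i = 1, 2$.
$I_\varphi(\mu_1, \mu_2)$ does not depend on the choice of $\mu$.
Homogeneous models:
\[ I_\varphi(\mu_1, \mu_2) = \int \frac{d\mu_2}{d\mu_1}\, \varphi\!\left(\frac{d\mu_1}{d\mu_2}\right) d\mu_1 = \int \frac{f_2}{f_1}\, \varphi\!\left(\frac{f_1}{f_2}\right) f_1 \, d\mu \]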

7 φ-Divergence
Usual measures of φ-divergence

φ(x)                                      Name
$x \ln x$ ; $x - 1 - \ln x$               Kullback and Leibler
$|x - 1|$                                 Distance in variation
$(\sqrt{x} - 1)^2$                        Hellinger
$1 - x^{\alpha}$ ; $0 < \alpha < 1$       Chernoff
$(x - 1)^2$                               $\chi^2$
$[1 - x^{1/m}]^{m}$ ; $m > 0$             Jeffreys
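To make the table concrete, here is a minimal Python sketch of the discrete analogue of $I_\varphi$, in which the integral becomes a sum over a finite set. The function name `phi_divergence`, the dictionary `PHI` and the two example distributions are illustrative assumptions, not part of the talk; the parametric rows (Chernoff, Jeffreys) are left out for brevity.

```python
import math

# Generators phi(x) taken from the table; the dictionary keys are illustrative names.
PHI = {
    "kullback_leibler": lambda x: x * math.log(x),
    "variation": lambda x: abs(x - 1.0),
    "hellinger": lambda x: (math.sqrt(x) - 1.0) ** 2,
    "chi_square": lambda x: (x - 1.0) ** 2,
}


def phi_divergence(p, q, phi):
    """Discrete analogue of I_phi(mu_1, mu_2): sum_i phi(p_i / q_i) * q_i.

    Assumes p_i > 0 and q_i > 0 for all i, so every ratio lies in the domain of phi.
    """
    return sum(phi(pi / qi) * qi for pi, qi in zip(p, q))


# Two hypothetical distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.25, 0.25, 0.5]
for name, phi in PHI.items():
    print(name, phi_divergence(p, q, phi))
```

Every generator in the sketch satisfies φ(1) = 0, so the divergence of a distribution from itself is zero.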

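8 Mutual Information
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X_1, X_2, \dots, X_k$ be random variables defined on $(\Omega, \mathcal{F}, P)$ with values in measure spaces $(\mathcal{X}_i, \mathcal{F}_i, \lambda_i)$, $i = 1, 2, \dots, k$.
Denote respectively by $P_X = P_{X_1, X_2, \dots, X_k}$ and by $\bigotimes_{i=1}^{k} P_{X_i}$ the probability measures defined on the product space $\left(\times_{i=1}^{k} \mathcal{X}_i, \bigotimes_{i=1}^{k} \mathcal{F}_i, \bigotimes_{i=1}^{k} \lambda_i\right)$ that are equal to the joint probability measure and to the product of the marginal ones.
Both are supposed to be absolutely continuous with respect to the product measure $\lambda = \bigotimes_{i=1}^{k} \lambda_i$.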

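9 Mutual Information
Definition 1: The φ-mutual information, or simply the mutual information, between the random variables $X_1, X_2, \dots, X_k$ is given by
\[ I_\varphi(X_1, X_2, \dots, X_k) = I_\varphi\!\left(P_X, \bigotimes_{i=1}^{k} P_{X_i}\right) = \int \varphi\!\left(\frac{dP_X}{d \bigotimes_{i=1}^{k} P_{X_i}}\right) d \bigotimes_{i=1}^{k} P_{X_i} = \int \varphi\!\left(\frac{f_1}{f_2}\right) f_2 \, d\lambda \]
where $f_1$ and $f_2$ are the probability density functions of the measures $P_X$ and $\bigotimes_{i=1}^{k} P_{X_i}$ with respect to $\lambda = \bigotimes_{i=1}^{k} \lambda_i$.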
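As a quick check of the definition, taking the Kullback and Leibler generator $\varphi(x) = x \ln x$ with $k = 2$ recovers the classical (Shannon) mutual information. The symbols $f_{X_1,X_2}$, $f_{X_1}$ and $f_{X_2}$ are notation introduced here for the joint and marginal densities, so that $f_1 = f_{X_1,X_2}$ and $f_2 = f_{X_1} f_{X_2}$:

```latex
\[
I_\varphi(X_1, X_2)
  = \int \frac{f_1}{f_2} \ln\!\left(\frac{f_1}{f_2}\right) f_2 \, d\lambda
  = \iint f_{X_1,X_2}(x_1, x_2)\,
      \ln \frac{f_{X_1,X_2}(x_1, x_2)}{f_{X_1}(x_1)\, f_{X_2}(x_2)}
      \, d\lambda_1(x_1)\, d\lambda_2(x_2).
\]
```

When the components are independent, $f_1 = f_2$ and the integrand is $\varphi(1) f_2 = 0$, so the mutual information vanishes.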

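10 Mutual Information explained by a partition
The random vector $X$ defined on $(\Omega, \mathcal{F}, P)$ has values in $(\mathbb{R}^k, \mathcal{B}_{\mathbb{R}^k})$.
$P_X$ is its probability measure ($P_X \ll \lambda$, where $\lambda$ is the Lebesgue measure on $\mathbb{R}^k$).
The support $S_{P_X}$ may be assumed to be of the form $\prod_{i=1}^{k} [a_i, b_i]$, where $-\infty < a_i < b_i < \infty$ for every $i = 1, 2, \dots, k$.
Given integers $n_1, n_2, \dots, n_k$, let $\mathcal{P}_i$, for $i = 1, 2, \dots, k$, be a partition of $[a_i, b_i]$ into $n_i$ intervals $\{\gamma_{i j_i}\}$ such that
\[ a_i = x_{i0} < x_{i1} < \dots < x_{i, n_i - 1} < x_{i n_i} = b_i \]
\[ \gamma_{i j_i} = [x_{i, j_i - 1}, x_{i j_i}) \ \text{for } j_i = 1, 2, \dots, n_i - 1, \qquad \gamma_{i n_i} = [x_{i, n_i - 1}, b_i] \]
The product partition $\mathcal{P} = \bigotimes_{i=1}^{k} \mathcal{P}_i$ divides $S_{P_X}$ into $n = n_1 \times n_2 \times \dots \times n_k$ rectangles of $\mathbb{R}^k$:
\[ \mathcal{P} = \{\gamma_{1 j_1} \times \gamma_{2 j_2} \times \dots \times \gamma_{k j_k}\} = \left\{ \times_{i=1}^{k} \gamma_{i j_i} \right\}, \qquad j_i = 1, 2, \dots, n_i \ \text{for every } i \]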

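11 Mutual Information explained by a partition
Let $\mathcal{P} = \bigotimes_{i=1}^{k} \mathcal{P}_i = \left\{ \times_{i=1}^{k} \gamma_{i j_i} \right\}$ be the product partition of $S_{P_X}$ into $n = n_1 \times n_2 \times \dots \times n_k$ rectangles of $\mathbb{R}^k$.
If $\sigma(\mathcal{P})$ denotes the σ-algebra generated by $\mathcal{P}$, the restriction of $P_X$ to $\sigma(\mathcal{P})$ is given by $P_X\!\left(\times_{i=1}^{k} \gamma_{i j_i}\right)$ for every $j_1, j_2, \dots, j_k$, whose marginals are, for every $i = 1, 2, \dots, k$:
\[ P_X\!\left( \times_{r=1}^{i-1} [a_r, b_r] \times \gamma_{i j_i} \times \times_{r=i+1}^{k} [a_r, b_r] \right) = P_{X_i}(\gamma_{i j_i}) \]
The mutual information explained by the partition $\mathcal{P}$ of the support $S_{P_X}$, denoted by $I_\varphi(\mathcal{P})$, is
\[ I_\varphi(\mathcal{P}) = \sum_{j_1, j_2, \dots, j_k} \varphi\!\left( \frac{P_X\!\left(\times_{i=1}^{k} \gamma_{i j_i}\right)}{\prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i})} \right) \prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i}) \]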
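The formula for $I_\varphi(\mathcal{P})$ can be sketched in a few lines of Python. The function name `partition_mutual_information` and the dictionary layout (index tuples $(j_1, \dots, j_k)$ mapped to rectangle probabilities) are assumptions made for this illustration, as are the two toy tables; the generator passed in must be defined at 0 if some rectangles carry no mass.

```python
import math
from itertools import product


def partition_mutual_information(cell_prob, phi):
    """I_phi(P): sum over rectangles of phi(P_X(cell) / prod_i P_Xi(side)) * prod_i P_Xi(side).

    cell_prob maps an index tuple (j_1, ..., j_k) to P_X of that rectangle.
    """
    k = len(next(iter(cell_prob)))
    # Marginal probabilities P_Xi(gamma_{i j_i}): sum out all the other indices.
    marginals = [dict() for _ in range(k)]
    for idx, prob in cell_prob.items():
        for i, j in enumerate(idx):
            marginals[i][j] = marginals[i].get(j, 0.0) + prob
    total = 0.0
    # Rectangles missing from cell_prob are treated as having probability 0.
    for idx in product(*(sorted(m) for m in marginals)):
        prod_marg = math.prod(marginals[i][j] for i, j in enumerate(idx))
        if prod_marg > 0.0:
            total += phi(cell_prob.get(idx, 0.0) / prod_marg) * prod_marg
    return total


def kl(x):
    return x * math.log(x) if x > 0.0 else 0.0  # convention 0 ln 0 = 0


def chi_square(x):
    return (x - 1.0) ** 2


# Hypothetical (2, 2) tables: one with dependence, one with independent components.
table = {(1, 1): 0.4, (1, 2): 0.1, (2, 1): 0.1, (2, 2): 0.4}
independent = {(i, j): 0.25 for i in (1, 2) for j in (1, 2)}
print(partition_mutual_information(table, chi_square))
print(partition_mutual_information(independent, kl))
```

A dictionary keyed by index tuples keeps the sketch valid for any $k$ without committing to an array library.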

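12 Existence of an Optimal Partition
For given integers $n_1, n_2, \dots, n_k$ and for every $i = 1, 2, \dots, k$:
$\mathcal{P}_{i, n_i}$ is the class of partitions of $[a_i, b_i]$ into $n_i$ disjoint intervals;
$\mathcal{P}_{\mathbf{n}}$ is the class of partitions of $S_{P_X}$ given by $\mathcal{P}_{\mathbf{n}} = \bigotimes_{i=1}^{k} \mathcal{P}_{i, n_i}$, where $\mathbf{n}$ is the multi-index $(n_1, n_2, \dots, n_k)$.
Each element $\mathcal{P}$ of $\mathcal{P}_{\mathbf{n}}$ may be considered as a vector of $\mathbb{R}^{\sum_{i=1}^{k} (n_i + 1)}$ with components
\[ (a_1, x_{11}, \dots, x_{1, n_1 - 1}, b_1, \; a_2, x_{21}, \dots, x_{2, n_2 - 1}, b_2, \; \dots, \; a_k, x_{k1}, \dots, x_{k, n_k - 1}, b_k) \]
under the constraints $a_i < x_{i1} < \dots < x_{i, n_i - 1} < b_i$ for every $i = 1, 2, \dots, k$.
A partition $\mathcal{P}$ of $S_{P_X}$ for which the mutual information loss is minimum solves the optimization problem
\[ \min_{\mathcal{P} \in \mathcal{P}_{\mathbf{n}}} \left( I_\varphi(X_1, X_2, \dots, X_k) - I_\varphi(\mathcal{P}) \right) \]
which is equivalent to
\[ \max_{\mathcal{P} \in \mathcal{P}_{\mathbf{n}}} I_\varphi(\mathcal{P}) = \max_{\mathcal{P} \in \mathcal{P}_{\mathbf{n}}} \sum_{j_1, j_2, \dots, j_k} \varphi\!\left( \frac{P_X\!\left(\times_{i=1}^{k} \gamma_{i j_i}\right)}{\prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i})} \right) \prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i}) \]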

13 Computational Aspects and Examples
Consider the case of a bivariate random vector $X = (X_1, X_2)$ with probability density function $f(x_1, x_2)$ whose support is $[0, 1]^2$.
For each component, let respectively
\[ 0 = x_{10} < x_{11} < x_{12} < \dots < x_{1i} < \dots < x_{1, p-1} < x_{1p} = 1 \]
\[ 0 = x_{20} < x_{21} < x_{22} < \dots < x_{2j} < \dots < x_{2, q-1} < x_{2q} = 1 \]
be the endpoints of the intervals of two partitions of $[0, 1]$ into $p$ and $q$ elements respectively.
For $i = 1, 2, \dots, p$ and $j = 1, 2, \dots, q$, the probability measure of a rectangle $[x_{1, i-1}, x_{1i}] \times [x_{2, j-1}, x_{2j}]$ is given by
\[ \int_{x_{1, i-1}}^{x_{1i}} \int_{x_{2, j-1}}^{x_{2j}} f(x_1, x_2) \, dx_1 \, dx_2 = p_{ij} \]
while its product probability measure is
\[ \int_{x_{1, i-1}}^{x_{1i}} f_1(x_1) \, dx_1 \int_{x_{2, j-1}}^{x_{2j}} f_2(x_2) \, dx_2 = p_{i+} \, p_{+j} \]
with $p_{i+} = \sum_{j=1}^{q} p_{ij}$ and $p_{+j} = \sum_{i=1}^{p} p_{ij}$.
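When the joint distribution function $F$ is available in closed form, the rectangle probabilities $p_{ij}$ can be obtained by inclusion-exclusion instead of numerical integration. The sketch below assumes this situation; the helper name `cell_probabilities`, the cut points and the choice of the Farlie-Gumbel-Morgenstern copula with θ = 0.5 as test distribution are all illustrative assumptions.

```python
def cell_probabilities(cdf, cuts1, cuts2):
    """p_ij = F(x1i, x2j) - F(x1,i-1, x2j) - F(x1i, x2,j-1) + F(x1,i-1, x2,j-1)."""
    return [
        [
            cdf(cuts1[i], cuts2[j]) - cdf(cuts1[i - 1], cuts2[j])
            - cdf(cuts1[i], cuts2[j - 1]) + cdf(cuts1[i - 1], cuts2[j - 1])
            for j in range(1, len(cuts2))
        ]
        for i in range(1, len(cuts1))
    ]


def fgm_cdf(u, v, theta=0.5):
    # Farlie-Gumbel-Morgenstern copula, used only as an example distribution on [0,1]^2.
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


cuts1 = [0.0, 0.3, 0.7, 1.0]  # p = 3 intervals for X1 (hypothetical cut points)
cuts2 = [0.0, 0.5, 1.0]       # q = 2 intervals for X2 (hypothetical cut points)
p = cell_probabilities(fgm_cdf, cuts1, cuts2)
row_sums = [sum(row) for row in p]        # p_{i+}
col_sums = [sum(col) for col in zip(*p)]  # p_{+j}
print(p, row_sums, col_sums)
```

Because a copula has uniform marginals, the row and column sums here reduce to the interval lengths, which gives a simple sanity check on the table.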

14 Computational Aspects and Examples
The approximation of the mutual information between the random variables $X_1$ and $X_2$ conveyed by the discrete probability measure $\{p_{ij}\}$ is given by
\[ \sum_{i=1}^{p} \sum_{j=1}^{q} \varphi\!\left( \frac{p_{ij}}{p_{i+} \, p_{+j}} \right) p_{i+} \, p_{+j} \]
For given $p$, $q$ and $f(x_1, x_2)$, one has to maximize the following expression:
\[ \max_{\{x_{1i}\}, \{x_{2j}\}} \sum_{i=1}^{p} \sum_{j=1}^{q} \varphi\!\left( \frac{\int_{x_{1, i-1}}^{x_{1i}} \int_{x_{2, j-1}}^{x_{2j}} f(x_1, x_2) \, dx_1 \, dx_2}{\int_{x_{1, i-1}}^{x_{1i}} f_1(x_1) \, dx_1 \int_{x_{2, j-1}}^{x_{2j}} f_2(x_2) \, dx_2} \right) \int_{x_{1, i-1}}^{x_{1i}} f_1(x_1) \, dx_1 \int_{x_{2, j-1}}^{x_{2j}} f_2(x_2) \, dx_2 \]
This can be done with the well-known method of feasible directions (Zoutendijk [3]; see also Bertsekas [1]).
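To show what the maximization looks like in the smallest case, here is a sketch for a (2, 2) partition with one cut point per axis. It uses an exhaustive grid search as a simple stand-in, which is not the method of feasible directions cited in the talk. The Farlie-Gumbel-Morgenstern copula as test distribution, the $\chi^2$ generator, the grid resolution and all function names are assumptions made for this illustration.

```python
def fgm_cdf(u, v, theta):
    # Farlie-Gumbel-Morgenstern copula distribution function on [0,1]^2.
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def chi_square(x):
    return (x - 1.0) ** 2


def explained_information(a, b, theta, phi):
    """I_phi for the (2, 2) partition of [0,1]^2 with cut point a for X1 and b for X2."""
    cuts1, cuts2 = [0.0, a, 1.0], [0.0, b, 1.0]
    total = 0.0
    for i in (1, 2):
        for j in (1, 2):
            p_ij = (fgm_cdf(cuts1[i], cuts2[j], theta)
                    - fgm_cdf(cuts1[i - 1], cuts2[j], theta)
                    - fgm_cdf(cuts1[i], cuts2[j - 1], theta)
                    + fgm_cdf(cuts1[i - 1], cuts2[j - 1], theta))
            # Uniform marginals: p_{i+} * p_{+j} is the area of the rectangle.
            m_ij = (cuts1[i] - cuts1[i - 1]) * (cuts2[j] - cuts2[j - 1])
            total += phi(p_ij / m_ij) * m_ij
    return total


def grid_search(theta, phi, steps=19):
    """Exhaustive search over interior cut points 0.05, 0.10, ..., 0.95 (stand-in optimizer)."""
    grid = [(s + 1) / (steps + 1) for s in range(steps)]
    return max((explained_information(a, b, theta, phi), a, b) for a in grid for b in grid)


best_value, best_a, best_b = grid_search(1.0, chi_square)
print(best_value, best_a, best_b)
```

For this particular pair of copula and generator the objective reduces to $\theta^2 a(1-a) b(1-b)$, so the search settles on the median cut points $a = b = 0.5$. A grid search scales poorly with the number of cut points, which is one reason a constrained gradient-type method is preferable for larger $p$ and $q$.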

15 Computational Aspects and Examples
Example
Let $X = (X_1, X_2) \sim \mathcal{E}_2(\theta)$, $-1 \le \theta \le 1$, be a bivariate exponential random vector whose probability density function is given by
\[ f(x_1, x_2) = e^{-x_1 - x_2} \left[ 1 + \theta - 2\theta \left( e^{-x_1} + e^{-x_2} - 2 e^{-x_1 - x_2} \right) \right] \mathbb{I}_{\mathbb{R}_+^2}(x_1, x_2) \]
Let $C(u_1, u_2)$ be its copula, whose probability density function $c(u_1, u_2)$ is
\[ c(u_1, u_2) = \left[ 1 + \theta (1 - 2u_1)(1 - 2u_2) \right] \mathbb{I}_{[0,1]^2}(u_1, u_2) \]
This family of distributions is also known as the Farlie-Gumbel-Morgenstern class.
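Two short derivations may help connect the two densities of this example; they are a worked check under the stated formulas, not part of the original slides. Integrating the copula density gives the copula itself, and substituting the standard exponential marginals $F_i(x_i) = 1 - e^{-x_i}$ (so that $1 - 2u_i = 2e^{-x_i} - 1$) gives back the bivariate exponential density:

```latex
\[
C(u_1, u_2)
  = \int_0^{u_1}\!\!\int_0^{u_2} \bigl[ 1 + \theta (1 - 2s)(1 - 2t) \bigr] \, dt \, ds
  = u_1 u_2 + \theta\, u_1 (1 - u_1)\, u_2 (1 - u_2)
  = u_1 u_2 \bigl[ 1 + \theta (1 - u_1)(1 - u_2) \bigr],
\]
\[
f(x_1, x_2)
  = c\bigl(F_1(x_1), F_2(x_2)\bigr)\, e^{-x_1} e^{-x_2}
  = e^{-x_1 - x_2} \bigl[ 1 + \theta \bigl(2e^{-x_1} - 1\bigr)\bigl(2e^{-x_2} - 1\bigr) \bigr]
  = e^{-x_1 - x_2} \bigl[ 1 + \theta - 2\theta \bigl( e^{-x_1} + e^{-x_2} - 2 e^{-x_1 - x_2} \bigr) \bigr].
\]
```

The closed form of $C$ means that rectangle probabilities for this family can be evaluated exactly, without numerical integration.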

16 Conclusions and Perspectives
In data mining, the choice of a parametric statistical model is not quite realistic because of the huge number of variables and data; in this case, a nonparametric framework is often more appropriate.
To estimate the probability density function of a random vector, we will use a kernel density estimator in order to evaluate the mutual information between its components, and study the effects of the choice of the kernel on the robustness of the optimal partition.
In Multiple Correspondence Analysis (MCA) and in classification, we often have to deal simultaneously with continuous and categorical variables, and it may be of interest to use an optimal partition in order to retain, as much as possible, the stochastic dependence between the random variables. We will explore the consequences of the choices of φ and of an optimal partition $\mathcal{P}$ on these models.
Finally, we will develop user-friendly software to perform optimal coding in the nonparametric and semiparametric cases.

17 References
[1] D.P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, Mass., 1990
[2] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Scientiarum Mathematicarum Hungarica, 2 (1967),
[3] G. Zoutendijk, Methods of Feasible Directions, Elsevier, Amsterdam and D. Van Nostrand, Princeton, N.J.,


More information

Lecture 8: Minimax Lower Bounds: LeCam, Fano, and Assouad

Lecture 8: Minimax Lower Bounds: LeCam, Fano, and Assouad 40.850: athematical Foundation of Big Data Analysis Spring 206 Lecture 8: inimax Lower Bounds: LeCam, Fano, and Assouad Lecturer: Fang Han arch 07 Disclaimer: These notes have not been subjected to the

More information

The Regularized EM Algorithm

The Regularized EM Algorithm The Regularized EM Algorithm Haifeng Li Department of Computer Science University of California Riverside, CA 92521 hli@cs.ucr.edu Keshu Zhang Human Interaction Research Lab Motorola, Inc. Tempe, AZ 85282

More information

Bayes rule and Bayes error. Donglin Zeng, Department of Biostatistics, University of North Carolina

Bayes rule and Bayes error. Donglin Zeng, Department of Biostatistics, University of North Carolina Bayes rule and Bayes error Definition If f minimizes E[L(Y, f (X))], then f is called a Bayes rule (associated with the loss function L(y, f )) and the resulting prediction error rate, E[L(Y, f (X))],

More information

Random experiments may consist of stages that are performed. Example: Roll a die two times. Consider the events E 1 = 1 or 2 on first roll

Random experiments may consist of stages that are performed. Example: Roll a die two times. Consider the events E 1 = 1 or 2 on first roll Econ 514: Probability and Statistics Lecture 4: Independence Stochastic independence Random experiments may consist of stages that are performed independently. Example: Roll a die two times. Consider the

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Studia Scientiarum Mathematicarum Hungarica 42 (2), (2005) Communicated by D. Miklós

Studia Scientiarum Mathematicarum Hungarica 42 (2), (2005) Communicated by D. Miklós Studia Scientiarum Mathematicarum Hungarica 4 (), 7 6 (5) A METHOD TO FIND THE BEST BOUNDS IN A MULTIVARIATE DISCRETE MOMENT PROBLEM IF THE BASIS STRUCTURE IS GIVEN G MÁDI-NAGY Communicated by D Miklós

More information

A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1

A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 A NOVEL OPTIMAL PROBABILITY DENSITY FUNCTION TRACKING FILTER DESIGN 1 Jinglin Zhou Hong Wang, Donghua Zhou Department of Automation, Tsinghua University, Beijing 100084, P. R. China Control Systems Centre,

More information

Dissimilarity, Quasimetric, Metric

Dissimilarity, Quasimetric, Metric Dissimilarity, Quasimetric, Metric Ehtibar N. Dzhafarov Purdue University and Swedish Collegium for Advanced Study Abstract Dissimilarity is a function that assigns to every pair of stimuli a nonnegative

More information

Lecture Quantitative Finance Spring Term 2015

Lecture Quantitative Finance Spring Term 2015 on bivariate Lecture Quantitative Finance Spring Term 2015 Prof. Dr. Erich Walter Farkas Lecture 07: April 2, 2015 1 / 54 Outline on bivariate 1 2 bivariate 3 Distribution 4 5 6 7 8 Comments and conclusions

More information

1 Variance of a Random Variable

1 Variance of a Random Variable Indian Institute of Technology Bombay Department of Electrical Engineering Handout 14 EE 325 Probability and Random Processes Lecture Notes 9 August 28, 2014 1 Variance of a Random Variable The expectation

More information

Research Article Global Existence and Boundedness of Solutions to a Second-Order Nonlinear Differential System

Research Article Global Existence and Boundedness of Solutions to a Second-Order Nonlinear Differential System Applied Mathematics Volume 212, Article ID 63783, 12 pages doi:1.1155/212/63783 Research Article Global Existence and Boundedness of Solutions to a Second-Order Nonlinear Differential System Changjian

More information

Entropy and Large Deviations

Entropy and Large Deviations Entropy and Large Deviations p. 1/32 Entropy and Large Deviations S.R.S. Varadhan Courant Institute, NYU Michigan State Universiy East Lansing March 31, 2015 Entropy comes up in many different contexts.

More information

(U) =, if 0 U, 1 U, (U) = X, if 0 U, and 1 U. (U) = E, if 0 U, but 1 U. (U) = X \ E if 0 U, but 1 U. n=1 A n, then A M.

(U) =, if 0 U, 1 U, (U) = X, if 0 U, and 1 U. (U) = E, if 0 U, but 1 U. (U) = X \ E if 0 U, but 1 U. n=1 A n, then A M. 1. Abstract Integration The main reference for this section is Rudin s Real and Complex Analysis. The purpose of developing an abstract theory of integration is to emphasize the difference between the

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

(2) E M = E C = X\E M

(2) E M = E C = X\E M 10 RICHARD B. MELROSE 2. Measures and σ-algebras An outer measure such as µ is a rather crude object since, even if the A i are disjoint, there is generally strict inequality in (1.14). It turns out to

More information

Cross entropy-based importance sampling using Gaussian densities revisited

Cross entropy-based importance sampling using Gaussian densities revisited Cross entropy-based importance sampling using Gaussian densities revisited Sebastian Geyer a,, Iason Papaioannou a, Daniel Straub a a Engineering Ris Analysis Group, Technische Universität München, Arcisstraße

More information

Exponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that

Exponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that 1 More examples 1.1 Exponential families under conditioning Exponential families also behave nicely under conditioning. Specifically, suppose we write η = η 1, η 2 R k R p k so that dp η dm 0 = e ηt 1

More information

Variational Inference. Sargur Srihari

Variational Inference. Sargur Srihari Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of Discussion Functionals Calculus of Variations Maximizing a Functional Finding Approximation to a Posterior Minimizing K-L divergence Factorized

More information

Copula based Divergence Measures and their use in Image Registration

Copula based Divergence Measures and their use in Image Registration 7th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 Copula based Divergence Measures and their use in Image Registration T S Durrani and Xuexing Zeng Centre of Excellence

More information

THEOREMS, ETC., FOR MATH 516

THEOREMS, ETC., FOR MATH 516 THEOREMS, ETC., FOR MATH 516 Results labeled Theorem Ea.b.c (or Proposition Ea.b.c, etc.) refer to Theorem c from section a.b of Evans book (Partial Differential Equations). Proposition 1 (=Proposition

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

An Extended BIC for Model Selection

An Extended BIC for Model Selection An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

Robust Stochastic Frontier Analysis: a Minimum Density Power Divergence Approach

Robust Stochastic Frontier Analysis: a Minimum Density Power Divergence Approach Robust Stochastic Frontier Analysis: a Minimum Density Power Divergence Approach Federico Belotti Giuseppe Ilardi CEIS, University of Rome Tor Vergata Bank of Italy Workshop on the Econometrics and Statistics

More information

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich Modelling Dependence with Copulas and Applications to Risk Management Filip Lindskog, RiskLab, ETH Zürich 02-07-2000 Home page: http://www.math.ethz.ch/ lindskog E-mail: lindskog@math.ethz.ch RiskLab:

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

The Distribution of Partially Exchangeable Random Variables

The Distribution of Partially Exchangeable Random Variables The Distribution of Partially Exchangeable Random Variables Hanxiang Peng a,1,, Xin Dang b,1, Xueqin Wang c,2 a Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis,

More information

Introduction to Restricted Boltzmann Machines

Introduction to Restricted Boltzmann Machines Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,

More information

Curve Fitting Re-visited, Bishop1.2.5

Curve Fitting Re-visited, Bishop1.2.5 Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the

More information

Data Mining and Analysis: Fundamental Concepts and Algorithms

Data Mining and Analysis: Fundamental Concepts and Algorithms Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA Department

More information