Mutual Information and Optimal Data Coding
May 9th, 2012
Jules de Tibeiro, Université de Moncton à Shippagan
Bernard Colin, François Dubeau, Hussein Khreibani, Université de Sherbrooke
Outline
- Abstract
- Introduction and Motivation
- Example
- Theoretical Framework
- φ-Divergence
- Mutual Information
- Optimal Partition
- Mutual Information explained by a partition
- Existence of an Optimal Partition
- Computational Aspects and Examples
- Conclusions and Perspectives
- References
Abstract
- Based on the notion of mutual information between the components of a random vector
- An optimal quantization of the support of its probability measure
- A simultaneous discretization of the whole set of the components of the random vector
- The stochastic dependence between them
Key words: divergence, mutual information, copula, optimal quantization
Introduction and Motivation
- An optimal discretization of the support of a continuous multivariate distribution
- To retain the stochastic dependence between the variables
- $X = (X_1, X_2, \dots, X_k)$ is a random vector with values in $(\mathbb{R}^k, \mathcal{B}_{\mathbb{R}^k}, P_X)$, where $P_X$ is the probability measure of $X$ and $S_{P_X} \subseteq \mathbb{R}^k$ is the support of $P_X$
- $n = n_1 \times n_2 \times \dots \times n_k$ is a product of given integers
- A partition $\mathcal{P}$ of $S_{P_X}$ in $n$ elements or classes
- $\mathcal{P}$ is a product partition deduced from partitions $\mathcal{P}_1, \mathcal{P}_2, \dots, \mathcal{P}_k$ of the supports of the marginal probability measures in $n_1, n_2, \dots, n_k$ intervals respectively
- Using a mutual information criterion, choose the set of all intervals such that the quantization of the support $S_{P_X}$ retains the stochastic dependence between the components of the random vector $X$
Introduction and Motivation
Here is an example for which such an optimal discretization might be desirable.
Suppose that we have a sample of individuals on which we observe the following variables: X = age, Y = salary, Z = socioprofessional group.
- If we want to take the variables into account simultaneously, as for example in multiple correspondence analysis, we have to put them in the same form by discretizing the first two.
- Instead of the usual independent categorization of X and Y into a given number of classes (p for X and q for Y), it would be more relevant to use their stochastic dependence and categorize X and Y simultaneously into pq classes (sometimes referred to as a (p, q) partition), in order to preserve as much as possible the dependence between them.
- Moreover, depending on the values taken by the categorical variable Z, the (conditional) discretization of the random vector (X, Y) must differ from one class to another, to take into account the stochastic dependence between the continuous random variables and the categorical one.
- Usually, this dependence is ignored when creating classes for continuous random variables. However, the dependence between X = age and Y = salary is certainly quite different from one socioprofessional group to another.
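φ-Divergence
Let $(\Omega, \mathcal{F}, \mu)$ be a measured space, and let $\mu_1$ and $\mu_2$ be two probability measures defined on $\mathcal{F}$ such that $\mu_i \ll \mu$ for $i = 1, 2$.
The φ-divergence, or generalized divergence (Csiszár [2]), between $\mu_1$ and $\mu_2$ is
$$I_\varphi(\mu_1, \mu_2) = \int \varphi\!\left(\frac{d\mu_1}{d\mu_2}\right) d\mu_2 = \int \varphi\!\left(\frac{f_1}{f_2}\right) f_2 \, d\mu,$$
where $\varphi(t)$ is a convex function from $\mathbb{R}_+ \setminus \{0\}$ to $\mathbb{R}$ and $f_i = \dfrac{d\mu_i}{d\mu}$ for $i = 1, 2$.
$I_\varphi(\mu_1, \mu_2)$ does not depend on the choice of $\mu$.
For homogeneous models:
$$I_\varphi(\mu_1, \mu_2) = \int \frac{d\mu_2}{d\mu_1}\, \varphi\!\left(\frac{d\mu_1}{d\mu_2}\right) d\mu_1 = \int \frac{f_2}{f_1}\, \varphi\!\left(\frac{f_1}{f_2}\right) f_1 \, d\mu.$$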
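φ-Divergence
Usual measures of φ-divergence:

| $\varphi(x)$ | Name |
|---|---|
| $x \ln x$; $x - 1 - \ln x$ | Kullback and Leibler |
| $\lvert x - 1 \rvert$ | Distance in variation |
| $(\sqrt{x} - 1)^2$ | Hellinger |
| $1 - x^{\alpha}$; $0 < \alpha < 1$ | Chernoff |
| $(x - 1)^2$ | $\chi^2$ |
| $[1 - x^{1/m}]^{m}$; $m > 0$ | Jeffreys |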
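As an illustration of the definition, the φ-divergence between two discrete distributions reduces to a finite sum, $\sum_i \varphi(p_i / q_i)\, q_i$. The sketch below is not from the slides: the dictionary `PHI`, the function `phi_divergence`, and the two example distributions are hypothetical names and values, and all probabilities are assumed strictly positive.

```python
import math

# A few phi functions from the table above (dictionary keys are illustrative).
PHI = {
    "kullback_leibler": lambda x: x * math.log(x),
    "variation": lambda x: abs(x - 1.0),
    "hellinger": lambda x: (math.sqrt(x) - 1.0) ** 2,
    "chi2": lambda x: (x - 1.0) ** 2,
}


def phi_divergence(p, q, phi):
    """Discrete phi-divergence: sum of phi(p_i / q_i) * q_i.

    Assumes p and q have the same length and strictly positive entries.
    """
    return sum(phi(pi / qi) * qi for pi, qi in zip(p, q))


# Hypothetical example distributions on three points.
p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]

for name, phi in PHI.items():
    print(name, phi_divergence(p, q, phi))
```

Each of these functions satisfies $\varphi(1) = 0$, so the divergence of a distribution from itself is zero.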
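Mutual Information
Let $(\Omega, \mathcal{F}, P)$ be a probability space, and let $X_1, X_2, \dots, X_k$ be random variables defined on $(\Omega, \mathcal{F}, P)$ with values in measured spaces $(\mathcal{X}_i, \mathcal{F}_i, \lambda_i)$, $i = 1, 2, \dots, k$.
Denote respectively by $P_X = P_{X_1, X_2, \dots, X_k}$ and by $\bigotimes_{i=1}^{k} P_{X_i}$ the probability measures defined on the product space $\left(\times_{i=1}^{k} \mathcal{X}_i, \bigotimes_{i=1}^{k} \mathcal{F}_i, \bigotimes_{i=1}^{k} \lambda_i\right)$ that are equal to the joint probability measure and to the product of the marginal ones.
Both are supposed to be absolutely continuous with respect to the product measure $\lambda = \bigotimes_{i=1}^{k} \lambda_i$.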
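Mutual Information
Definition 1. The φ-mutual information, or the mutual information, between the random variables $X_1, X_2, \dots, X_k$ is given by
$$I_\varphi(X_1, X_2, \dots, X_k) = I_\varphi\!\left(P_X, \bigotimes_{i=1}^{k} P_{X_i}\right) = \int \varphi\!\left(\frac{dP_X}{d\bigotimes_{i=1}^{k} P_{X_i}}\right) d\bigotimes_{i=1}^{k} P_{X_i} = \int \varphi\!\left(\frac{f_1}{f_2}\right) f_2 \, d\lambda,$$
where $f_1$ and $f_2$ are the probability density functions of the measures $P_X$ and $\bigotimes_{i=1}^{k} P_{X_i}$ with respect to $\lambda = \bigotimes_{i=1}^{k} \lambda_i$.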
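As a worked special case (not on the slides), taking the Kullback and Leibler choice $\varphi(t) = t \ln t$ in Definition 1 gives the classical mutual information. Here $f_{X_i}$ is introduced only for this illustration and denotes the marginal density of $X_i$, so that $f_2 = \prod_{i=1}^{k} f_{X_i}$:

```latex
\begin{aligned}
I_\varphi(X_1, \dots, X_k)
  &= \int \frac{f_1}{f_2} \ln\!\left(\frac{f_1}{f_2}\right) f_2 \, d\lambda
   = \int f_1 \ln\!\left(\frac{f_1}{f_2}\right) d\lambda \\
  &= \int f_1(x_1, \dots, x_k)\,
     \ln \frac{f_1(x_1, \dots, x_k)}{\prod_{i=1}^{k} f_{X_i}(x_i)} \, d\lambda .
\end{aligned}
```

This quantity vanishes when $f_1 = f_2$ almost everywhere, that is, when the components are independent.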
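Mutual Information explained by a partition
- The random vector $X$ defined on $(\Omega, \mathcal{F}, P)$ has values in $(\mathbb{R}^k, \mathcal{B}_{\mathbb{R}^k})$.
- $P_X$ is its probability measure, with $P_X \ll \lambda$, where $\lambda$ is the Lebesgue measure on $\mathbb{R}^k$.
- The support $S_{P_X}$ may be assumed to be of the form $\prod_{i=1}^{k} [a_i, b_i]$, where $-\infty < a_i < b_i < \infty$ for every $i = 1, 2, \dots, k$.
- Given integers $n_1, n_2, \dots, n_k$, let $\mathcal{P}_i$, for $i = 1, 2, \dots, k$, be a partition of $[a_i, b_i]$ in $n_i$ intervals $\{\gamma_{i j_i}\}$ such that
$$a_i = x_{i0} < x_{i1} < \dots < x_{i\,n_i - 1} < x_{i\,n_i} = b_i,$$
$$\gamma_{i j_i} = [x_{i\,j_i - 1}, x_{i j_i}) \ \text{for } j_i = 1, 2, \dots, n_i - 1, \qquad \gamma_{i n_i} = [x_{i\,n_i - 1}, b_i].$$
- The product partition $\mathcal{P} = \bigotimes_{i=1}^{k} \mathcal{P}_i$ of $S_{P_X}$ consists of $n = n_1 \times n_2 \times \dots \times n_k$ rectangles of $\mathbb{R}^k$:
$$\mathcal{P} = \{\gamma_{1 j_1} \times \gamma_{2 j_2} \times \dots \times \gamma_{k j_k}\} = \left\{\times_{i=1}^{k} \gamma_{i j_i}\right\}; \quad j_i = 1, 2, \dots, n_i \ \text{for every } i.$$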
Mutual Information explained by a partition
If $\sigma(\mathcal{P})$ denotes the σ-algebra generated by $\mathcal{P}$, the restriction of $P_X$ to $\sigma(\mathcal{P})$ is given by $P_X\!\left(\times_{i=1}^{k} \gamma_{i j_i}\right)$ for every $j_1, j_2, \dots, j_k$, whose marginals are, for every $i = 1, 2, \dots, k$:
$$P_X\!\left(\times_{r=1}^{i-1} [a_r, b_r] \times \gamma_{i j_i} \times \times_{r=i+1}^{k} [a_r, b_r]\right) = P_{X_i}(\gamma_{i j_i}).$$
The mutual information explained by the partition $\mathcal{P}$ of the support $S_{P_X}$, denoted by $I_\varphi(\mathcal{P})$, is
$$I_\varphi(\mathcal{P}) = \sum_{j_1, j_2, \dots, j_k} \varphi\!\left(\frac{P_X\!\left(\times_{i=1}^{k} \gamma_{i j_i}\right)}{\prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i})}\right) \prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i}).$$
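The sum defining $I_\varphi(\mathcal{P})$ can be evaluated directly once the cell probabilities of the product partition are known. The following is a minimal sketch under stated assumptions, not the authors' implementation: the cell probabilities are stored in a dictionary keyed by index tuples $(j_1, \dots, j_k)$ (0-based), and the names `marginals`, `explained_information`, and the 2 x 2 example table are hypothetical.

```python
import math
from itertools import product


def marginals(cells, shape):
    """Marginal probabilities P_{X_i}(gamma_{i j}) obtained by summing joint cells."""
    margs = [[0.0] * n for n in shape]
    for idx, prob in cells.items():
        for axis, j in enumerate(idx):
            margs[axis][j] += prob
    return margs


def explained_information(cells, shape, phi):
    """Sum over all cells of phi(joint / product of marginals) * product of marginals."""
    margs = marginals(cells, shape)
    total = 0.0
    for idx in product(*(range(n) for n in shape)):
        prod_marg = math.prod(margs[axis][j] for axis, j in enumerate(idx))
        if prod_marg > 0.0:  # cells with zero marginal mass are skipped
            total += phi(cells.get(idx, 0.0) / prod_marg) * prod_marg
    return total


def chi2(x):
    """phi(x) = (x - 1)^2, the chi-square entry of the table of divergences."""
    return (x - 1.0) ** 2


# Hypothetical 2 x 2 table of cell probabilities with positive dependence.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
# Table with independent components: every cell equals the product of marginals.
independent = {idx: 0.25 for idx in product(range(2), range(2))}

print(explained_information(dependent, (2, 2), chi2))
print(explained_information(independent, (2, 2), chi2))
```

The independent table yields zero explained information, since every ratio of joint to product probability equals one and $\varphi(1) = 0$.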
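Existence of an Optimal Partition
For given integers $n_1, n_2, \dots, n_k$ and for every $i = 1, 2, \dots, k$:
- $\mathcal{P}_{i, n_i}$ is the class of partitions of $[a_i, b_i]$ in $n_i$ disjoint intervals;
- $\mathcal{P}_{\mathbf{n}}$ is the class of partitions of $S_{P_X}$ given by $\mathcal{P}_{\mathbf{n}} = \bigotimes_{i=1}^{k} \mathcal{P}_{i, n_i}$, where $\mathbf{n}$ is the multi-index $(n_1, n_2, \dots, n_k)$.
Each element $\mathcal{P}$ of $\mathcal{P}_{\mathbf{n}}$ may be considered as a vector of $\mathbb{R}^{\sum_{i=1}^{k} (n_i + 1)}$ having components
$$(a_1, x_{11}, \dots, x_{1\,n_1 - 1}, b_1, a_2, x_{21}, \dots, x_{2\,n_2 - 1}, b_2, \dots, a_k, x_{k1}, \dots, x_{k\,n_k - 1}, b_k),$$
under the constraints $a_i < x_{i1} < \dots < x_{i\,n_i - 1} < b_i$ for every $i = 1, 2, \dots, k$.
A partition $\mathcal{P}$ of $S_{P_X}$ for which the mutual information loss is minimum solves the optimization problem
$$\min_{\mathcal{P} \in \mathcal{P}_{\mathbf{n}}} \left(I_\varphi(X_1, X_2, \dots, X_k) - I_\varphi(\mathcal{P})\right),$$
which is equivalent to
$$\max_{\mathcal{P} \in \mathcal{P}_{\mathbf{n}}} I_\varphi(\mathcal{P}) = \max_{\mathcal{P} \in \mathcal{P}_{\mathbf{n}}} \sum_{j_1, j_2, \dots, j_k} \varphi\!\left(\frac{P_X\!\left(\times_{i=1}^{k} \gamma_{i j_i}\right)}{\prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i})}\right) \prod_{i=1}^{k} P_{X_i}(\gamma_{i j_i}).$$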
Computational Aspects and Examples
Consider the case of a bivariate random vector $X = (X_1, X_2)$ with probability density function $f(x_1, x_2)$ whose support is $[0,1]^2$.
For each component, let respectively
$$0 = x_{10} < x_{11} < x_{12} < \dots < x_{1i} < \dots < x_{1\,p-1} < x_{1p} = 1$$
and
$$0 = x_{20} < x_{21} < x_{22} < \dots < x_{2j} < \dots < x_{2\,q-1} < x_{2q} = 1$$
be the ends of the intervals of two partitions of $[0,1]$ in respectively $p$ and $q$ elements.
For $i = 1, 2, \dots, p$ and $j = 1, 2, \dots, q$, the probability measure of the rectangle $[x_{1\,i-1}, x_{1i}] \times [x_{2\,j-1}, x_{2j}]$ is given by
$$\int_{x_{1\,i-1}}^{x_{1i}} \int_{x_{2\,j-1}}^{x_{2j}} f(x_1, x_2) \, dx_1 \, dx_2 = p_{ij},$$
while its product probability measure is expressed as
$$\int_{x_{1\,i-1}}^{x_{1i}} f_1(x_1) \, dx_1 \int_{x_{2\,j-1}}^{x_{2j}} f_2(x_2) \, dx_2 = p_{i+}\, p_{+j},$$
with $p_{i+} = \sum_{j=1}^{q} p_{ij}$ and $p_{+j} = \sum_{i=1}^{p} p_{ij}$.
Computational Aspects and Examples
The approximation of the mutual information between the random variables $X_1$ and $X_2$ conveyed by the discrete probability measure $\{p_{ij}\}$ is given by
$$\sum_{i=1}^{p} \sum_{j=1}^{q} \varphi\!\left(\frac{p_{ij}}{p_{i+}\, p_{+j}}\right) p_{i+}\, p_{+j}.$$
For given $p$, $q$ and $f(x_1, x_2)$, one has to maximize the following expression:
$$\max_{\{x_{1i}\}, \{x_{2j}\}} \sum_{i=1}^{p} \sum_{j=1}^{q} \varphi\!\left(\frac{\int_{x_{1\,i-1}}^{x_{1i}} \int_{x_{2\,j-1}}^{x_{2j}} f(x_1, x_2) \, dx_1 \, dx_2}{\int_{x_{1\,i-1}}^{x_{1i}} f_1(x_1) \, dx_1 \int_{x_{2\,j-1}}^{x_{2j}} f_2(x_2) \, dx_2}\right) \int_{x_{1\,i-1}}^{x_{1i}} f_1(x_1) \, dx_1 \int_{x_{2\,j-1}}^{x_{2j}} f_2(x_2) \, dx_2.$$
This problem can be solved with the well-known method of feasible directions described in Zoutendijk [3] and also in Bertsekas [1].
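To make the maximization concrete, here is a minimal sketch for the smallest case $p = q = 2$, where each axis has a single interior cut point. It deliberately swaps in an exhaustive grid search for the method of feasible directions cited above, and approximates the cell integrals with a midpoint rule. The density $f(x_1, x_2) = x_1 + x_2$ on $[0,1]^2$, the $\chi^2$ choice of φ, and all function names are assumptions made for illustration only.

```python
def cell_probs(f, xs, ys, m=10):
    """Midpoint-rule approximation of p_ij on [xs[i], xs[i+1]] x [ys[j], ys[j+1]]."""
    table = []
    for x0, x1 in zip(xs, xs[1:]):
        row = []
        for y0, y1 in zip(ys, ys[1:]):
            hx, hy = (x1 - x0) / m, (y1 - y0) / m
            total = sum(f(x0 + (a + 0.5) * hx, y0 + (b + 0.5) * hy)
                        for a in range(m) for b in range(m))
            row.append(total * hx * hy)
        table.append(row)
    return table


def explained_information_2d(table, phi):
    """Sum over i, j of phi(p_ij / (p_i+ * p_+j)) * p_i+ * p_+j."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    return sum(phi(pij / (ri * cj)) * ri * cj
               for r, ri in zip(table, row_tot)
               for pij, cj in zip(r, col_tot))


def best_2x2_partition(f, phi, steps=19):
    """Exhaustive search over one interior cut point per axis (p = q = 2)."""
    grid = [(k + 1) / (steps + 1) for k in range(steps)]
    return max((explained_information_2d(cell_probs(f, [0.0, s, 1.0], [0.0, t, 1.0]), phi), s, t)
               for s in grid for t in grid)


def chi2(x):
    """phi(x) = (x - 1)^2."""
    return (x - 1.0) ** 2


def density(x1, x2):
    """Hypothetical example density on the unit square (it integrates to 1)."""
    return x1 + x2


value, s, t = best_2x2_partition(density, chi2)
print("explained information:", value, "cut points:", s, t)
```

A grid search scales poorly with the number of cut points, which is one reason a constrained gradient-type method is preferable for larger $p$ and $q$.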
Computational Aspects and Examples
Example
Let $X = (X_1, X_2) \sim \mathcal{E}_2(\theta)$, $-1 \le \theta \le 1$, be a bivariate exponential random vector whose probability density function is given by
$$f(x_1, x_2) = e^{-x_1 - x_2} \left[1 + \theta - 2\theta \left(e^{-x_1} + e^{-x_2} - 2 e^{-x_1 - x_2}\right)\right] \mathbf{1}_{\mathbb{R}_+^2}(x_1, x_2).$$
Let $C(u_1, u_2)$ be its copula, whose probability density function $c(u_1, u_2)$ is
$$c(u_1, u_2) = \left[1 + \theta (1 - 2u_1)(1 - 2u_2)\right] \mathbf{1}_{[0,1]^2}(u_1, u_2).$$
This family of distributions is also known as the Farlie-Gumbel-Morgenstern class.
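For this copula the cell probabilities are available in closed form, because integrating the density $c$ gives $C(u_1, u_2) = u_1 u_2 \left[1 + \theta (1 - u_1)(1 - u_2)\right]$. The sketch below computes the probabilities of a product partition of $[0,1]^2$ by inclusion-exclusion and evaluates the $\chi^2$ information it explains. Function names and parameter values are hypothetical. The comparison value $\theta^2 s(1-s)t(1-t)$ for a 2 x 2 partition with cut points $(s, t)$ is a short side calculation made for this illustration, not a result quoted from the slides.

```python
def fgm_copula(u, v, theta):
    """Farlie-Gumbel-Morgenstern copula C(u, v) = u v [1 + theta (1 - u)(1 - v)]."""
    return u * v * (1.0 + theta * (1.0 - u) * (1.0 - v))


def fgm_cell_probs(us, vs, theta):
    """Rectangle probabilities by inclusion-exclusion on the copula."""
    return [[fgm_copula(u1, v1, theta) - fgm_copula(u0, v1, theta)
             - fgm_copula(u1, v0, theta) + fgm_copula(u0, v0, theta)
             for v0, v1 in zip(vs, vs[1:])]
            for u0, u1 in zip(us, us[1:])]


def chi2_information(table):
    """Sum of (p_ij / (p_i+ p_+j) - 1)^2 * p_i+ * p_+j over all cells."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    return sum((pij / (ri * cj) - 1.0) ** 2 * ri * cj
               for r, ri in zip(table, row_tot)
               for pij, cj in zip(r, col_tot))


theta, s, t = 0.8, 0.3, 0.6  # hypothetical parameter and cut points
table = fgm_cell_probs([0.0, s, 1.0], [0.0, t, 1.0], theta)
print(chi2_information(table))
print(theta ** 2 * s * (1 - s) * t * (1 - t))  # side-calculation value for comparison
```

Under that side calculation, the explained $\chi^2$ information of a 2 x 2 partition is largest at $s = t = 1/2$, where it equals $\theta^2 / 16$.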
Conclusions and Perspectives
- In data mining, the choice of a parametric statistical model is not quite realistic because of the huge number of variables and data; in this case, a nonparametric framework is often more appropriate.
- To estimate the probability density function of a random vector, we will use a kernel density estimator in order to evaluate the mutual information between its components, and we will study the effects of the choice of the kernel on the robustness of the optimal partition.
- In Multiple Correspondence Analysis (MCA) and in classification, we often have to deal simultaneously with continuous and categorical variables, and it may be of interest to use an optimal partition in order to retain, as much as possible, the stochastic dependence between the random variables. We will explore the consequences of the choices of φ and of an optimal partition $\mathcal{P}$ on these models.
- Finally, we will develop user-friendly software to perform optimal coding in the nonparametric and semiparametric cases.
References
[1] D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, Mass., 1990.
[2] I. Csiszár, Information-type measures of difference of probability distributions and indirect observations, Studia Scientiarum Mathematicarum Hungarica, 2 (1967),
[3] G. Zoutendijk, Methods of Feasible Directions, Elsevier, Amsterdam and D. Van Nostrand, Princeton, N.J.,
More informationExponential families also behave nicely under conditioning. Specifically, suppose we write η = (η 1, η 2 ) R k R p k so that
1 More examples 1.1 Exponential families under conditioning Exponential families also behave nicely under conditioning. Specifically, suppose we write η = η 1, η 2 R k R p k so that dp η dm 0 = e ηt 1
More informationVariational Inference. Sargur Srihari
Variational Inference Sargur srihari@cedar.buffalo.edu 1 Plan of Discussion Functionals Calculus of Variations Maximizing a Functional Finding Approximation to a Posterior Minimizing K-L divergence Factorized
More informationCopula based Divergence Measures and their use in Image Registration
7th European Signal Processing Conference (EUSIPCO 009) Glasgow, Scotland, August 4-8, 009 Copula based Divergence Measures and their use in Image Registration T S Durrani and Xuexing Zeng Centre of Excellence
More informationTHEOREMS, ETC., FOR MATH 516
THEOREMS, ETC., FOR MATH 516 Results labeled Theorem Ea.b.c (or Proposition Ea.b.c, etc.) refer to Theorem c from section a.b of Evans book (Partial Differential Equations). Proposition 1 (=Proposition
More informationSTAT 7032 Probability Spring Wlodek Bryc
STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,
More informationAn Extended BIC for Model Selection
An Extended BIC for Model Selection at the JSM meeting 2007 - Salt Lake City Surajit Ray Boston University (Dept of Mathematics and Statistics) Joint work with James Berger, Duke University; Susie Bayarri,
More informationA Goodness-of-fit Test for Copulas
A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and
More informationRobust Stochastic Frontier Analysis: a Minimum Density Power Divergence Approach
Robust Stochastic Frontier Analysis: a Minimum Density Power Divergence Approach Federico Belotti Giuseppe Ilardi CEIS, University of Rome Tor Vergata Bank of Italy Workshop on the Econometrics and Statistics
More informationModelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich
Modelling Dependence with Copulas and Applications to Risk Management Filip Lindskog, RiskLab, ETH Zürich 02-07-2000 Home page: http://www.math.ethz.ch/ lindskog E-mail: lindskog@math.ethz.ch RiskLab:
More informationPROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS
PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability
More informationThe Distribution of Partially Exchangeable Random Variables
The Distribution of Partially Exchangeable Random Variables Hanxiang Peng a,1,, Xin Dang b,1, Xueqin Wang c,2 a Department of Mathematical Sciences, Indiana University-Purdue University at Indianapolis,
More informationIntroduction to Restricted Boltzmann Machines
Introduction to Restricted Boltzmann Machines Ilija Bogunovic and Edo Collins EPFL {ilija.bogunovic,edo.collins}@epfl.ch October 13, 2014 Introduction Ingredients: 1. Probabilistic graphical models (undirected,
More informationCurve Fitting Re-visited, Bishop1.2.5
Curve Fitting Re-visited, Bishop1.2.5 Maximum Likelihood Bishop 1.2.5 Model Likelihood differentiation p(t x, w, β) = Maximum Likelihood N N ( t n y(x n, w), β 1). (1.61) n=1 As we did in the case of the
More informationData Mining and Analysis: Fundamental Concepts and Algorithms
Data Mining and Analysis: Fundamental Concepts and Algorithms dataminingbook.info Mohammed J. Zaki 1 Wagner Meira Jr. 1 Department of Computer Science Rensselaer Polytechnic Institute, Troy, NY, USA Department
More information