GENERALIZED PHI-DIVERGENCES AND THE EM ALGORITHM IN ACOUSTIC EMISSION
1 GENERALIZED PHI-DIVERGENCES AND THE EM ALGORITHM IN ACOUSTIC EMISSION
Jan Tláskal and Václav Kůs
Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
2 Concepts
What the φ-divergences stand for.
We give a simple method of construction of φ-divergences using a normalization of convex or concave functions.
We recall the well-known divergences (Kullback, Hellinger, χ², Power, ...).
We introduce several of their modifications: the generalized Le Cam and Hellinger divergences.
We apply a selected φ-divergence distance measure to a real classification problem (the experimental design on steel material 16530).
These new families of divergences open new research possibilities in the statistical treatment of acoustic emission sources.
3 Overall assumptions:
$(X, \mathcal{A})$ a measurable space, $\mathcal{P}(X, \mathcal{A})$ the set of all distributions on it,
$\mathcal{P} \subseteq \mathcal{P}(X, \mathcal{A})$ nonvoid, $P_0 \in \mathcal{P}$,
$P, Q \in \mathcal{P}$ dominated by a $\sigma$-finite measure $\mu$ on $(X, \mathcal{A})$,
$p = dP/d\mu$, $q = dQ/d\mu$ the Radon-Nikodym derivatives of $P, Q$ with respect to $\mu$,
$X^n = (X_1, \ldots, X_n)$ an i.i.d. vector drawn from $P_0 \in \mathcal{P}$,
$P_n \in \mathcal{P}(X, \mathcal{A})$ the empirical distribution (measure), $P_n(A) = \frac{1}{n} \sum_{i=1}^{n} I_A(X_i)$, $A \in \mathcal{A}$, $n = 1, 2, \ldots$
4 Definition of φ-divergences:
$D_\phi(P, Q) = \int_X q \, \phi\!\left(\frac{p}{q}\right) d\mu$, $P, Q \in \mathcal{P}$, where
$\phi : (0, \infty) \to \mathbb{R}$ is convex on $(0, \infty)$, strictly convex at $t = 1$, and $\phi(1) = 0$.
Nonnegative version, invariant w.r.t. linear transformations: $\tilde{\phi}(t) = \phi(t) - \phi'_+(1)(t - 1)$, $t \in (0, \infty)$.
Notation: $\phi(0) := \lim_{t \to 0^+} \phi(t)$, $\phi(\infty)/\infty := \lim_{t \to \infty} \phi(t)/t$.
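For discrete distributions the definition translates directly into code. A minimal sketch in Python (assuming NumPy and probability vectors on a common finite support; the function name is illustrative, not from the talk):

```python
import numpy as np

def phi_divergence(p, q, phi):
    """Discrete phi-divergence: D_phi(P, Q) = sum_x q(x) * phi(p(x) / q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi(p / q)))     # assumes q > 0 wherever p > 0

P = np.array([0.2, 0.5, 0.3])
Q = np.array([0.3, 0.4, 0.3])
# Kullback divergence I(Q, P): phi(t) = -ln t (see the table of examples below)
print(phi_divergence(P, Q, lambda t: -np.log(t)))
# Reflexivity: D_phi(P, P) = 0
print(phi_divergence(P, P, lambda t: -np.log(t)))
```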
5 Metric properties of divergences
RANGE: $0 \le D_\phi(P, Q) \le \phi(0) + \phi(\infty)/\infty$,
REFLEXIVITY: $D_\phi(P, Q) = 0$ iff $P = Q$,
SYMMETRY: $D_\phi(P, Q) = D_\phi(Q, P)$ ? (does not hold in general),
TRIANGLE INEQUALITY: $D_\phi(P, Q) \le D_\phi(P, R) + D_\phi(R, Q)$ ? (does not hold in general).
6 Examples of standard φ-divergences:
Kullback (I): $\phi(t) = -\ln t$, $\tilde{\phi}(t) = -\ln t + t - 1$, $I(Q, P) = \int q \ln\frac{q}{p} \, d\mu$
Shannon (I): $\phi(t) = t \ln t$, $\tilde{\phi}(t) = t \ln t - t + 1$, $I(P, Q) = \int p \ln\frac{p}{q} \, d\mu$
Variation (V): $\phi(t) = |t - 1|$, $\tilde{\phi}(t) = 2 - 2t$ for $t < 1$ and $0$ for $t \ge 1$, $V(P, Q) = \int |p - q| \, d\mu$
Pearson ($\chi^2$): $\phi(t) = t^2 - 1$, $\tilde{\phi}(t) = (t - 1)^2$, $\chi^2(P, Q) = \int \frac{(p - q)^2}{q} \, d\mu$
Neyman ($\chi^2$): $\phi(t) = (1 - t)/t$, $\tilde{\phi}(t) = (t - 1)^2/t$, $\chi^2(Q, P) = \int \frac{(p - q)^2}{p} \, d\mu$
7 Examples of standard φ-divergences (continued):
Le Cam ($LC^2$): $\phi(t) = \frac{2(1 - t)}{1 + t}$, $\tilde{\phi}(t) = \frac{(t - 1)^2}{t + 1}$, $LC^2(P, Q) = \int \frac{(p - q)^2}{p + q} \, d\mu$
Hellinger ($H^2$): $\phi(t) = 2(1 - \sqrt{t})$, $\tilde{\phi}(t) = (\sqrt{t} - 1)^2$, $H^2(P, Q) = \int (\sqrt{p} - \sqrt{q})^2 \, d\mu$
Vajda ($\chi^a$): $\phi(t) = |t - 1|^a$, $\chi^a(P, Q) = \int q^{1 - a} |p - q|^a \, d\mu$
Matusita ($M_a$): $\phi(t) = |t^a - 1|^{1/a}$, $M_a(P, Q) = \int |p^a - q^a|^{1/a} \, d\mu$
Power ($I_a$): $\phi(t) = \frac{t^a - 1}{a(a - 1)}$, $\tilde{\phi}(t) = \frac{t^a - a(t - 1) - 1}{a(a - 1)}$, $I_a(P, Q) = \frac{1}{a(a - 1)} \left( \int p^a q^{1 - a} \, d\mu - 1 \right)$
8 Simple construction of φ-divergences
Assume the following conditions: $\psi : (0, \infty) \to \mathbb{R}$ is a convex or concave function, twice differentiable at $t = 1$ with $\psi''(1) \ne 0$.
CONSTRUCTION:
$\phi(t) = \frac{\psi(t) - \psi(1) - \psi'(1)(t - 1)}{\psi''(1)}$, $t \in (0, \infty)$,   (1)
is a fully normalized nonnegative divergence function (i.e. $\phi'(1) = 0$ and $\phi''(1) = 1$).
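A quick numerical sanity check of CONSTRUCTION (1), sketched in Python under the assumption that $\psi''(1)$ is supplied analytically and $\psi'(1)$ is taken by a finite difference; the $\psi$ below anticipates Example 1 ($\psi_b(t) = 1/(b+t)$):

```python
import numpy as np

def normalize(psi, d2psi1):
    """CONSTRUCTION (1): phi(t) = (psi(t) - psi(1) - psi'(1)(t-1)) / psi''(1)."""
    h = 1e-5
    dpsi1 = (psi(1 + h) - psi(1 - h)) / (2 * h)          # psi'(1) by central difference
    return lambda t: (psi(t) - psi(1.0) - dpsi1 * (t - 1.0)) / d2psi1

b = 1.0
psi = lambda t: 1.0 / (b + t)                            # psi of Example 1 (Le Cam family)
phi = normalize(psi, d2psi1=2.0 / (b + 1.0) ** 3)        # psi''(t) = 2/(b+t)^3

# Fully normalized: phi(1) = 0, phi'(1) = 0, phi''(1) = 1 (checked numerically)
h = 1e-4
print(phi(1.0))                                          # ~0
print((phi(1 + h) - phi(1 - h)) / (2 * h))               # ~0
print((phi(1 + h) - 2 * phi(1.0) + phi(1 - h)) / h**2)   # ~1
```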
9 Example 1: Generalized Le Cam divergence $LC_\beta(P, Q)$
$LC_\beta(P, Q)$ is defined by means of the strictly convex $\psi$ function:
$\psi_b(t) = \frac{1}{b + t}$, $t \in (0, \infty)$, $b \ge 0$.
Applying CONSTRUCTION (1) we get:
$\phi_b(t) = \frac{(b + 1)(1 - t)^2}{2(b + t)}$, $t > 0$, $b \ge 0$.
Blended divergences for $\beta = 1/(b + 1) \in [0, 1]$:
$LC_\beta(P, Q) := D_{\phi_\beta}(P, Q) = \frac{1}{2} \int \frac{(p - q)^2}{\beta p + (1 - \beta) q} \, d\mu$, $P, Q \in \mathcal{P}$.
Lindsay (1994) introduced the blending parameter $\beta$.
10 The special cases:
Pearson's $\chi^2(P, Q)/2$ for $\beta = 0$ ($b \to \infty$),
Neyman's $\chi^2(Q, P)/2$ for $\beta = 1$ ($b = 0$),
Le Cam's distance $LC^2(P, Q)$ for $\beta = 1/2$ ($b = 1$).
$LC_\beta(P, Q)$ is bounded for all $\beta \in (0, 1)$.
$LC_\beta(P, Q)$ satisfies the skew symmetry about $\beta = 1/2$: $LC_\beta(P, Q) = LC_{1 - \beta}(Q, P)$, $P, Q \in \mathcal{P}$.
The only symmetric member of the family: $LC_{1/2}(P, Q) = LC^2(P, Q)$.
11 Example 2: Generalized Hellinger divergence $H_\beta(P, Q)$
Let us start with the $\psi$ function:
$\psi_{a,b}(t) = \frac{1 - t^{2a}}{b + t^a}$, $b \ge 0$, $a \ne 0, 1$.
For $a = 1/2$ and the convex transformation $g(y) = y^2$ we get:
$\phi_b(t) := \psi_{1/2,b}^2(t) = \left( \frac{1 - t}{b + \sqrt{t}} \right)^2$, $t \in (0, \infty)$.
Applying CONSTRUCTION (1) with $\beta = 1/(1 + b) \in [0, 1]$:
$\phi_\beta(t) = \frac{1}{2} \left( \frac{1 - t}{\beta \sqrt{t} + 1 - \beta} \right)^2$, $t \in (0, \infty)$, $\beta \in [0, 1]$,
$H_\beta(P, Q) = \frac{1}{2} \int \frac{(p - q)^2}{(\beta \sqrt{p} + (1 - \beta) \sqrt{q})^2} \, d\mu$, $P, Q \in \mathcal{P}$.
12 Particular cases:
Pearson's $\chi^2(P, Q)/2$ for $\beta = 0$ ($b \to \infty$),
Neyman's $\chi^2(Q, P)/2$ for $\beta = 1$ ($b = 0$),
Hellinger's $2H^2(P, Q)$ for $\beta = 1/2$ ($b = 1$).
$H_\beta(P, Q)$ is bounded for all $\beta \in (0, 1)$,
$H_\beta(P, Q)$ satisfies the skew symmetry about $\beta = 1/2$: $H_\beta(P, Q) = H_{1 - \beta}(Q, P)$, $P, Q \in \mathcal{P}$,
The only symmetric member of the family: $H_{1/2}(P, Q) = 2H^2(P, Q)$.
The Hellinger distance underlies robust estimation techniques in statistics.
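Both blended families reduce to one-liners on discrete data. A sketch under the same probability-vector assumptions as before; it also illustrates the skew symmetry $LC_\beta(P,Q) = LC_{1-\beta}(Q,P)$ and the identity $H_{1/2}(P,Q) = 2H^2(P,Q)$:

```python
import numpy as np

def lc_beta(p, q, beta):
    """Generalized Le Cam divergence: LC_beta(P,Q) = 1/2 * sum (p-q)^2 / (beta*p + (1-beta)*q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.sum((p - q) ** 2 / (beta * p + (1 - beta) * q))

def h_beta(p, q, beta):
    """Generalized Hellinger divergence: H_beta(P,Q) = 1/2 * sum (p-q)^2 / (beta*sqrt(p) + (1-beta)*sqrt(q))^2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.sum((p - q) ** 2 / (beta * np.sqrt(p) + (1 - beta) * np.sqrt(q)) ** 2)

P = np.array([0.2, 0.5, 0.3])
Q = np.array([0.3, 0.4, 0.3])
print(lc_beta(P, Q, 0.3), lc_beta(Q, P, 0.7))                          # skew symmetry: equal values
print(h_beta(P, Q, 0.5), 2 * np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2))   # beta = 1/2 gives 2*H^2
```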
13 Clustering via distribution mixtures
The distribution mixture:
$p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x \mid \theta_j)$, $\sum_{j=1}^{M} \alpha_j = 1$, $\alpha_j \ge 0$,
$\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M)$.
Clustering with the mixture:
$(t_i)_k = \alpha_k \, p_k(x_i \mid \theta_k) \Big/ \sum_{m=1}^{M} \alpha_m \, p_m(x_i \mid \theta_m)$, $k \in \{1, \ldots, M\}$.
The $k$-th component of $t_i$ evaluates the probability that $x_i$ belongs to the $k$-th component of the mixture.
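Written out for a univariate Gaussian mixture, the responsibilities $(t_i)_k$ take a few lines. A sketch assuming SciPy's normal density; the concrete component densities used for the acoustic-emission data are not specified here:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, alphas, mus, sigmas):
    """(t_i)_k = alpha_k p_k(x_i | theta_k) / sum_m alpha_m p_m(x_i | theta_m)."""
    x = np.asarray(x, float)
    # dens[i, k] = alpha_k * N(x_i; mu_k, sigma_k)
    dens = np.column_stack([a * norm.pdf(x, loc=m, scale=s)
                            for a, m, s in zip(alphas, mus, sigmas)])
    return dens / dens.sum(axis=1, keepdims=True)

x = np.array([0.1, 2.9, 1.5])
t = responsibilities(x, alphas=[0.5, 0.5], mus=[0.0, 3.0], sigmas=[1.0, 1.0])
print(t)    # each row sums to 1: soft assignment of x_i to the mixture components
```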
14 Fitting the mixture
The distribution mixture that best fits the observed data is chosen via the maximum likelihood method.
Likelihood function:
$l(\Theta \mid x) = \sum_{i=1}^{N} \ln p(x_i \mid \Theta) = \sum_{i=1}^{N} \ln \sum_{j=1}^{M} \alpha_j \, p_j(x_i \mid \theta_j)$,
Maximum likelihood estimate:
$\hat{\Theta} = \arg\max_{\Theta} l(\Theta \mid x)$.
15 EM algorithm
Missing data principle: complete data $z = ((x_1, y_1)^T, \ldots, (x_N, y_N)^T)$ with the $y_i$ unobserved.
EM algorithm:
$P(x \mid \Theta) = P(z \mid \Theta) / P(y \mid x, \Theta)$,
$l(\Theta \mid x) = l(\Theta \mid z) - \ln P(y \mid x, \Theta)$,
$l(\Theta \mid x) = E[\, l(\Theta \mid z) \mid x, \Psi \,] - E[\, \ln P(y \mid x, \Theta) \mid x, \Psi \,]$.
E-step: calculate $Q(\Theta, \Psi) = E[\, l(\Theta \mid z) \mid x, \Psi \,]$,
M-step: maximize $Q(\Theta, \Psi)$ with respect to $\Theta$.
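A minimal EM iteration for a univariate Gaussian mixture, given as a sketch rather than the implementation used in the experiments: the E-step computes the responsibilities, the M-step maximizes $Q(\Theta, \Psi)$ in closed form.

```python
import numpy as np
from scipy.stats import norm

def em_step(x, alphas, mus, sigmas):
    """One EM iteration for a univariate Gaussian mixture."""
    x = np.asarray(x, float)
    # E-step: posterior probabilities t[i, k] of component k given x_i and current parameters
    dens = np.column_stack([a * norm.pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas)])
    t = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted ML updates maximizing Q(Theta, Psi)
    nk = t.sum(axis=0)
    alphas = nk / len(x)
    mus = (t * x[:, None]).sum(axis=0) / nk
    sigmas = np.sqrt((t * (x[:, None] - mus) ** 2).sum(axis=0) / nk)
    loglik = np.log(dens.sum(axis=1)).sum()     # l(Theta | x) at the input parameters
    return alphas, mus, sigmas, loglik

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 0.5, 100)])
alphas, mus, sigmas = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    alphas, mus, sigmas, ll = em_step(x, alphas, mus, sigmas)
print(alphas, mus, sigmas, ll)   # the log-likelihood is non-decreasing across iterations
```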
16 Convenient properties of the mixture
A model-based method, best fit on elliptical clusters,
Not susceptible to unequal cluster sizes,
Does not require independent variables,
Robust against outliers,
The EM algorithm yields a straightforward iteration pattern,
EM spontaneously suppresses needless components.
17 Number of clusters
Penalization criteria based on the likelihood function.
Akaike's information criterion: $M_0 = \arg\min_{M \le N} [\, -2 l(\Theta_M \mid x) + 2 d_M(\Theta_M) \,]$,
Bayes information criterion: $M_0 = \arg\min_{M \le N} [\, -2 l(\Theta_M \mid x) + d_M(\Theta_M) \ln N \,]$,
CLC-ICL information criterion: $M_0 = \arg\min_{M \le N} [\, -2 l(\Theta_M \mid x) + 2 EN(\Theta_M, x) + d_M \ln N + \ldots \,]$.
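Once the mixture has been fitted for each candidate $M$, the criteria are a direct comparison. A sketch assuming the maximized log-likelihoods are already available (the numbers below are illustrative only) and that $d_M$ counts the free parameters of a univariate Gaussian mixture:

```python
import numpy as np

def information_criteria(logliks, d_M, N):
    """Penalized model selection over candidate numbers of components M.

    logliks : dict {M: maximized log-likelihood l(Theta_M | x)}
    d_M     : function M -> number of free parameters of the M-component mixture
    N       : sample size
    Returns the argmin of AIC and of BIC, respectively.
    """
    aic = {M: -2 * ll + 2 * d_M(M) for M, ll in logliks.items()}
    bic = {M: -2 * ll + d_M(M) * np.log(N) for M, ll in logliks.items()}
    return min(aic, key=aic.get), min(bic, key=bic.get)

# d_M for a univariate Gaussian mixture: (M-1) weights + M means + M standard deviations
M0_aic, M0_bic = information_criteria(
    logliks={1: -512.3, 2: -430.1, 3: -428.9},   # illustrative values only
    d_M=lambda M: 3 * M - 1,
    N=300,
)
print(M0_aic, M0_bic)
```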
18 Experimental tasks
The practical results and experiments (the main tasks):
Measure different types of acoustic signals,
Combine both main approaches, i.e. the generalized φ-divergences and the distribution mixture method,
Verify the proposed classification method on experimental data sets,
Verify the advantages of the combined method, i.e. the ability to assess the number of clusters and the robustness against sparse outliers that would distort classical classification.
19 Experimental setup on the metal plate
20 Piezo-ceramic sensors (mini and medium sizes)
21 Experimental measurement devices
22 Acoustic signals detected through the piezo-ceramic sensors,
attached to a thin metal plate of dimensions 1.8 m × 0.6 m × 3 mm.
Measuring detection device: DAKEL-XEDO 5 with the properties: 4 MHz sampling rate, 12-bit accuracy, i.e. voltage in the interval [-2048 mV, 2048 mV].
Computed classification attributes:
the spectral densities of the signal $\{X_t\}_{t=0}^{T-1}$, denoted by $S(f)$,
the signal attributes given above: $W$, $Q_{0.33}$, $Z_c$,
$D_\phi$: the discrete form of the generalized Hellinger divergence of Example 2 with blending parameter $\beta = 0.5$ (denoted as $H_D$).
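The attribute $H_D$, the discrete generalized Hellinger divergence with $\beta = 0.5$ applied to spectral densities, can be sketched as follows. The periodogram estimator, the normalization of the spectra, and the choice of a reference signal are assumptions made here for illustration; the talk does not fix them:

```python
import numpy as np

def spectral_density(x):
    """Periodogram of the signal, normalized so it can be treated as a distribution over frequencies."""
    s = np.abs(np.fft.rfft(np.asarray(x, float))) ** 2
    return s / s.sum()

def hellinger_attribute(x1, x2, beta=0.5, eps=1e-12):
    """H_D: discrete generalized Hellinger divergence between two normalized spectra (Example 2)."""
    p, q = spectral_density(x1) + eps, spectral_density(x2) + eps
    return 0.5 * np.sum((p - q) ** 2 / (beta * np.sqrt(p) + (1 - beta) * np.sqrt(q)) ** 2)

# e.g. compare an acoustic-emission signal against a reference signal from the initial phase
rng = np.random.default_rng(1)
reference = rng.normal(size=4096)
signal = rng.normal(size=4096) + 0.3 * np.sin(2 * np.pi * 0.1 * np.arange(4096))
print(hellinger_attribute(signal, reference))
```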
23 Experiment in action
24 Separation by means of the proposed attributes
[Figure: two panels showing the separation of signal types 1-5 against the observation number for different attribute combinations]
Parameters: $Q_{0.33}$, $W$ | $Z_c$, $Q_{0.33}$ | $Z_c$, $Q_{0.33}$, $H_D$
Success rate: 49% | 83% | 86%
25 Design of the Experiment
The experiment consisted of destructive testing of the steel plate material No. 16530, which was exposed to strong pressure from a very hard ball.
The experiment proceeded until the steel material failed, i.e. the ball passed through the plate so that the material was no longer usable for any reasonable industrial application.
The main task of acoustic emission is to distinguish the level of material degradation and to warn against critical accidents or emergencies in the transportation, chemical, or energy industries.
26 Acoustic signals in the Experiment
We had 587 acoustic signals at our disposal in total, divided into three groups:
initial phase (without apparent damage): the first 100 signals,
middle phase of the experiment: signals 101 to 350,
terminal phase (extensively damaged steel): signals 351 to 587.
The main task of experiment 16530: to separate the detected AE signals of the initial period (when the steel material was not yet badly damaged) from those of the terminal period, when the plate was close to destruction, just before failure.
27 Separation based on the EM algorithm and the attributes $W$, $Q_{0.33}$ with divergence $H_D$
[Figure: scatter of the detected clusters in the $(W, Q_{0.33}, H_D)$ attribute space]
28 Separation based on the EM algorithm and parameters $W$, $Q_{0.33}$
[Figure: scatter of the three mixture components in the $(Q_{0.33}, W)$ plane]
29 EM algorithm combined with the Hellinger φ-divergence
The success rates were the following:
Distinguishing between all three phases of the experiment: 63 percent.
Separating only the initial (safe) period from the terminal (critically dangerous) period: about 81 percent.
The general conditions of the experiment:
We obtained these results without any signal preprocessing and without any expert-based purification of the data set.
We restricted ourselves only to the middle part of the detected signals, i.e. the cut-out part ranged from 500 to of the digitally sampled values for each acoustic signal.
Powerful detection of the mass concentration in the signal frequency domain, depending on the level of destruction of the material.
30 Very similar overstrain processes can be found in real life, when a repeated pressure load caused by a sharp, hard object ends in complete damage of the industrial device under consideration.
As an example we present the testing design: Wear Detection of the Axial Ball and Roller Bearings.
In cooperation with the Technical University in Brno, Czech Republic (Doc. Pavel Mazal, VUT Brno).
31 Wear Detection of the Axial Ball Bearings
32 Wear Detection of the Roller Bearings
37 :-) Thanks a lot for your attention.