GENERALIZED PHI-DIVERGENCES AND THE EM ALGORITHM IN ACOUSTIC EMISSION


1 GENERALIZED PHI-DIVERGENCES AND THE EM ALGORITHM IN ACOUSTIC EMISSION
Jan Tláskal and Václav Kůs
Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague

2 Concepts
What the φ-divergences stand for.
We give a simple method of constructing φ-divergences using a normalization of convex or concave functions.
We recall the well-known divergences (Kullback, Hellinger, χ², Power, ...).
We introduce several of their modifications: the generalized Le Cam and Hellinger divergences.
We apply selected φ-divergence distance measures to a real classification problem (experimental design of steel material 16530).
These new families of divergences open new research possibilities in the statistical treatment of acoustic emission sources.

3 Overall assumptions:
(X, A) a measurable space, P(X, A) the set of all distributions on (X, A),
P ⊂ P(X, A) nonvoid, P, Q ∈ P dominated by a σ-finite measure µ on (X, A),
p = dP/dµ, q = dQ/dµ the Radon-Nikodym derivatives of P, Q with respect to µ,
X^n = (X_1, ..., X_n) an i.i.d. vector drawn from P_0 ∈ P,
P_n ∈ P(X, A) the empirical distribution (measure), P_n(A) = (1/n) Σ_{i=1}^n I_A(X_i), A ∈ A, n = 1, 2, ...

4 Definition of φ-divergences:
D_φ(P, Q) = ∫_X q φ(p/q) dµ,  P, Q ∈ P(X, A),
where φ : (0, ∞) → R is convex on (0, ∞), strictly convex at t = 1, and φ(1) = 0.
Invariance w.r.t. linear transformations (nonnegative version): φ̃(t) = φ(t) − φ'_+(1)(t − 1), t ∈ (0, ∞).
Notation: φ(0) := lim_{t→0+} φ(t),  φ(∞)/∞ := lim_{t→∞} φ(t)/t.
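Not part of the original slides: a minimal numerical sketch of the discrete analogue of D_φ, assuming P and Q are given as strictly positive probability vectors on a common finite support and φ is supplied as a vectorized function.

```python
import numpy as np

def phi_divergence(p, q, phi):
    """Discrete analogue of D_phi(P, Q) = sum_x q(x) * phi(p(x)/q(x)).

    p, q : strictly positive probability vectors on a common finite support.
    phi  : convex function with phi(1) = 0, applied elementwise.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(q * phi(p / q)))

# Example: Kullback divergence I(Q, P) via phi(t) = -ln t
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])
print(phi_divergence(p, q, lambda t: -np.log(t)))  # equals sum q ln(q/p)
```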

5 Metric properties of divergences
RANGE: 0 ≤ D_φ(P, Q) ≤ φ(0) + φ(∞)/∞,
REFLEXIVITY: D_φ(P, Q) = 0 iff P = Q,
SYMMETRY: D_φ(P, Q) = D_φ(Q, P) ???? (not guaranteed in general),
TRIANGLE INEQUALITY: D_φ(P, Q) ≤ D_φ(P, R) + D_φ(R, Q) ???? (not guaranteed in general).

6 Examples of standard φ-divergences:
Kullback (I):  φ(t) = −ln t,  φ̃(t) = −ln t + t − 1,  I(Q, P) = ∫ q ln(q/p) dµ
Shannon (I):  φ(t) = t ln t,  φ̃(t) = t ln t − t + 1,  I(P, Q) = ∫ p ln(p/q) dµ
Variation (V):  φ(t) = |t − 1|,  φ̃(t) = 2 − 2t for t < 1 and 0 for t ≥ 1,  V(P, Q) = ∫ |p − q| dµ
Pearson (χ²):  φ(t) = t² − 1,  φ̃(t) = (t − 1)²,  χ²(P, Q) = ∫ (p − q)²/q dµ
Neyman (χ²):  φ(t) = (1 − t)/t,  φ̃(t) = (t − 1)²/t,  χ²(Q, P) = ∫ (p − q)²/p dµ

7 Examples of standard φ-divergences (continued):
Le Cam (LC²):  φ(t) = 2(1 − t)/(1 + t),  φ̃(t) = (t − 1)²/(t + 1),  LC²(P, Q) = ∫ (p − q)²/(p + q) dµ
Hellinger (H²):  φ(t) = 2(1 − √t),  φ̃(t) = (√t − 1)²,  H²(P, Q) = ∫ (√p − √q)² dµ
Vajda (χ^a):  φ(t) = |t − 1|^a,  χ^a(P, Q) = ∫ q^{1−a} |p − q|^a dµ
Matusita (M_a):  φ(t) = |t^a − 1|^{1/a},  M_a(P, Q) = ∫ |p^a − q^a|^{1/a} dµ
Power (I_a):  φ(t) = (t^a − 1)/(a(a − 1)),  φ̃(t) = (t^a − a(t − 1) − 1)/(a(a − 1)),  I_a(P, Q) = (1/(a(a − 1))) (∫ p^a q^{1−a} dµ − 1)
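As an illustrative cross-check (a sketch only, under the same discrete, full-support assumptions as above), a few of the tabulated φ̃ functions can be plugged into the generic discrete D_φ and compared against their closed forms:

```python
import numpy as np

def D_phi(p, q, phi):
    # Discrete phi-divergence: sum over the support of q * phi(p/q).
    return float(np.sum(q * phi(p / q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

# Hellinger: phi(t) = (sqrt(t) - 1)^2  ->  H^2(P,Q) = sum (sqrt(p) - sqrt(q))^2
hellinger = D_phi(p, q, lambda t: (np.sqrt(t) - 1.0) ** 2)
assert np.isclose(hellinger, np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Pearson: phi(t) = (t - 1)^2  ->  chi^2(P,Q) = sum (p - q)^2 / q
pearson = D_phi(p, q, lambda t: (t - 1.0) ** 2)
assert np.isclose(pearson, np.sum((p - q) ** 2 / q))

# Le Cam: phi(t) = (t - 1)^2 / (t + 1)  ->  LC^2(P,Q) = sum (p - q)^2 / (p + q)
lecam = D_phi(p, q, lambda t: (t - 1.0) ** 2 / (t + 1.0))
assert np.isclose(lecam, np.sum((p - q) ** 2 / (p + q)))

print(hellinger, pearson, lecam)
```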

8 Simple construction of φ-divergences
Assume the following conditions: ψ : (0, ∞) → R is a convex or concave function, twice differentiable at t = 1 with ψ''(1) ≠ 0.
CONSTRUCTION (1):
φ(t) = [ψ(t) − ψ(1) − ψ'(1)(t − 1)] / ψ''(1),  t ∈ (0, ∞),
is a fully normalized nonnegative divergence function (i.e. φ'(1) = 0 and φ''(1) = 1).
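A small symbolic sketch of CONSTRUCTION (1); the helper name normalize and the use of sympy are illustrative choices, not part of the slides.

```python
import sympy as sp

t = sp.symbols('t', positive=True)

def normalize(psi):
    """CONSTRUCTION (1): phi(t) = (psi(t) - psi(1) - psi'(1)*(t - 1)) / psi''(1)."""
    d1 = sp.diff(psi, t).subs(t, 1)
    d2 = sp.diff(psi, t, 2).subs(t, 1)
    return sp.simplify((psi - psi.subs(t, 1) - d1 * (t - 1)) / d2)

# Example: psi_b(t) = 1/(b + t) with b = 1 yields the Le Cam divergence function.
phi = normalize(1 / (1 + t))
print(phi)  # simplifies to (t - 1)**2/(t + 1)
# Full normalization: phi(1) = 0, phi'(1) = 0, phi''(1) = 1
print(phi.subs(t, 1), sp.diff(phi, t).subs(t, 1), sp.diff(phi, t, 2).subs(t, 1))
```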

9 Example 1: Generalized Le Cam divergence LC_β(P, Q)
LC_β(P, Q) is defined by means of the strictly convex ψ function
ψ_b(t) = 1/(b + t),  t ∈ (0, ∞),  b ≥ 0.
Applying CONSTRUCTION (1) we get
φ_b(t) = (b + 1)(1 − t)² / (2(b + t)),  t > 0,  b ≥ 0.
Blended divergences for β = 1/(b + 1) ∈ [0, 1]:
LC_β(P, Q) := D_{φ_β}(P, Q) = (1/2) ∫ (p − q)² / (βp + (1 − β)q) dµ,  P, Q ∈ P.
Lindsay (1994) introduced the blending parameter β.

10 The special cases:
Pearson's χ²(P, Q)/2 for β = 0 (b → ∞),
Neyman's χ²(Q, P)/2 for β = 1 (b = 0),
Le Cam's distance LC²(P, Q) for β = 1/2 (b = 1).
LC_β(P, Q) is bounded for all β ∈ (0, 1).
LC_β(P, Q) satisfies the skew symmetry about β = 1/2: LC_β(P, Q) = LC_{1−β}(Q, P), P, Q ∈ P.
The only symmetric member of the family: LC_{1/2}(P, Q) = LC²(P, Q).
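A numerical sketch (not from the slides) of the blended Le Cam family in discrete form, verifying the special cases and the skew symmetry listed above; it assumes strictly positive probability vectors.

```python
import numpy as np

def lc_beta(p, q, beta):
    """LC_beta(P,Q) = 1/2 * sum (p - q)^2 / (beta*p + (1 - beta)*q), beta in [0, 1]."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * float(np.sum((p - q) ** 2 / (beta * p + (1.0 - beta) * q)))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

assert np.isclose(lc_beta(p, q, 0.0), 0.5 * np.sum((p - q) ** 2 / q))   # chi^2(P,Q)/2
assert np.isclose(lc_beta(p, q, 1.0), 0.5 * np.sum((p - q) ** 2 / p))   # chi^2(Q,P)/2
assert np.isclose(lc_beta(p, q, 0.5), np.sum((p - q) ** 2 / (p + q)))   # LC^2(P,Q)
assert np.isclose(lc_beta(p, q, 0.3), lc_beta(q, p, 0.7))               # skew symmetry
print(lc_beta(p, q, 0.5))
```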

11 Example 2: Generalized Hellinger divergence H_β(P, Q)
Let us start with the ψ function
ψ_{a,b}(t) = (1 − t^{2a}) / (b + t^a),  b ≥ 0,  a ≠ 0, 1.
For a = 1/2 and the convex transformation g(y) = y² we get
φ_b(t) := ψ²_{1/2,b}(t) = ((1 − t) / (b + √t))²,  t ∈ (0, ∞).
Applying CONSTRUCTION (1) with β = 1/(1 + b) ∈ [0, 1]:
φ_β(t) = (1/2) ((1 − t) / (β√t + 1 − β))²,  t ∈ (0, ∞),  β ∈ [0, 1],
H_β(P, Q) = (1/2) ∫ (p − q)² / (β√p + (1 − β)√q)² dµ,  P, Q ∈ P.

12 Particular cases:
Pearson's χ²(P, Q)/2 for β = 0 (b → ∞),
Neyman's χ²(Q, P)/2 for β = 1 (b = 0),
Hellinger's 2H²(P, Q) for β = 1/2 (b = 1).
H_β(P, Q) is bounded for all β ∈ (0, 1).
H_β(P, Q) satisfies the skew symmetry about β = 1/2: H_β(P, Q) = H_{1−β}(Q, P), P, Q ∈ P.
The only symmetric member of the family: H_{1/2}(P, Q) = 2H²(P, Q).
→ Hellinger-type robust estimation techniques in statistics.
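Analogously, a sketch of the generalized Hellinger family in discrete form, with the corresponding special-case and skew-symmetry checks (again assuming strictly positive probability vectors).

```python
import numpy as np

def h_beta(p, q, beta):
    """H_beta(P,Q) = 1/2 * sum (p - q)^2 / (beta*sqrt(p) + (1 - beta)*sqrt(q))^2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    denom = (beta * np.sqrt(p) + (1.0 - beta) * np.sqrt(q)) ** 2
    return 0.5 * float(np.sum((p - q) ** 2 / denom))

p = np.array([0.2, 0.5, 0.3])
q = np.array([0.3, 0.4, 0.3])

assert np.isclose(h_beta(p, q, 0.0), 0.5 * np.sum((p - q) ** 2 / q))    # chi^2(P,Q)/2
assert np.isclose(h_beta(p, q, 1.0), 0.5 * np.sum((p - q) ** 2 / p))    # chi^2(Q,P)/2
assert np.isclose(h_beta(p, q, 0.5),
                  2.0 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))         # 2 H^2(P,Q)
assert np.isclose(h_beta(p, q, 0.3), h_beta(q, p, 0.7))                 # skew symmetry
print(h_beta(p, q, 0.5))
```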

13 Clustering via distribution mixtures
The distribution mixture:
p(x | Θ) = Σ_{j=1}^{M} α_j p_j(x | θ_j),  Σ_{j=1}^{M} α_j = 1,  α_j ≥ 0,
Θ = (α_1, ..., α_M, θ_1, ..., θ_M).
Clustering with the mixture:
(t_i)_k = α_k p_k(x_i | θ_k) / Σ_{m=1}^{M} α_m p_m(x_i | θ_m),  k ∈ {1, ..., M}.
The k-th component of t_i evaluates the probability that x_i belongs to the k-th component of the mixture.
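For illustration only, the responsibilities (t_i)_k computed for a univariate Gaussian mixture; the Gaussian component densities are an assumption made here to have concrete p_j, since the slides do not fix the component family.

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, alphas, means, sds):
    """(t_i)_k = alpha_k p_k(x_i | theta_k) / sum_m alpha_m p_m(x_i | theta_m)."""
    x = np.asarray(x, float)
    # Weighted component densities, shape (N, M).
    dens = np.column_stack([a * norm.pdf(x, m, s)
                            for a, m, s in zip(alphas, means, sds)])
    return dens / dens.sum(axis=1, keepdims=True)

x = np.array([-1.2, 0.1, 3.8, 4.5])
t = responsibilities(x, alphas=[0.5, 0.5], means=[0.0, 4.0], sds=[1.0, 1.0])
print(t)  # each row sums to one; row i gives the membership probabilities of x_i
```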

14 Fitting the mixture
The distribution mixture that fits the observed data best is chosen via the maximum likelihood method.
Likelihood function:
l(Θ | x) = Σ_{i=1}^{N} ln p(x_i | Θ) = Σ_{i=1}^{N} ln Σ_{j=1}^{M} α_j p_j(x_i | θ_j).
Maximum likelihood estimate:
Θ̂ = arg max_Θ l(Θ | x).
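The corresponding mixture log-likelihood l(Θ | x), again in the illustrative univariate Gaussian setting used above:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(x, alphas, means, sds):
    """l(Theta | x) = sum_i ln sum_j alpha_j p_j(x_i | theta_j)."""
    x = np.asarray(x, float)
    mix = sum(a * norm.pdf(x, m, s) for a, m, s in zip(alphas, means, sds))
    return float(np.sum(np.log(mix)))

x = np.array([-1.2, 0.1, 3.8, 4.5])
print(log_likelihood(x, [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]))
```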

15 EM algorithm
Missing data principle: z = ((x_1, y_1)^T, ..., (x_N, y_N)^T).
EM algorithm:
P(x | Θ) = P(z | Θ) / P(z | x, Θ),
l(Θ | x) = l(Θ | z) − ln P(z | x, Θ),
l(Θ | x) = E[l(Θ | z) | x, Ψ] − E[ln P(z | x, Θ) | x, Ψ].
E-step: calculate Q(Θ, Ψ) = E[l(Θ | z) | x, Ψ],
M-step: maximize Q(Θ, Ψ) with respect to Θ.
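A compact EM sketch for a univariate Gaussian mixture (illustrative only; the component family and the initialization are assumptions made here): the E-step computes the responsibilities, and the M-step re-estimates the weights, means, and standard deviations from them in closed form.

```python
import numpy as np
from scipy.stats import norm

def em_gaussian_mixture(x, M, n_iter=100, seed=0):
    """EM for a univariate M-component Gaussian mixture; returns (alphas, means, sds)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, float)
    alphas = np.full(M, 1.0 / M)
    means = rng.choice(x, M, replace=False)
    sds = np.full(M, x.std())
    for _ in range(n_iter):
        # E-step: responsibilities t_{ik} (posterior membership probabilities).
        dens = np.column_stack([a * norm.pdf(x, m, s)
                                for a, m, s in zip(alphas, means, sds)])
        t = dens / dens.sum(axis=1, keepdims=True)
        # M-step: maximize Q(Theta, Psi) -> closed-form updates.
        nk = t.sum(axis=0)
        alphas = nk / len(x)
        means = (t * x[:, None]).sum(axis=0) / nk
        sds = np.sqrt((t * (x[:, None] - means) ** 2).sum(axis=0) / nk)
    return alphas, means, sds

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 0.7, 100)])
print(em_gaussian_mixture(x, M=2))
```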

16 Convenient properties of the mixture
A model-based method, best fit on elliptical clusters,
Not susceptible to uneven cluster sizes,
Does not require independent variables,
Robust against outliers,
The EM algorithm yields a straightforward iteration pattern,
EM spontaneously suppresses needless components.

17 Number of clusters
Penalization criteria based on the likelihood function:
Akaike's information criterion: M_0 = arg min_{M ≤ N} [−2 l(Θ̂_M | x) + 2 d_M(Θ̂_M)],
Bayes information criterion: M_0 = arg min_{M ≤ N} [−2 l(Θ̂_M | x) + d_M(Θ̂_M) ln N],
CLC-ICL information criterion: M_0 = arg min_{M ≤ N} [−2 l(Θ̂_M | x) + 2 EN(Θ̂_M, x) + d_M ln N + ...].
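A sketch of the penalized selection of the number of components, assuming the candidate mixtures have already been fitted (e.g. with the EM sketch above); the log-likelihood values below are made up for illustration, and d_M = 3M − 1 counts the free parameters of a univariate M-component Gaussian mixture.

```python
import numpy as np

def select_n_components(logliks, n_params, N):
    """Return the numbers of components minimizing AIC and BIC.

    logliks[m-1], n_params[m-1] : fitted log-likelihood and parameter count for M = m.
    N : sample size.
    """
    logliks = np.asarray(logliks, float)
    n_params = np.asarray(n_params, float)
    aic = -2.0 * logliks + 2.0 * n_params
    bic = -2.0 * logliks + n_params * np.log(N)
    return int(np.argmin(aic)) + 1, int(np.argmin(bic)) + 1

# Hypothetical fitted log-likelihoods for M = 1..4 (d_M = 3M - 1 free parameters).
logliks = [-612.3, -540.8, -538.9, -537.5]
d = [3 * M - 1 for M in range(1, 5)]
print(select_n_components(logliks, d, N=300))  # -> (2, 2) for these illustrative values
```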

18 Experimental tasks
The practical results and experiments (the main tasks):
Measure different types of acoustic signals,
Combine both main approaches, i.e. the generalized φ-divergences and the distribution mixture method,
Verify the proposed classification method on experimental data sets,
Verify the advantages of the combined method, i.e. the ability to assess the number of clusters and the robustness against sparse outliers that would distort classical classification.

19 Experimental setup on the metal plate

20 Piezo-ceramic sensors (mini and medium sizes)

21 Experimental measurement devices

22 Acoustic signals detected through the piezo-ceramic sensors,
attached to a thin metal plate of dimensions 1.8 m × 0.6 m × 3 mm.
Measuring detection device: DAKEL-XEDO 5 with the following properties:
4 MHz sampling rate, 12-bit accuracy, i.e. voltage in the interval [−2048 mV, 2048 mV].
Computed classification attributes:
the spectral densities of the signal {X_t}_{t=0}^{T−1}, denoted by S(f),
the signal attributes given above: W, Q_0.33, Z_c,
D_φ: the discrete form of the generalized Hellinger divergence of Example 2 with blending parameter β = 0.5 (denoted H_D).
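A hedged sketch of how an attribute like H_D could be obtained: the spectral density is estimated here by a plain periodogram and normalized to a probability vector before the discrete generalized Hellinger divergence with β = 0.5 is applied. The reference signal, the windowing, and the attributes W, Q_0.33, Z_c are not specified in the transcription and are not reproduced here; the test signals below are synthetic stand-ins, not the experimental data.

```python
import numpy as np

def normalized_spectrum(signal):
    """Periodogram of the signal, normalized so that it sums to one."""
    s = np.abs(np.fft.rfft(np.asarray(signal, float))) ** 2
    return s / s.sum()

def h_beta(p, q, beta=0.5, eps=1e-12):
    """Discrete generalized Hellinger divergence (Example 2) with blending parameter beta."""
    denom = (beta * np.sqrt(p + eps) + (1.0 - beta) * np.sqrt(q + eps)) ** 2
    return 0.5 * float(np.sum((p - q) ** 2 / denom))

fs = 4_000_000                        # 4 MHz sampling rate, as stated on the slide
time = np.arange(2000) / fs
sig_a = np.sin(2 * np.pi * 150e3 * time) + 0.1 * np.random.default_rng(0).normal(size=time.size)
sig_b = np.sin(2 * np.pi * 400e3 * time) + 0.1 * np.random.default_rng(1).normal(size=time.size)

H_D = h_beta(normalized_spectrum(sig_a), normalized_spectrum(sig_b), beta=0.5)
print(H_D)
```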

23 Experiment in action

24 Separation by means of the proposed attributes
[Two scatter plots of the proposed attributes (Z_c and Kv) against observation number, with points labelled by signal type]
Parameters:    Q_0.33, W  |  Z_c, Q_0.33  |  Z_c, Q_0.33, H_D
Success rate:     49%     |      83%      |       86%

25 Design of the Experiment
The experiment consisted of destructive testing of the steel plate material No. 16530, which was exposed to strong pressure from a very hard ball.
The experiment proceeded until destruction of the steel material occurred, i.e. the ball passed through the plate so that the material was no longer usable for any reasonable industrial application.
The main task of acoustic emission is to distinguish the level of material degradation and to protect against critical accidents or emergencies in the transportation, chemical, and energy industries.

26 Acoustic signals in the Experiment
We had 587 acoustic signals at our disposal in total, divided into three groups:
initial phase (without apparent damage): the first 100 signals,
middle phase of the experiment: signals 101 to 350,
terminal phase (extensively damaged steel): signals 351 to 587.
The main task of experiment 16530: to separate the detected AE signals of the initial period (when the steel material was not yet badly damaged) from those of the terminal period, when the plate was close to destruction, just before the failure.

27 Separation based on the EM algorithm and the attributes W, Q_0.33 with the divergence H_D
[3-D scatter plot of the observations in the coordinates H_D, W, Q_0.33]

28 Separation based on the EM algorithm and the parameters W, Q_0.33
[Scatter plot of the observations in the (W, Q_0.33) plane; legend: 1st, 2nd, and 3rd mixture component]

29 EM algorithm combined with the Hellinger φ-divergence
The success rates were the following:
Distinguishing between all three phases of the experiment: 63 %.
Separating only the initial (safe) period from the terminal (critically dangerous) period: about 81 %.
The general conditions of the experiment:
We obtained these results without any signal preprocessing and without any expert-based purification of the data set.
We restricted ourselves to the middle part of the detected signals, i.e. the cut-out part ranging from 500 to … of the digitally sampled values for each acoustic signal.
Powerful detection of the concentration of mass in the signal frequency domain, depending on the level of destruction of the material.

30 Very similar overstrain processes can be found in real life, when a repeated pressure load caused by a sharp and hard object ends in complete damage of the industrial device under consideration.
As an example we present the testing design: Wear Detection of the Axial Ball and Roller Bearings,
in cooperation with the Technical University in Brno, Czech Republic (Doc. Pavel Mazal, VUT Brno).

31 Wear Detection of the Axial Ball Bearings

32 Wear Detection of the Roller Bearings

37 :-) Thanks a lot for your attention.
