GENERALIZED PHI-DIVERGENCES AND THE EM ALGORITHM IN ACOUSTIC EMISSION
1 GENERALIZED PHI-DIVERGENCES AND THE EM ALGORITHM IN ACOUSTIC EMISSION
Jan Tláskal and Václav Kůs
Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
2 Concepts
What the φ-divergences stand for.
We give a simple method of construction of φ-divergences using a normalization of convex or concave functions.
We recall the well-known divergences (Kullback, Hellinger, χ², Power, ...).
We introduce several of their modifications: the generalized Le Cam and Hellinger divergences.
We apply a selected φ-divergence distance measure to a real classification problem (the experimental design on steel material 16530).
These new families of divergences open new research possibilities in the statistical treatment of acoustic emission sources.
3 Overall assumptions:
$(X, \mathcal{A})$ a measurable space, $\mathcal{P}(X, \mathcal{A})$ the set of all distributions on it,
$\mathcal{P} \subseteq \mathcal{P}(X, \mathcal{A})$ nonvoid, $P_0 \in \mathcal{P}$,
$P, Q \in \mathcal{P}$ dominated by a $\sigma$-finite measure $\mu$ on $(X, \mathcal{A})$,
$p = dP/d\mu$, $q = dQ/d\mu$ the Radon-Nikodym derivatives of $P, Q$ with respect to $\mu$,
$X^n = (X_1, \ldots, X_n)$ an i.i.d. vector drawn from $P_0 \in \mathcal{P}$,
$P_n \in \mathcal{P}(X, \mathcal{A})$ the empirical distribution (measure), $P_n(A) = \frac{1}{n} \sum_{i=1}^{n} I_A(X_i)$, $A \in \mathcal{A}$, $n = 1, 2, \ldots$
4 Definition of φ-divergences:
$D_\phi(P, Q) = \int_X q \, \phi\!\left(\frac{p}{q}\right) d\mu$, $P, Q \in \mathcal{P}$, where
$\phi : (0, \infty) \to \mathbb{R}$ is convex on $(0, \infty)$, strictly convex at $t = 1$, and $\phi(1) = 0$.
Nonnegative version, invariant w.r.t. linear transformations: $\tilde{\phi}(t) = \phi(t) - \phi'_+(1)(t - 1)$, $t \in (0, \infty)$.
Notation: $\phi(0) := \lim_{t \to 0^+} \phi(t)$, $\phi(\infty)/\infty := \lim_{t \to \infty} \phi(t)/t$.
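For discrete distributions the definition translates directly into code. A minimal sketch in Python (assuming NumPy and probability vectors on a common finite support; the function name is illustrative, not from the talk):

```python
import numpy as np

def phi_divergence(p, q, phi):
    """Discrete phi-divergence: D_phi(P, Q) = sum_x q(x) * phi(p(x) / q(x))."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(q * phi(p / q)))     # assumes q > 0 wherever p > 0

P = np.array([0.2, 0.5, 0.3])
Q = np.array([0.3, 0.4, 0.3])
# Kullback divergence I(Q, P): phi(t) = -ln t (see the table of examples below)
print(phi_divergence(P, Q, lambda t: -np.log(t)))
# Reflexivity: D_phi(P, P) = 0
print(phi_divergence(P, P, lambda t: -np.log(t)))
```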
5 Metric properties of divergences
RANGE: $0 \le D_\phi(P, Q) \le \phi(0) + \phi(\infty)/\infty$,
REFLEXIVITY: $D_\phi(P, Q) = 0$ iff $P = Q$,
SYMMETRY: $D_\phi(P, Q) = D_\phi(Q, P)$ ? (does not hold in general),
TRIANGLE INEQUALITY: $D_\phi(P, Q) \le D_\phi(P, R) + D_\phi(R, Q)$ ? (does not hold in general).
6 Examples of standard φ-divergences:
Kullback (I): $\phi(t) = -\ln t$, $\tilde{\phi}(t) = -\ln t + t - 1$, $I(Q, P) = \int q \ln\frac{q}{p} \, d\mu$
Shannon (I): $\phi(t) = t \ln t$, $\tilde{\phi}(t) = t \ln t - t + 1$, $I(P, Q) = \int p \ln\frac{p}{q} \, d\mu$
Variation (V): $\phi(t) = |t - 1|$, $\tilde{\phi}(t) = 2 - 2t$ for $t < 1$ and $0$ for $t \ge 1$, $V(P, Q) = \int |p - q| \, d\mu$
Pearson ($\chi^2$): $\phi(t) = t^2 - 1$, $\tilde{\phi}(t) = (t - 1)^2$, $\chi^2(P, Q) = \int \frac{(p - q)^2}{q} \, d\mu$
Neyman ($\chi^2$): $\phi(t) = (1 - t)/t$, $\tilde{\phi}(t) = (t - 1)^2/t$, $\chi^2(Q, P) = \int \frac{(p - q)^2}{p} \, d\mu$
7 Examples of standard φ-divergences (continued):
Le Cam ($LC^2$): $\phi(t) = \frac{2(1 - t)}{1 + t}$, $\tilde{\phi}(t) = \frac{(t - 1)^2}{t + 1}$, $LC^2(P, Q) = \int \frac{(p - q)^2}{p + q} \, d\mu$
Hellinger ($H^2$): $\phi(t) = 2(1 - \sqrt{t})$, $\tilde{\phi}(t) = (\sqrt{t} - 1)^2$, $H^2(P, Q) = \int (\sqrt{p} - \sqrt{q})^2 \, d\mu$
Vajda ($\chi^a$): $\phi(t) = |t - 1|^a$, $\chi^a(P, Q) = \int q^{1 - a} |p - q|^a \, d\mu$
Matusita ($M_a$): $\phi(t) = |t^a - 1|^{1/a}$, $M_a(P, Q) = \int |p^a - q^a|^{1/a} \, d\mu$
Power ($I_a$): $\phi(t) = \frac{t^a - 1}{a(a - 1)}$, $\tilde{\phi}(t) = \frac{t^a - a(t - 1) - 1}{a(a - 1)}$, $I_a(P, Q) = \frac{1}{a(a - 1)} \left( \int p^a q^{1 - a} \, d\mu - 1 \right)$
8 Simple construction of φ-divergences
Assume the following conditions: $\psi : (0, \infty) \to \mathbb{R}$ is a convex or concave function, twice differentiable at $t = 1$ with $\psi''(1) \ne 0$.
CONSTRUCTION:
$\phi(t) = \frac{\psi(t) - \psi(1) - \psi'(1)(t - 1)}{\psi''(1)}$, $t \in (0, \infty)$,   (1)
is a fully normalized nonnegative divergence function (i.e. $\phi'(1) = 0$ and $\phi''(1) = 1$).
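A quick numerical sanity check of CONSTRUCTION (1), sketched in Python under the assumption that $\psi''(1)$ is supplied analytically and $\psi'(1)$ is taken by a finite difference; the $\psi$ below anticipates Example 1 ($\psi_b(t) = 1/(b+t)$):

```python
import numpy as np

def normalize(psi, d2psi1):
    """CONSTRUCTION (1): phi(t) = (psi(t) - psi(1) - psi'(1)(t-1)) / psi''(1)."""
    h = 1e-5
    dpsi1 = (psi(1 + h) - psi(1 - h)) / (2 * h)          # psi'(1) by central difference
    return lambda t: (psi(t) - psi(1.0) - dpsi1 * (t - 1.0)) / d2psi1

b = 1.0
psi = lambda t: 1.0 / (b + t)                            # psi of Example 1 (Le Cam family)
phi = normalize(psi, d2psi1=2.0 / (b + 1.0) ** 3)        # psi''(t) = 2/(b+t)^3

# Fully normalized: phi(1) = 0, phi'(1) = 0, phi''(1) = 1 (checked numerically)
h = 1e-4
print(phi(1.0))                                          # ~0
print((phi(1 + h) - phi(1 - h)) / (2 * h))               # ~0
print((phi(1 + h) - 2 * phi(1.0) + phi(1 - h)) / h**2)   # ~1
```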
9 Example 1: Generalized Le Cam divergence $LC_\beta(P, Q)$
$LC_\beta(P, Q)$ is defined by means of the strictly convex $\psi$ function:
$\psi_b(t) = \frac{1}{b + t}$, $t \in (0, \infty)$, $b \ge 0$.
Applying CONSTRUCTION (1) we get:
$\phi_b(t) = \frac{(b + 1)(1 - t)^2}{2(b + t)}$, $t > 0$, $b \ge 0$.
Blended divergences for $\beta = 1/(b + 1) \in [0, 1]$:
$LC_\beta(P, Q) := D_{\phi_\beta}(P, Q) = \frac{1}{2} \int \frac{(p - q)^2}{\beta p + (1 - \beta) q} \, d\mu$, $P, Q \in \mathcal{P}$.
Lindsay (1994) introduced the blending parameter $\beta$.
10 The special cases:
Pearson's $\chi^2(P, Q)/2$ for $\beta = 0$ ($b \to \infty$),
Neyman's $\chi^2(Q, P)/2$ for $\beta = 1$ ($b = 0$),
Le Cam's distance $LC^2(P, Q)$ for $\beta = 1/2$ ($b = 1$).
$LC_\beta(P, Q)$ is bounded for all $\beta \in (0, 1)$.
$LC_\beta(P, Q)$ satisfies the skew symmetry about $\beta = 1/2$: $LC_\beta(P, Q) = LC_{1 - \beta}(Q, P)$, $P, Q \in \mathcal{P}$.
The only symmetric member of the family: $LC_{1/2}(P, Q) = LC^2(P, Q)$.
11 Example 2: Generalized Hellinger divergence $H_\beta(P, Q)$
Let us start with the $\psi$ function:
$\psi_{a,b}(t) = \frac{1 - t^{2a}}{b + t^a}$, $b \ge 0$, $a \ne 0, 1$.
For $a = 1/2$ and the convex transformation $g(y) = y^2$ we get:
$\phi_b(t) := \psi_{1/2,b}^2(t) = \left( \frac{1 - t}{b + \sqrt{t}} \right)^2$, $t \in (0, \infty)$.
Applying CONSTRUCTION (1) with $\beta = 1/(1 + b) \in [0, 1]$:
$\phi_\beta(t) = \frac{1}{2} \left( \frac{1 - t}{\beta \sqrt{t} + 1 - \beta} \right)^2$, $t \in (0, \infty)$, $\beta \in [0, 1]$,
$H_\beta(P, Q) = \frac{1}{2} \int \frac{(p - q)^2}{(\beta \sqrt{p} + (1 - \beta) \sqrt{q})^2} \, d\mu$, $P, Q \in \mathcal{P}$.
12 Particular cases:
Pearson's $\chi^2(P, Q)/2$ for $\beta = 0$ ($b \to \infty$),
Neyman's $\chi^2(Q, P)/2$ for $\beta = 1$ ($b = 0$),
Hellinger's $2H^2(P, Q)$ for $\beta = 1/2$ ($b = 1$).
$H_\beta(P, Q)$ is bounded for all $\beta \in (0, 1)$,
$H_\beta(P, Q)$ satisfies the skew symmetry about $\beta = 1/2$: $H_\beta(P, Q) = H_{1 - \beta}(Q, P)$, $P, Q \in \mathcal{P}$,
The only symmetric member of the family: $H_{1/2}(P, Q) = 2H^2(P, Q)$.
The Hellinger distance underlies robust estimation techniques in statistics.
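Both blended families reduce to one-liners on discrete data. A sketch under the same probability-vector assumptions as before; it also illustrates the skew symmetry $LC_\beta(P,Q) = LC_{1-\beta}(Q,P)$ and the identity $H_{1/2}(P,Q) = 2H^2(P,Q)$:

```python
import numpy as np

def lc_beta(p, q, beta):
    """Generalized Le Cam divergence: LC_beta(P,Q) = 1/2 * sum (p-q)^2 / (beta*p + (1-beta)*q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.sum((p - q) ** 2 / (beta * p + (1 - beta) * q))

def h_beta(p, q, beta):
    """Generalized Hellinger divergence: H_beta(P,Q) = 1/2 * sum (p-q)^2 / (beta*sqrt(p) + (1-beta)*sqrt(q))^2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return 0.5 * np.sum((p - q) ** 2 / (beta * np.sqrt(p) + (1 - beta) * np.sqrt(q)) ** 2)

P = np.array([0.2, 0.5, 0.3])
Q = np.array([0.3, 0.4, 0.3])
print(lc_beta(P, Q, 0.3), lc_beta(Q, P, 0.7))                          # skew symmetry: equal values
print(h_beta(P, Q, 0.5), 2 * np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2))   # beta = 1/2 gives 2*H^2
```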
13 Clustering via distribution mixtures
The distribution mixture:
$p(x \mid \Theta) = \sum_{j=1}^{M} \alpha_j \, p_j(x \mid \theta_j)$, $\sum_{j=1}^{M} \alpha_j = 1$, $\alpha_j \ge 0$,
$\Theta = (\alpha_1, \ldots, \alpha_M, \theta_1, \ldots, \theta_M)$.
Clustering with the mixture:
$(t_i)_k = \alpha_k \, p_k(x_i \mid \theta_k) \Big/ \sum_{m=1}^{M} \alpha_m \, p_m(x_i \mid \theta_m)$, $k \in \{1, \ldots, M\}$.
The $k$-th component of $t_i$ evaluates the probability that $x_i$ belongs to the $k$-th component of the mixture.
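Written out for a univariate Gaussian mixture, the responsibilities $(t_i)_k$ take a few lines. A sketch assuming SciPy's normal density; the concrete component densities used for the acoustic-emission data are not specified here:

```python
import numpy as np
from scipy.stats import norm

def responsibilities(x, alphas, mus, sigmas):
    """(t_i)_k = alpha_k p_k(x_i | theta_k) / sum_m alpha_m p_m(x_i | theta_m)."""
    x = np.asarray(x, float)
    # dens[i, k] = alpha_k * N(x_i; mu_k, sigma_k)
    dens = np.column_stack([a * norm.pdf(x, loc=m, scale=s)
                            for a, m, s in zip(alphas, mus, sigmas)])
    return dens / dens.sum(axis=1, keepdims=True)

x = np.array([0.1, 2.9, 1.5])
t = responsibilities(x, alphas=[0.5, 0.5], mus=[0.0, 3.0], sigmas=[1.0, 1.0])
print(t)    # each row sums to 1: soft assignment of x_i to the mixture components
```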
14 Fitting the mixture
The distribution mixture that best fits the observed data is chosen via the maximum likelihood method.
Likelihood function:
$l(\Theta \mid x) = \sum_{i=1}^{N} \ln p(x_i \mid \Theta) = \sum_{i=1}^{N} \ln \sum_{j=1}^{M} \alpha_j \, p_j(x_i \mid \theta_j)$,
Maximum likelihood estimate:
$\hat{\Theta} = \arg\max_{\Theta} l(\Theta \mid x)$.
15 EM algorithm
Missing data principle: complete data $z = ((x_1, y_1)^T, \ldots, (x_N, y_N)^T)$ with the $y_i$ unobserved.
EM algorithm:
$P(x \mid \Theta) = P(z \mid \Theta) / P(y \mid x, \Theta)$,
$l(\Theta \mid x) = l(\Theta \mid z) - \ln P(y \mid x, \Theta)$,
$l(\Theta \mid x) = E[\, l(\Theta \mid z) \mid x, \Psi \,] - E[\, \ln P(y \mid x, \Theta) \mid x, \Psi \,]$.
E-step: calculate $Q(\Theta, \Psi) = E[\, l(\Theta \mid z) \mid x, \Psi \,]$,
M-step: maximize $Q(\Theta, \Psi)$ with respect to $\Theta$.
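A minimal EM iteration for a univariate Gaussian mixture, given as a sketch rather than the implementation used in the experiments: the E-step computes the responsibilities, the M-step maximizes $Q(\Theta, \Psi)$ in closed form.

```python
import numpy as np
from scipy.stats import norm

def em_step(x, alphas, mus, sigmas):
    """One EM iteration for a univariate Gaussian mixture."""
    x = np.asarray(x, float)
    # E-step: posterior probabilities t[i, k] of component k given x_i and current parameters
    dens = np.column_stack([a * norm.pdf(x, m, s) for a, m, s in zip(alphas, mus, sigmas)])
    t = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted ML updates maximizing Q(Theta, Psi)
    nk = t.sum(axis=0)
    alphas = nk / len(x)
    mus = (t * x[:, None]).sum(axis=0) / nk
    sigmas = np.sqrt((t * (x[:, None] - mus) ** 2).sum(axis=0) / nk)
    loglik = np.log(dens.sum(axis=1)).sum()     # l(Theta | x) at the input parameters
    return alphas, mus, sigmas, loglik

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0, 1, 200), rng.normal(4, 0.5, 100)])
alphas, mus, sigmas = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    alphas, mus, sigmas, ll = em_step(x, alphas, mus, sigmas)
print(alphas, mus, sigmas, ll)   # the log-likelihood is non-decreasing across iterations
```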
16 Convenient properties of the mixture
A model-based method, best fit on elliptical clusters,
Not susceptible to unequal cluster sizes,
Does not require independent variables,
Robust against outliers,
The EM algorithm yields a straightforward iteration pattern,
EM spontaneously suppresses needless components.
17 Number of clusters
Penalization criteria based on the likelihood function.
Akaike's information criterion: $M_0 = \arg\min_{M \le N} [\, -2 l(\Theta_M \mid x) + 2 d_M(\Theta_M) \,]$,
Bayes information criterion: $M_0 = \arg\min_{M \le N} [\, -2 l(\Theta_M \mid x) + d_M(\Theta_M) \ln N \,]$,
CLC-ICL information criterion: $M_0 = \arg\min_{M \le N} [\, -2 l(\Theta_M \mid x) + 2 EN(\Theta_M, x) + d_M \ln N + \ldots \,]$.
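Once the mixture has been fitted for each candidate $M$, the criteria are a direct comparison. A sketch assuming the maximized log-likelihoods are already available (the numbers below are illustrative only) and that $d_M$ counts the free parameters of a univariate Gaussian mixture:

```python
import numpy as np

def information_criteria(logliks, d_M, N):
    """Penalized model selection over candidate numbers of components M.

    logliks : dict {M: maximized log-likelihood l(Theta_M | x)}
    d_M     : function M -> number of free parameters of the M-component mixture
    N       : sample size
    Returns the argmin of AIC and of BIC, respectively.
    """
    aic = {M: -2 * ll + 2 * d_M(M) for M, ll in logliks.items()}
    bic = {M: -2 * ll + d_M(M) * np.log(N) for M, ll in logliks.items()}
    return min(aic, key=aic.get), min(bic, key=bic.get)

# d_M for a univariate Gaussian mixture: (M-1) weights + M means + M standard deviations
M0_aic, M0_bic = information_criteria(
    logliks={1: -512.3, 2: -430.1, 3: -428.9},   # illustrative values only
    d_M=lambda M: 3 * M - 1,
    N=300,
)
print(M0_aic, M0_bic)
```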
18 Experimental tasks
The practical results and experiments (the main tasks):
Measure different types of acoustic signals,
Combine both main approaches, i.e. the generalized φ-divergences and the distribution mixture method,
Verify the proposed classification method on experimental data sets,
Verify the advantages of the combined method, i.e. the ability to assess the number of clusters and the robustness against sparse outliers that would distort classical classification.
19 Experimental setup on the metal plate
20 Piezo-ceramic sensors (mini and medium sizes)
21 Experimental measurement devices
22 Acoustic signals detected through the piezo-ceramic sensors,
attached to a thin metal plate of dimensions 1.8 m × 0.6 m × 3 mm.
Measuring detection device: DAKEL-XEDO 5 with the properties: 4 MHz sampling rate, 12-bit accuracy, i.e. voltage in the interval [-2048 mV, 2048 mV].
Computed classification attributes:
the spectral densities of the signal $\{X_t\}_{t=0}^{T-1}$, denoted by $S(f)$,
the signal attributes given above: $W$, $Q_{0.33}$, $Z_c$,
$D_\phi$: the discrete form of the generalized Hellinger divergence of Example 2 with blending parameter $\beta = 0.5$ (denoted as $H_D$).
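The attribute $H_D$, the discrete generalized Hellinger divergence with $\beta = 0.5$ applied to spectral densities, can be sketched as follows. The periodogram estimator, the normalization of the spectra, and the choice of a reference signal are assumptions made here for illustration; the talk does not fix them:

```python
import numpy as np

def spectral_density(x):
    """Periodogram of the signal, normalized so it can be treated as a distribution over frequencies."""
    s = np.abs(np.fft.rfft(np.asarray(x, float))) ** 2
    return s / s.sum()

def hellinger_attribute(x1, x2, beta=0.5, eps=1e-12):
    """H_D: discrete generalized Hellinger divergence between two normalized spectra (Example 2)."""
    p, q = spectral_density(x1) + eps, spectral_density(x2) + eps
    return 0.5 * np.sum((p - q) ** 2 / (beta * np.sqrt(p) + (1 - beta) * np.sqrt(q)) ** 2)

# e.g. compare an acoustic-emission signal against a reference signal from the initial phase
rng = np.random.default_rng(1)
reference = rng.normal(size=4096)
signal = rng.normal(size=4096) + 0.3 * np.sin(2 * np.pi * 0.1 * np.arange(4096))
print(hellinger_attribute(signal, reference))
```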
23 Experiment in action
24 Separation by means of the proposed attributes
[Figure: two panels showing the separation of signal types 1-5 against the observation number for different attribute combinations]
Parameters: $Q_{0.33}$, $W$ | $Z_c$, $Q_{0.33}$ | $Z_c$, $Q_{0.33}$, $H_D$
Success rate: 49% | 83% | 86%
25 Design of the Experiment
The experiment consisted of destructive testing of the steel plate material No. 16530, which was exposed to strong pressure from a very hard ball.
The experiment proceeded until the steel material failed, i.e. the ball passed through the plate so that the material was no longer usable for any reasonable industrial application.
The main task of acoustic emission is to distinguish the level of material degradation and to warn against critical accidents or emergencies in the transportation, chemical, or energy industries.
26 Acoustic signals in the Experiment
We had 587 acoustic signals at our disposal in total, divided into three groups:
initial phase (without apparent damage): the first 100 signals,
middle phase of the experiment: signals 101 to 350,
terminal phase (extensively damaged steel): signals 351 to 587.
The main task of experiment 16530: to separate the detected AE signals of the initial period (when the steel material was not yet badly damaged) from those of the terminal period, when the plate was close to destruction, just before failure.
27 Separation based on the EM algorithm and the attributes $W$, $Q_{0.33}$ with divergence $H_D$
[Figure: scatter of the detected clusters in the $(W, Q_{0.33}, H_D)$ attribute space]
28 Separation based on the EM algorithm and parameters $W$, $Q_{0.33}$
[Figure: scatter of the three mixture components in the $(Q_{0.33}, W)$ plane]
29 EM algorithm combined with the Hellinger φ-divergence
The success rates were the following:
Distinguishing between all three phases of the experiment: 63 percent.
Separating only the initial (safe) period from the terminal (critically dangerous) period: about 81 percent.
The general conditions of the experiment:
We obtained these results without any signal preprocessing and without any expert-based purification of the data set.
We restricted ourselves only to the middle part of the detected signals, i.e. the cut-out part ranged from 500 to of the digitally sampled values for each acoustic signal.
Powerful detection of the mass concentration in the signal frequency domain, depending on the level of destruction of the material.
30 Very similar overstrain processes can be found in real life, when a repeated pressure load caused by a sharp, hard object ends in complete damage of the industrial device under consideration.
As an example we present the testing design: Wear Detection of the Axial Ball and Roller Bearings.
In cooperation with the Technical University in Brno, Czech Republic (Doc. Pavel Mazal, VUT Brno).
31 Wear Detection of the Axial Ball Bearings
32 Wear Detection of the Roller Bearings
37 :-) Thanks a lot for your attention.