Statistical machine learning and its application to neonatal seizure detection

1 19/Oct/2009 Statistical machine learning and its application to neonatal seizure detection. Presented by Andriy Temko, Department of Electrical and Electronic Engineering

2 Page 2 of 42 A. Temko, Statistical Machine Learning
Outline
Introduction to Pattern Recognition
Construction of an SVM classifier
Construction of a GMM classifier
EEG and neonatal seizure detection
Experimental results

3 Introduction to PR and Machine Learning
Human tasks: face / spoken words / ...
Machine Perception: ASR / fingerprint / DNA / ...
Structure: pre-processing / segmentation / feature extraction / classification

4 Example: Fish Classification. Camera snapshots

5 Example: Features. Width, Lightness

6 Example: Decision. Feature vector of two dimensions

9 Main Concept of PR: Find/design a model/function which leads to the lowest error on test (unseen) data

10 Mathematics
Training data vectors: $X = \{x_1, \dots, x_L\}$, $x_i \in \mathbb{R}^n$
Corresponding labels: $Y = \{y_1, \dots, y_L\}$, $y_i \in \{-1, 1\}$
Find a function $f(x, \theta)$ with parameters $\theta$; different values of $\theta$ generate different learning functions $f$.
Risk (Empirical): $R_{emp}(\theta) = \frac{1}{m} \sum_{i=1}^{m} |f(x_i, \theta) - y_i|$, widely used in learning algorithms (EM, least squares, etc.)
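
The empirical risk on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's code: the absolute-difference loss and the one-dimensional `threshold_classifier` are assumptions chosen for the example.

```python
def empirical_risk(f, theta, xs, ys):
    """Average loss of f(x, theta) against the labels over the training set.

    The absolute difference is used as the per-sample loss (an assumption).
    """
    m = len(xs)
    return sum(abs(f(x, theta) - y) for x, y in zip(xs, ys)) / m


def threshold_classifier(x, theta):
    # Hypothetical 1-D learning function: +1 above the threshold, -1 otherwise.
    return 1 if x > theta else -1


xs = [0.5, 1.0, 2.5, 3.0]
ys = [-1, -1, 1, 1]

# Different values of theta give different learning functions and risks.
risk_good = empirical_risk(threshold_classifier, 2.0, xs, ys)
risk_bad = empirical_risk(threshold_classifier, 0.0, xs, ys)
print(risk_good, risk_bad)
```

With labels in {-1, 1}, each misclassified point contributes 2 to the sum under this loss.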

11 Generalization
Danger for a researcher!!! $R_{emp}(\theta)$ can be made as low as desired by an arbitrary choice of the parameters $\theta$ of $f(x, \theta)$.
Main target: $R_{actual}(\theta) = \int |f(x, \theta) - y| \, dP(x, y)$
$R_{emp} \to R_{actual}$ in the limit of infinite sample size

12 Bounds for Actual Risk
Existing bounds: Chernoff, Bhattacharyya, ... Loose, normal distribution assumption, ...
A tighter, distribution-free bound is based on the VC dimension concept (Vapnik-Chervonenkis)
Structural Risk Minimization: $R_{actual}(\theta) \le R_{emp}(\theta) + \sqrt{\frac{h(\ln(2m/h) + 1) - \ln(\eta/4)}{m}}$
where $h$ is a capacity term (VC dimension) and $m$ is the number of training points
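
To get a feel for how the capacity term behaves, the bound can be evaluated numerically. The sketch below assumes the standard reading of the bound (it holds with probability 1 - η); the function names and the example values of h, m and η are illustrative only.

```python
import math


def vc_confidence(h, m, eta):
    """Square-root capacity term of the bound: grows with h, shrinks with m."""
    return math.sqrt((h * (math.log(2 * m / h) + 1) - math.log(eta / 4)) / m)


def risk_bound(r_emp, h, m, eta):
    """Upper bound on the actual risk: empirical risk plus the capacity term."""
    return r_emp + vc_confidence(h, m, eta)


# Illustrative values (assumptions, not taken from the slides).
low_capacity = vc_confidence(h=10, m=1000, eta=0.05)
high_capacity = vc_confidence(h=100, m=1000, eta=0.05)
more_data = vc_confidence(h=10, m=10000, eta=0.05)
print(low_capacity, high_capacity, more_data)
```

A richer model (larger h) loosens the bound, while more training points tighten it, which is the trade-off that Structural Risk Minimization controls.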

13 VC Dimension & Testing Error. Optimal point with minimal R

15 Support Vector Machines
Linear classifier: $f(x) = w \cdot x + b$
Data of two classes
Separating hyperplanes: $f(x_i) \ge +1$ for $y_i = +1$; $f(x_i) \le -1$ for $y_i = -1$

16 SVM. Margin
Capacity term (VC dimension) is related to the margin: $h < R^2 \|w\|^2$
Margin hyperplanes: $f(x) = w \cdot x + b = 1$ and $f(x) = -1$; margin $= 2 / \|w\|$
Separating hyperplane: $f(x) = 0$

17 SVM. Formulation
Linear classifier: $f(x) = w \cdot x + b$
Minimize $\frac{1}{2} \|w\|^2$ (minimize the capacity term)
subject to $y_i f(x_i) \ge 1, \ \forall i$ (classify correctly all training data)
Structural Risk Minimization: $R_{actual}(\theta) \le R_{emp}(\theta) + \sqrt{\frac{h(\ln(2m/h) + 1) - \ln(\eta/4)}{m}}$
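
The link between a small norm of w and a wide margin can be checked on a toy problem. The sketch below does not solve the optimization; it only tests the constraints and computes the margin 2/||w|| for hand-picked candidates. The data set and the candidate (w, b) pairs are made up for illustration.

```python
import math


def decision_value(w, b, x):
    """Linear classifier f(x) = w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the hyperplanes f(x) = +1 and f(x) = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def is_feasible(w, b, xs, ys):
    """True when every training point satisfies y_i * f(x_i) >= 1."""
    return all(y * decision_value(w, b, x) >= 1 for x, y in zip(xs, ys))


# Toy 2-D data: class -1 at x1 = 0, class +1 at x1 = 2.
xs = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
ys = [-1, -1, 1, 1]

wide = ((1.0, 0.0), -1.0)       # feasible, smallest norm of the three
narrow = ((2.0, 0.0), -2.0)     # feasible, but larger norm
too_small = ((0.5, 0.0), -0.5)  # violates the constraints

for w, b in (wide, narrow, too_small):
    print(w, b, is_feasible(w, b, xs, ys), margin_width(w))
```

Among the feasible candidates, the one with the smaller norm has the wider margin, which is what minimizing $\frac{1}{2}\|w\|^2$ selects.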

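18 Optimization Problem
Lagrange function: $L(w, b) = \frac{1}{2} \|w\|^2 - \sum_{i=1}^{L} \alpha_i [y_i f(x_i) - 1]$
$\frac{\partial}{\partial b} L(w, b) = 0$, $\frac{\partial}{\partial w} L(w, b) = 0$
Dual formulation: $L = \sum_{i=1}^{L} \alpha_i - \frac{1}{2} \sum_{i=1}^{L} \sum_{j=1}^{L} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
subject to $\alpha_i \ge 0$, $\sum_{i=1}^{L} \alpha_i y_i = 0$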

19 Nonlinear SVM. Kernels
A transformation function $\varphi$ maps the data of two classes (labeled {-1} and {1}) from the input space to a feature space, where the separating hyperplane $w$ is found.
Radial Basis Function (RBF): $K(x_i, x_j) = e^{-\|x_i - x_j\|^2 / 2\sigma^2}$
Polynomial: $K(x_i, x_j) = (x_i \cdot x_j)^d$
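
The two kernels named on this slide are simple to write out. The following is a plain-Python sketch with illustrative function names; it evaluates a single kernel value rather than a full kernel matrix.

```python
import math


def rbf_kernel(xi, xj, sigma):
    """Radial Basis Function kernel: exp(-||xi - xj||^2 / (2 sigma^2))."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-squared_distance / (2 * sigma ** 2))


def polynomial_kernel(xi, xj, d):
    """Polynomial kernel: (xi . xj)^d."""
    return sum(a * b for a, b in zip(xi, xj)) ** d


print(rbf_kernel((1, 2), (1, 2), sigma=1.0))   # identical points give 1.0
print(polynomial_kernel((1, 2), (3, 4), d=2))  # (1*3 + 2*4)^2 = 121
```

Replacing the dot product $x_i \cdot x_j$ in the dual formulation with either function gives a nonlinear classifier without computing the transformation explicitly.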

20 Summary: SVM
Structural RM (control on capacity)
Absence of local minima (convexity)
Kernels (nonlinear implicit transformation)
Sparseness (small part of data are SVs)
Binary classifier (multi-class complicated)
Large-scale (quadratic/superlinear complexity)
Lack of probabilistic interpretation for output

22 Gaussian Mixture Models (I)
Based on Bayesian probability theory.
A feature vector is denoted as $x = [x_1; x_2; \dots; x_D]^T$
The probability that a feature vector $x$ belongs to class $w_k$ is $p(w_k | x)$, and this posterior probability can be computed via
$p(w_k | x) = \frac{p(x | w_k) P(w_k)}{p(x)}$
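
Bayes' rule on this slide amounts to normalizing the products of class likelihoods and priors. The sketch below illustrates it with made-up numbers; `class_posteriors` is a hypothetical helper name.

```python
def class_posteriors(likelihoods, priors):
    """Posterior p(w_k | x) for each class from p(x | w_k) and P(w_k)."""
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    evidence = sum(joint)  # p(x), the sum of the joint terms over all classes
    return [j / evidence for j in joint]


# Illustrative likelihoods for two classes with equal priors.
post = class_posteriors([0.2, 0.1], [0.5, 0.5])
print(post)
```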

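23 Gaussian Mixture Models (II)
$p(x | w_k; \theta) = \sum_{i=1}^{C} \alpha_i N(x; \mu_i, \Sigma_i)$
$N(x; \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^D |\Sigma|}} e^{-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)}$
The pdf of class $w_k$ is a mixture of $C$ components with parameters $(\mu_1, \Sigma_1, \alpha_1), (\mu_2, \Sigma_2, \alpha_2), \dots, (\mu_C, \Sigma_C, \alpha_C)$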
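
A mixture density of this form can be evaluated directly. To stay within the standard library, the sketch below assumes diagonal covariance matrices (each Σ is given as a list of per-dimension variances); that restriction is a simplification made here, not something stated on the slide.

```python
import math


def diag_gaussian_pdf(x, mu, var):
    """N(x; mu, Sigma) for a diagonal Sigma given as a list of variances."""
    d = len(x)
    log_det = sum(math.log(v) for v in var)
    quad = sum((xi - mi) ** 2 / vi for xi, mi, vi in zip(x, mu, var))
    return math.exp(-0.5 * (d * math.log(2 * math.pi) + log_det + quad))


def gmm_pdf(x, weights, means, variances):
    """Weighted sum of component densities: sum_i alpha_i N(x; mu_i, Sigma_i)."""
    return sum(a * diag_gaussian_pdf(x, m, v)
               for a, m, v in zip(weights, means, variances))


# A standard normal in one dimension, and a two-component mixture that
# collapses to the same density because both components are identical.
single = diag_gaussian_pdf([0.0], [0.0], [1.0])
mixture = gmm_pdf([0.0], [0.3, 0.7], [[0.0], [0.0]], [[1.0], [1.0]])
print(single, mixture)
```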

24 Summary: GMM
Probabilistic framework (likelihoods, priors)
Direct extension to multi-class problem
Online adaptation (existing MAP/MLLR)
Large-scale training possible (DB extension)
Local minima (Expectation-Maximization)
A number of free parameters (NGaus, Nfeatures, ...)
Empirical RM (no control of complexity)

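25 Testing process: SVM and GMM
SVM: $f(x) = \mathrm{sgn}\left( \sum_{z_i \in S,\, y_i = 1} \alpha_i K(z_i, x) - \sum_{z_i \in S,\, y_i = -1} \alpha_i K(z_i, x) + b \right)$
where $y_i$ are the labels of the support vectors, $b$ is the bias of the hyperplane, $K$ is the kernel function, $\alpha_i$ are the weights (Lagrange multipliers), and $z_i$ are the support vectors
GMM: $f(x) = \mathrm{sgn}\left( \sum_{i=1}^{C} \alpha_i^{(1)} N(x; \mu_i^{(1)}, \Sigma_i^{(1)}) - \sum_{i=1}^{C} \alpha_i^{(-1)} N(x; \mu_i^{(-1)}, \Sigma_i^{(-1)}) \right)$
where $N$ is the Gaussian distribution, $\alpha_i$ are the weights, and $\mu_i$ are the centroids; the superscript marks the class model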
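
Both test-time rules reduce to taking the sign of a weighted sum. The sketch below is a one-dimensional toy version under stated assumptions: an RBF kernel for the SVM, one model per class for the GMM, and hand-picked support vectors, weights and component parameters that are purely illustrative.

```python
import math


def rbf(z, x, sigma=1.0):
    """1-D RBF kernel between a support vector z and a test point x."""
    return math.exp(-(z - x) ** 2 / (2 * sigma ** 2))


def svm_decide(x, support_vectors, labels, alphas, b):
    """sgn(sum_i alpha_i y_i K(z_i, x) + b)."""
    score = sum(a * y * rbf(z, x)
                for z, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1


def gaussian_1d(x, mu, var):
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gmm_decide(x, pos_model, neg_model):
    """sgn of the difference between the two class mixture densities.

    Each model is a list of (weight, mean, variance) components.
    """
    def density(model):
        return sum(a * gaussian_1d(x, mu, var) for a, mu, var in model)
    return 1 if density(pos_model) - density(neg_model) >= 0 else -1


# Hypothetical trained models: class -1 lives near 0, class +1 near 2.
svs, sv_labels, sv_alphas, bias = [0.0, 2.0], [-1, 1], [1.0, 1.0], 0.0
pos_gmm = [(1.0, 2.0, 1.0)]
neg_gmm = [(1.0, 0.0, 1.0)]

print(svm_decide(1.8, svs, sv_labels, sv_alphas, bias),
      gmm_decide(1.8, pos_gmm, neg_gmm))
print(svm_decide(0.1, svs, sv_labels, sv_alphas, bias),
      gmm_decide(0.1, pos_gmm, neg_gmm))
```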

27 Neonatal Seizures: Background
per 1000 live births (higher in babies with low birthweight and <38 wks GA), clinical diagnosis
25-30% of high-risk babies will develop seizures
Occur early in life: 87% within first 48 hrs
Harmful to the developing brain

28 Clinical manifestation 9% documented Clinical seizures

29 Gold Standard
Continuous EEG: seizure detection rate 100%; monitoring of seizure treatment; seizure onset; long-term prognosis
EEG interpretation: requires special expertise; not widely available; not available 24/7
(EEG trace scale bar: 50 μV, 2 sec)

30 Objective: To develop an automated seizure detection algorithm for implementation in the NICU; strong collaboration between clinicians and biomedical engineers

31 Challenges
Human computer is difficult to replicate
Neonatal seizures demonstrate inter- and intra-individual variability
They evolve both temporally and spatially
Can be of relatively low amplitude
Influenced by background EEG activity
Artefacts are common and impact on detection

32 Automated Seizure Detection
Artefact removal, re-sampling, and segmentation
55 features: frequency (envelope), model-based (AR), structural (entropy), time-domain (ZCR, E), etc.
SVM/GMM
Smoothing and thresholding
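
The last stage of this pipeline, smoothing and thresholding the classifier output, can be sketched as follows. The slides do not say which smoother or window length was used; the centred moving average, the window of 3 epochs and the threshold of 0.6 are all assumptions made for illustration.

```python
def moving_average(probs, window):
    """Centred moving average; the window is truncated at the sequence edges."""
    half = window // 2
    smoothed = []
    for i in range(len(probs)):
        segment = probs[max(0, i - half):min(len(probs), i + half + 1)]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def threshold(probs, level):
    """Binary decision per epoch: 1 (seizure) when the probability reaches level."""
    return [1 if p >= level else 0 for p in probs]


# Hypothetical per-epoch seizure probabilities with one isolated spike.
raw = [0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0]

raw_decisions = threshold(raw, 0.6)
smoothed_decisions = threshold(moving_average(raw, window=3), 0.6)
print(raw_decisions)
print(smoothed_decisions)
```

In this toy sequence the isolated spike at the second epoch is suppressed by smoothing, while the sustained run of high probabilities is kept.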

34 Developed Systems
SVM-based system
GMM-based system
The output of the systems is a probability of seizure!
Different confidence levels give different decisions
Flexibility for clinical needs (unlike rule-based methods)

35 Database & Experimental Setup
Recordings from 17 newborns
691 seizure events (average duration 4 mins)
267 hours of EEG: non-seizure (89%), seizure (11%)
Leave-one-out testing
EEG has been annotated by a trained neurophysiologist
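
Leave-one-out testing over a set of recordings can be sketched as a generator of train/test splits. The slide does not spell out the unit that is left out; the sketch assumes it is one newborn at a time, and the integer patient identifiers are hypothetical.

```python
def leave_one_out_splits(patient_ids):
    """Yield (training patients, held-out patient) for every patient in turn."""
    for held_out in patient_ids:
        training = [p for p in patient_ids if p != held_out]
        yield training, held_out


patients = list(range(1, 18))  # 17 newborns, numbered 1..17 for illustration
splits = list(leave_one_out_splits(patients))

for training, held_out in splits[:2]:
    print(f"test on patient {held_out}, train on {len(training)} others")
```

Holding out all data from one newborn keeps the test recordings unseen by the model, in line with the goal of the lowest error on unseen data.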

36 Performance Measures
Epoch-based metrics:
Sensitivity = % seizure epochs correctly classified
Specificity = % non-seizure epochs correctly classified
ROC curve = plot of all sensitivity and specificity pairs
Event-based metrics:
GDR = % seizures detected
FD/h = mean number of false detections per hour
Mean duration of false detections
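
The epoch-based and event-based metrics can be computed from per-epoch label sequences. The sketch below is one possible reading, with assumptions flagged: an event is a run of consecutive positive epochs, a seizure counts as detected if any predicted event overlaps it, and a false detection is a predicted event that overlaps no seizure. The label sequences and the epoch length are made up.

```python
def epoch_metrics(y_true, y_pred):
    """Return (sensitivity %, specificity %) over epochs labelled 0 or 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    positives = sum(y_true)
    negatives = len(y_true) - positives
    return 100.0 * tp / positives, 100.0 * tn / negatives


def runs_of_ones(labels):
    """Group consecutive 1s into (start, end) index pairs, end inclusive."""
    runs, start = [], None
    for i, value in enumerate(labels):
        if value == 1 and start is None:
            start = i
        elif value != 1 and start is not None:
            runs.append((start, i - 1))
            start = None
    if start is not None:
        runs.append((start, len(labels) - 1))
    return runs


def overlaps(a, b):
    return a[0] <= b[1] and b[0] <= a[1]


def event_metrics(y_true, y_pred, epoch_hours):
    """Return (GDR %, false detections per hour) under the assumptions above."""
    true_events = runs_of_ones(y_true)
    pred_events = runs_of_ones(y_pred)
    detected = sum(1 for e in true_events
                   if any(overlaps(e, p) for p in pred_events))
    false_detections = sum(1 for p in pred_events
                           if not any(overlaps(p, e) for e in true_events))
    gdr = 100.0 * detected / len(true_events)
    return gdr, false_detections / (len(y_true) * epoch_hours)


y_true = [0, 0, 1, 1, 0, 0, 0, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 0, 0, 0, 0, 0]

sens, spec = epoch_metrics(y_true, y_pred)
gdr, fd_per_hour = event_metrics(y_true, y_pred, epoch_hours=0.5)
print(sens, spec, gdr, fd_per_hour)
```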

37 Experimental Results
(Figures: ROC; GDR vs. FD/h)
SVM outperforms GMM
More than 50% of false detections are different for SVM and GMM: a good condition for successful application of fusion techniques to improve the performance -> Current work
More data -> GMM benefits

38 Comparison with recently reported systems
Columns: method; event detection rate (%); false alarms/hr; sensitivity (%); specificity (%)
Methods: Temko et al. (2009); Gotman et al. (1997); Aarabi et al. (2007); Navakatikyan et al. (2006); Mitra et al. (2009); Greene et al. (2008)

39 Moving forward
The developed automated seizure detection algorithm is the best performing algorithm to date
Intellectual property has been protected by a patent
Demo is implemented
Testing of the algorithm in a clinical environment (clinical trial)

40 Thank you
