Machine Learning for Signal Processing Applications of Linear Gaussian Models
1 Machine Learning for Signal Processing: Applications of Linear Gaussian Models. Class 8, 3 Nov 2017. Instructor: Najim Dehak, in collaboration with Prof. Bhiksha Raj.
2 Recap: MAP estimators. MAP (Maximum A Posteriori): find the most probable value of y given x: $\hat{y} = \arg\max_{y} P(y \mid x)$.
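3 MAP estimation: x and y are jointly Gaussian. The stacked vector $z = [x;\, y]$ is Gaussian, with $E[z] = \mu_z = [\mu_x;\, \mu_y]$ and $\mathrm{Var}(z) = C_{zz} = \begin{bmatrix} C_{xx} & C_{xy} \\ C_{yx} & C_{yy} \end{bmatrix}$, where $C_{xy} = E[(x - \mu_x)(y - \mu_y)^\top]$. For z of dimension d, $P(z) = N(\mu_z, C_{zz}) = \frac{1}{\sqrt{(2\pi)^d |C_{zz}|}} \exp\left(-0.5\, (z - \mu_z)^\top C_{zz}^{-1} (z - \mu_z)\right)$.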
4 MAP estimation: Gaussian PDF. [Figure: the joint Gaussian PDF over X and Y.]
5 MAP estimation: the Gaussian at a particular value of X. [Figure: slice of the joint PDF at a fixed X.]
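6 Conditional probability of y given x: $P(y \mid x) = N\left(\mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x),\; C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}\right)$, so $E_{y|x}[y] = \mu_{y|x} = \mu_y + C_{yx} C_{xx}^{-1}(x - \mu_x)$ and $\mathrm{Var}(y \mid x) = C_{yy} - C_{yx} C_{xx}^{-1} C_{xy}$. The conditional probability of y given x is also Gaussian: the slice in the figure is Gaussian. The mean of this Gaussian is a function of x. The variance of y is reduced if x is known: uncertainty is reduced.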
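A minimal sketch of the conditional-Gaussian formulas above for the scalar case, where every covariance is a single number (so $C_{yx} = C_{xy}$). The function name and the numbers are illustrative assumptions, not taken from the lecture.

```python
def conditional_gaussian(mu_x, mu_y, c_xx, c_xy, c_yy, x):
    """Mean and variance of P(y | x) for jointly Gaussian scalars x and y."""
    gain = c_xy / c_xx                 # C_yx C_xx^{-1}; scalars, so C_yx = C_xy
    mean = mu_y + gain * (x - mu_x)    # mu_y + C_yx C_xx^{-1} (x - mu_x)
    var = c_yy - gain * c_xy           # C_yy - C_yx C_xx^{-1} C_xy
    return mean, var

mean, var = conditional_gaussian(mu_x=0.0, mu_y=1.0, c_xx=2.0, c_xy=1.0, c_yy=3.0, x=2.0)
print(mean, var)  # 2.0 2.5: the variance drops from C_yy = 3.0 once x is known
```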
7 MAP estimation: the Gaussian at a particular value of X. [Figure: the slice at a fixed X, with its most likely value marked.]
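8 It's also a minimum-mean-squared-error estimate. Minimize the error $Err = E[\|y - \hat{y}\|^2 \mid x] = E[(y - \hat{y})^\top (y - \hat{y}) \mid x] = E[y^\top y \mid x] - 2\hat{y}^\top E[y \mid x] + \hat{y}^\top \hat{y}$. Differentiating and equating to 0: $\frac{d\,Err}{d\hat{y}} = 2\hat{y} - 2E[y \mid x] = 0 \Rightarrow \hat{y} = E[y \mid x]$. The MMSE estimate is the mean of the distribution.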
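A quick numerical check of the derivation, assuming a handful of made-up samples stand in for $P(y \mid x)$: searching a grid of constant estimates for the smallest mean squared error lands on the sample mean.

```python
def mse(samples, guess):
    """Empirical mean squared error of the constant estimate `guess`."""
    return sum((s - guess) ** 2 for s in samples) / len(samples)

samples = [0.5, 1.0, 2.5, 4.0]               # hypothetical draws of y for a fixed x
mean = sum(samples) / len(samples)           # 2.0
grid = [i / 100 for i in range(501)]         # candidate estimates 0.00, 0.01, ..., 5.00
best = min(grid, key=lambda g: mse(samples, g))
print(mean, best)                            # 2.0 2.0
```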
9 For the Gaussian: MAP = MMSE. The most likely value is also the MEAN value. This would be true of any symmetric, unimodal distribution.
10 Gaussians and more Gaussians... Linear Gaussian Models... PCA to develop the idea of LGMs.
11 A Brief Recap: $D \approx BC$. Principal component analysis: find the K bases that best explain the given data. Find B and C such that the difference between D and BC is minimum, while constraining the columns of B to be orthonormal.
12 Learning PCA: for the given data, find the K-dimensional subspace that captures most of the variance in the data, so that the variance in the remaining subspace is minimal.
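For K = 1 the best subspace is spanned by the leading eigenvector of the data covariance. One simple way to find it is power iteration, a swapped-in technique rather than necessarily how the course computes PCA; the sketch assumes a small 2-D toy data set and hypothetical helper names.

```python
import math

def covariance(data):
    """Sample covariance (divides by N) of a list of equal-length vectors."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    return [[sum(r[a] * r[b] for r in centered) / n for b in range(d)] for a in range(d)]

def leading_component(cov, iters=200):
    """Power iteration: repeatedly apply cov and renormalize. Converges to the
    eigenvector with the largest eigenvalue, assuming that eigenvalue is unique
    and the start vector is not orthogonal to it."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

data = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.1], [1.0, 0.9], [2.0, 2.0]]  # toy points near y = x
v = leading_component(covariance(data))
print([round(x, 3) for x in v])
```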
13 A Statistical Formulation of PCA. [Figure: the error is at 90° to the eigenface.] $x = Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. x is a random variable generated according to a linear relation; w is drawn from a K-dimensional Gaussian with diagonal covariance; e is drawn from a 0-mean, (D-K)-rank, D-dimensional Gaussian. Estimate V (and B) given examples of x.
14 Linear Gaussian Models!! $x = Vw + e$, $w \sim N(0, B)$, $e \sim N(0, E)$. x is a random variable generated according to a linear relation; w is drawn from a Gaussian; e is drawn from a 0-mean Gaussian. Estimate V given examples of x; in the process, also estimate B and E.
15 Estimating the variables of the model. $x = \mu + Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, so $x \sim N(\mu, VV^\top + E)$. Estimating the variables of the LGM is equivalent to estimating P(x). The variables are μ, V, and E.
16 The Maximum Likelihood Estimate. $x \sim N(\mu, VV^\top + E)$. Given a training set $x_1, x_2, \ldots, x_N$, find μ, V, E. The ML estimate of μ does not depend on the covariance of the Gaussian: $\mu = \frac{1}{N}\sum_i x_i$.
17 Simplified Model. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, so $x \sim N(0, VV^\top + E)$. Estimating the variables of the LGM is equivalent to estimating P(x). The variables are V and E.
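To see why estimating V and E amounts to estimating P(x), one can sample the simplified model and compare the empirical covariance with $VV^\top + E$. The sketch assumes K = 1, D = 2 and a diagonal E chosen for illustration; `sample_lgm` is a hypothetical helper.

```python
import random

def sample_lgm(v, e_diag, n, rng):
    """Draw n samples of x = V w + e with K = 1: w ~ N(0, 1), e ~ N(0, diag(e_diag))."""
    samples = []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        samples.append([v[d] * w + rng.gauss(0.0, e_diag[d] ** 0.5) for d in range(len(v))])
    return samples

rng = random.Random(0)
v, e_diag = [2.0, 1.0], [0.1, 0.2]          # hypothetical V (2 x 1) and diagonal of E
xs = sample_lgm(v, e_diag, 20000, rng)
n = len(xs)
emp = [[sum(x[a] * x[b] for x in xs) / n for b in range(2)] for a in range(2)]  # model mean is 0
model = [[v[a] * v[b] + (e_diag[a] if a == b else 0.0) for b in range(2)] for a in range(2)]
print(emp)     # close to the model covariance for large n
print(model)   # V V^T + E
```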
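18 LGM: The complete EM algorithm. Initialize V and E. E step: $E[w_i \mid x_i] = V^\top (VV^\top + E)^{-1} x_i$; $E[w_i w_i^\top \mid x_i] = I - V^\top (VV^\top + E)^{-1} V + E[w_i \mid x_i] E[w_i \mid x_i]^\top$. M step: $V = \left(\sum_i x_i E[w_i \mid x_i]^\top\right) \left(\sum_i E[w_i w_i^\top \mid x_i]\right)^{-1}$; $E = \frac{1}{N}\sum_i x_i x_i^\top - \frac{1}{N} V \sum_i E[w_i \mid x_i]\, x_i^\top$.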
19 So what have we achieved? We employed a complicated EM algorithm to learn a Gaussian PDF for a variable x. What have we gained??? Example uses: PCA (sensible PCA, EM algorithms for PCA); factor analysis (FA for feature extraction).
20 LGMs: Application 1. Learning principal components. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Find the directions that capture most of the variation in the data. The error is orthogonal to the principal directions: $V^\top e = 0$; $e^\top V = 0$.
21 Some Observations: $x = Vw + e$, $e \sim N(0, E)$. Since $V^\top e = 0$, $V^\top E = E[V^\top e\, e^\top] = 0$: the covariance of e is orthogonal to V.
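22 Observation: $V^\top (VV^\top + E)^{-1} = (V^\top V)^{-1} V^\top = \mathrm{pinv}(V)$. Proof: using $V^\top E = 0$, $(V^\top V)^{-1} V^\top (VV^\top + E) = (V^\top V)^{-1} V^\top V V^\top + (V^\top V)^{-1} V^\top E = V^\top + 0 = V^\top$; multiplying on the right by $(VV^\top + E)^{-1}$ gives the result.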
23 Observation 3: for V with full column rank, $I - V^\top (VV^\top + E)^{-1} V = I - \mathrm{pinv}(V)\, V = 0$.
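A numerical check of the two observations, assuming K = 1, D = 2 and an error covariance E built from a direction orthogonal to V (so $V^\top E = 0$). The values are chosen so that the floating-point arithmetic is exact; `inv2` is a hypothetical helper.

```python
def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

v = [1.0, 1.0]                   # V is 2 x 1 (K = 1)
u, s = [1.0, -1.0], 0.5          # error direction orthogonal to V, with variance scale s
E = [[s * u[a] * u[b] for b in range(2)] for a in range(2)]      # so that V^T E = 0
assert all(abs(sum(v[a] * E[a][b] for a in range(2))) < 1e-12 for b in range(2))

C = [[v[a] * v[b] + E[a][b] for b in range(2)] for a in range(2)]  # V V^T + E
Ci = inv2(C)
lhs = [sum(v[a] * Ci[a][b] for a in range(2)) for b in range(2)]   # V^T (V V^T + E)^{-1}
vtv = sum(x * x for x in v)
pinv_v = [x / vtv for x in v]                                      # (V^T V)^{-1} V^T
residual = 1.0 - sum(p * x for p, x in zip(pinv_v, v))             # I - pinv(V) V
print(lhs, pinv_v, residual)     # [0.5, 0.5] [0.5, 0.5] 0.0
```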
24–29 LGM: The complete EM algorithm, specialized to PCA. Stack the data as $X = [x_1, \ldots, x_N]$ and the expected factors as $W = [E[w_1 \mid x_1], \ldots, E[w_N \mid x_N]]$. By the observations above, the E step becomes $W = V^\top (VV^\top + E)^{-1} X = \mathrm{pinv}(V)\, X$, with $E[w_i w_i^\top \mid x_i] = I - \mathrm{pinv}(V)\, V + E[w_i \mid x_i] E[w_i \mid x_i]^\top$. The M step becomes $V = X W^\top \left(N\,(I - \mathrm{pinv}(V)\, V) + WW^\top\right)^{-1}$.
30–35 EM for PCA. Since $I - \mathrm{pinv}(V)\, V = 0$, $\sum_i E[w_i w_i^\top \mid x_i] = WW^\top$, and the M step reduces to $V = X W^\top (WW^\top)^{-1} = X\, \mathrm{pinv}(W)$.
36 EM for PCA. The $I - \mathrm{pinv}(V)\, V$ term is irrelevant, so only W and V need to be computed.
37 EM for PCA. Initialize V. Iterate: $W = \mathrm{pinv}(V)\, X$; $V = X\, \mathrm{pinv}(W)$. Note: V will not be actual eigenvectors, but a set of bases in the space spanned by the principal eigenvectors. Additional decorrelation within the PC space may be needed.
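A minimal sketch of the two-line iteration for K = 1, where pinv of a D×1 column is $V^\top/(V^\top V)$ and pinv of a 1×N row is $W^\top/(WW^\top)$. It assumes zero-mean toy data and a hypothetical function name. With a single basis the loop is a rescaled power iteration on $XX^\top$, so the normalized V should line up with the leading principal direction.

```python
import math

def em_pca_rank1(xs, iters=50):
    """EM for PCA with K = 1: alternate W = pinv(V) X and V = X pinv(W).

    xs is a list of D-dimensional points (the columns of X), assumed zero-mean.
    """
    d = len(xs[0])
    v = [1.0] + [0.0] * (d - 1)                                       # arbitrary initial basis
    for _ in range(iters):
        vv = sum(a * a for a in v)
        w = [sum(a * b for a, b in zip(v, x)) / vv for x in xs]       # E step: W = pinv(V) X
        ww = sum(wi * wi for wi in w)
        v = [sum(wi * x[j] for wi, x in zip(w, xs)) / ww for j in range(d)]  # M step: V = X pinv(W)
    return v, w

xs = [[-2.0, -2.1], [-1.0, -0.9], [0.0, 0.1], [1.0, 0.9], [2.0, 2.0]]
v, w = em_pca_rank1(xs)
norm = math.sqrt(sum(a * a for a in v))
print([round(a / norm, 3) for a in v])   # V itself is not unit length, as the slide notes
```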
38 Why EM PCA? Example: computing eigenfaces. Each face is 100×100: 10000-dimensional. But there are only 300 examples, so X is 10000×300. What is the size of the covariance matrix $XX^\top$? What is its rank?
39 PCA on ill-conditioned data. Few instances of high-dimensional data: the number of instances is less than the dimensionality. The covariance matrix is very large, and eigendecomposition is expensive: e.g., for $10^6$-dimensional data the covariance has $10^{12}$ elements. But the rank of the covariance is low: at most the number of data instances.
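The arithmetic behind the two slides, as a tiny helper (`covariance_stats` is a hypothetical name). The second call reuses N = 300 purely for illustration; the slide does not give a sample count for the $10^6$-dimensional case.

```python
def covariance_stats(dim, n_points):
    """Number of entries in the dim x dim matrix X X^T, and an upper bound on its rank."""
    return dim * dim, min(dim, n_points)

print(covariance_stats(10_000, 300))    # (100000000, 300): the eigenface example
print(covariance_stats(10 ** 6, 300))   # (1000000000000, 300)
```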
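40 Why EM PCA? $X \approx VW$, $V = X\, \mathrm{pinv}(W)$, $W = \mathrm{pinv}(V)\, X$. A consequence of the low rank of X: the actual number of bases is limited to the rank of X. Note the actual size of V: the maximum number of columns is min(dimension, no. of data points), and the number of columns equals the rank of $XX^\top$. Note the size of W: the maximum number of rows is min(dimension, no. of data points).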
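41 Why EM PCA? If X is high dimensional, particularly if the number of vectors in X is smaller than the dimensionality, pinv(V) and pinv(W) are efficient to compute, since $\mathrm{pinv}(V) = (V^\top V)^{-1} V^\top$ only requires inverting a small matrix: V will have a maximum of 300 columns in the example, and W a maximum of 300 rows.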
42 PCA as an instance of LGM. Viewing PCA as an instance of linear Gaussian models leads to an EM solution, which is very effective in dealing with high-dimensional and/or data-poor situations. An aside: another, simpler solution for the same situation...
43 An Aside: The GRAM trick. For X, $XX^\top$ and $X^\top X$: the number of non-zero eigenvalues is no more than the length of the smallest edge of X, 300 in this case. This leads to the Gram trick. Assumption: $X^\top X$ is invertible, i.e., the instances are linearly independent.
44 An Aside: The GRAM trick. If X is 10000×300, $XX^\top$ is 10000×10000: $XX^\top$ is large, but $X^\top X$ is not, being only 300×300. It is difficult to compute the eigenvectors of $XX^\top$, but easy to compute the eigenvectors of $X^\top X$.
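45 The Gram Trick. To compute principal vectors we eigendecompose $XX^\top = E \Lambda E^\top$. Let us find the eigenvectors of $X^\top X$ instead: $X^\top X = \hat{E} \hat{\Lambda} \hat{E}^\top$. Manipulating it slightly, and noting that diagonal matrices commute (so $\hat{\Lambda}^{-0.5} \hat{\Lambda} = \hat{\Lambda} \hat{\Lambda}^{-0.5}$): $\hat{\Lambda}^{-0.5} \hat{E}^\top X^\top X \hat{E} \hat{\Lambda}^{-0.5} = I$.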
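46 The Gram Trick. Eigendecompose $X^\top X$ instead of $XX^\top$: $X^\top X \hat{E} = \hat{E} \hat{\Lambda}$. Multiplying on the left by X: $XX^\top (X\hat{E}) = (X\hat{E}) \hat{\Lambda}$, and hence $XX^\top (X\hat{E}\hat{\Lambda}^{-0.5}) = (X\hat{E}\hat{\Lambda}^{-0.5}) \hat{\Lambda}$. Letting $E = X\hat{E}\hat{\Lambda}^{-0.5}$: $XX^\top E = E \hat{\Lambda}$. E is the matrix of eigenvectors of $XX^\top$!!!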
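47 The Gram Trick. When X is low rank or $XX^\top$ is too large: compute $X^\top X$ instead, which will be of manageable size. Perform the eigendecomposition $X^\top X = \hat{E} \hat{\Lambda} \hat{E}^\top$. Compute the eigenvectors of $XX^\top$ as $E = X\hat{E}\hat{\Lambda}^{-0.5}$. These are the principal components of X.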
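A small worked example of the Gram trick, assuming a 3×2 matrix X (D = 3 > N = 2) so that the N×N eigenproblem can be solved in closed form. `eig_sym2` is a hypothetical helper that assumes a nonzero off-diagonal entry. The check confirms that each column of $E = X\hat{E}\hat{\Lambda}^{-0.5}$ is a unit eigenvector of $XX^\top$.

```python
import math

def eig_sym2(m):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, c]] with b != 0, largest first."""
    (a, b), (_, c) = m
    mid, rad = (a + c) / 2, math.hypot((a - c) / 2, b)
    vals = [mid + rad, mid - rad]
    vecs = []
    for lam in vals:
        vec = [lam - c, b]                       # solves (M - lam I) vec = 0 when b != 0
        n = math.hypot(vec[0], vec[1])
        vecs.append([vec[0] / n, vec[1] / n])
    return vals, vecs

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]         # rows of a D x N matrix, D = 3, N = 2
D, N = len(X), len(X[0])
gram = [[sum(X[d][i] * X[d][j] for d in range(D)) for j in range(N)] for i in range(N)]  # X^T X
vals, vecs = eig_sym2(gram)
# Columns of E = X Ehat Lambda^{-1/2}: map each eigenvector of X^T X into D-space.
E = [[sum(X[d][i] * vec[i] for i in range(N)) / math.sqrt(lam) for d in range(D)]
     for lam, vec in zip(vals, vecs)]
XXt = [[sum(X[a][i] * X[b][i] for i in range(N)) for b in range(D)] for a in range(D)]
residuals, norms = [], []
for lam, e in zip(vals, E):
    XXt_e = [sum(XXt[a][b] * e[b] for b in range(D)) for a in range(D)]
    residuals.append(max(abs(XXt_e[a] - lam * e[a]) for a in range(D)))  # max |XX^T e - lam e|
    norms.append(math.sqrt(sum(x * x for x in e)))
print(vals, residuals, norms)
```

Only the 2×2 Gram matrix is ever eigendecomposed; the 3×3 matrix $XX^\top$ is formed here solely to verify the result.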
48 Why EM PCA? Dimensionality/rank has an alternate potential solution: the Gram trick. Other uses? Noise; incomplete data.
49 PCA with noisy data. $x = Vw + e + n$, $w \sim N(0, I)$, $e \sim N(0, E)$, $n \sim N(0, B)$. The error is orthogonal to the principal directions: $V^\top e = 0$; $e^\top V = 0$. The noise is isotropic: B is diagonal. The noise is not orthogonal to either V or e.
50 LGM: The complete EM algorithm (repeated): initialize V and E, then alternate the E step and M step of slide 18.
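51 PCA with Noisy Data. Initialize V and B. E step: $W = V^\top (VV^\top + B)^{-1} X$; $C = NI - N V^\top (VV^\top + B)^{-1} V + WW^\top$. M step: $V = X W^\top C^{-1}$; $B = \frac{1}{N} \mathrm{diag}(XX^\top - VWX^\top)$.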
52 PCA with Incomplete Data. How do we compute principal directions when some components of the training data are missing? Eigendecomposition is not possible: we cannot compute the correlation matrix with missing data.
53 PCA with missing data: how it goes. Given: $X = \{X_c, X_m\}$, where $X_m$ are the missing components. 1. Initialize $X_m$. 2. Build the complete data $X = \{X_c, X_m\}$. 3. PCA ($X = VW$): estimate V, which must have fewer bases than the dimensions of X. 4. $W = \mathrm{pinv}(V)\, X$. 5. $\hat{X} = VW$. 6. Select $X_m$ from $\hat{X}$. 7. Return to 2.
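A minimal sketch of the seven steps, assuming K = 1, data that is exactly rank 1, and a single missing entry; steps 3 to 5 reuse the rank-1 EM-PCA updates of slide 37, warm-started between passes. Initializing the missing entry to 0 and running a fixed number of passes are illustrative choices, and the function names are hypothetical. Because the toy data is exactly rank 1, the filled-in value should approach the true entry (6.0).

```python
def rank1_fit(X, v, inner=10):
    """Warm-started EM-PCA with K = 1 on the D x N list-of-rows X; returns (v, X_hat)."""
    D, N = len(X), len(X[0])
    for _ in range(inner):                                                 # step 3: estimate V
        vv = sum(a * a for a in v)
        w = [sum(v[d] * X[d][i] for d in range(D)) / vv for i in range(N)]  # W = pinv(V) X
        ww = sum(b * b for b in w)
        v = [sum(X[d][i] * w[i] for i in range(N)) / ww for d in range(D)]  # V = X pinv(W)
    vv = sum(a * a for a in v)
    w = [sum(v[d] * X[d][i] for d in range(D)) / vv for i in range(N)]      # step 4
    return v, [[v[d] * w[i] for i in range(N)] for d in range(D)]           # step 5: X_hat = V W

def pca_missing(X_obs, missing, passes=500):
    """Fill the (row, col) entries listed in `missing` by repeated rank-1 PCA reconstruction."""
    X = [row[:] for row in X_obs]
    for d, i in missing:
        X[d][i] = 0.0                        # step 1: initialize X_m (0.0 is an arbitrary choice)
    v = [1.0] * len(X)
    for _ in range(passes):                  # steps 2 and 7: rebuild the complete data and repeat
        v, X_hat = rank1_fit(X, v)
        for d, i in missing:
            X[d][i] = X_hat[d][i]            # step 6: take X_m from X_hat
    return X

u, c = [1.0, 2.0, 3.0], [1.0, 2.0, -1.0, 0.5]
X_true = [[ud * ci for ci in c] for ud in u]  # exactly rank 1, so K = 1 can fit it
X_obs = [row[:] for row in X_true]
X_obs[2][1] = None                            # pretend this entry (true value 6.0) is missing
X_filled = pca_missing(X_obs, [(2, 1)])
print(round(X_filled[2][1], 4))
```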
54 LGM for PCA. Obviously many uses: ill-conditioned data, noise, missing data, and any combination of the above.
55 LGMs: Application 2. Learning with insufficient data. The full covariance matrix of a Gaussian has $D^2$ terms ($D(D+1)/2$ of them free, by symmetry). It fully captures the relationships between variables. Problem: it needs a lot of data to estimate robustly.
56 An Approximation. Assume the covariance is diagonal: the Gaussian is aligned to the axes, with no correlation between dimensions. The covariance has only D terms, so it needs less data. Problem: the model loses all information about correlation between dimensions.
57 Is There an Intermediate? Capture the most important correlations, but require less data. Solution: find the key subspaces in the data, capture the complete correlations in these subspaces, and assume the data is otherwise uncorrelated.
58 Factor Analysis. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$, so $x \sim N(0, VV^\top + E)$. E is a full-rank diagonal matrix. V has K columns: a K-dimensional subspace. We will capture all the correlations in the subspace represented by V. Estimated covariance: the diagonal covariance E plus the covariance between dimensions in V, $VV^\top$.
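59 Factor Analysis. Initialize V and E. E step: $E[w_i \mid x_i] = V^\top (VV^\top + E)^{-1} x_i$; $E[w_i w_i^\top \mid x_i] = I - V^\top (VV^\top + E)^{-1} V + E[w_i \mid x_i] E[w_i \mid x_i]^\top$. M step: $V = \left(\sum_i x_i E[w_i \mid x_i]^\top\right) \left(\sum_i E[w_i w_i^\top \mid x_i]\right)^{-1}$; $E = \frac{1}{N} \mathrm{diag}\left(\sum_i x_i x_i^\top - V \sum_i E[w_i \mid x_i]\, x_i^\top\right)$.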
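A minimal sketch of these updates for K = 1 and diagonal E (written `psi`), assuming zero-mean data. The K = 1 case lets $(VV^\top + E)^{-1}$ be handled with the Sherman-Morrison identity and $\det(VV^\top + E)$ with the matrix determinant lemma, so no matrix inverse is needed. Since this is an EM algorithm, the recorded log-likelihood should never decrease. The synthetic data and all function names are assumptions for illustration.

```python
import math
import random

def log_likelihood(xs, v, psi):
    """Total log N(x; 0, v v^T + diag(psi)), via Sherman-Morrison and the determinant lemma."""
    d = len(v)
    s = sum(v[j] ** 2 / psi[j] for j in range(d))                  # v^T E^{-1} v
    logdet = sum(math.log(p) for p in psi) + math.log(1.0 + s)     # log det(v v^T + E)
    total = 0.0
    for x in xs:
        t = sum(v[j] * x[j] / psi[j] for j in range(d))
        quad = sum(x[j] ** 2 / psi[j] for j in range(d)) - t * t / (1.0 + s)
        total -= 0.5 * (d * math.log(2 * math.pi) + logdet + quad)
    return total

def fa_em_rank1(xs, iters=200):
    """EM for factor analysis with one factor: x = v w + e, w ~ N(0, 1), e ~ N(0, diag(psi))."""
    n, d = len(xs), len(xs[0])
    v = [1.0] * d
    psi = [sum(x[j] ** 2 for x in xs) / n for j in range(d)]       # start E at the data variances
    history = []
    for _ in range(iters):
        # E step: posterior of w_i given x_i.
        s = sum(v[j] ** 2 / psi[j] for j in range(d))
        post_var = 1.0 / (1.0 + s)                                 # 1 - v^T (v v^T + E)^{-1} v
        m = [post_var * sum(v[j] * x[j] / psi[j] for j in range(d)) for x in xs]  # E[w_i | x_i]
        sww = sum(post_var + mi * mi for mi in m)                  # sum_i E[w_i^2 | x_i]
        # M step: V = sum_i x_i E[w_i] / sum_i E[w_i^2]; E = diag of the residual term.
        v = [sum(mi * x[j] for mi, x in zip(m, xs)) / sww for j in range(d)]
        psi = [sum(x[j] ** 2 - v[j] * mi * x[j] for mi, x in zip(m, xs)) / n for j in range(d)]
        history.append(log_likelihood(xs, v, psi))
    return v, psi, history

rng = random.Random(1)
v_true, psi_true = [2.0, 1.0, 0.5], [0.1, 0.2, 0.3]    # hypothetical generating parameters
xs = []
for _ in range(500):
    w = rng.gauss(0.0, 1.0)
    xs.append([v_true[j] * w + rng.gauss(0.0, math.sqrt(psi_true[j])) for j in range(3)])
v, psi, history = fa_em_rank1(xs)
print([round(a, 2) for a in v], [round(p, 2) for p in psi])
```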
60 FA Gaussian. We get a full covariance matrix, but only estimate DK terms (plus the D diagonal terms of E). Data insufficiency is less of a problem.
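The parameter counts from the last few slides side by side, as a small helper (`covariance_params` and the sizes D = 40, K = 5 are illustrative assumptions). The FA count ignores the rotational ambiguity of V.

```python
def covariance_params(d, k):
    """Parameters needed for a D x D covariance under three models (symmetry counted once)."""
    full = d * (d + 1) // 2    # full covariance: D^2 entries, D(D+1)/2 of them free
    diagonal = d               # diagonal covariance
    fa = d * k + d             # factor analysis: V (D x K) plus the diagonal E
    return full, diagonal, fa

print(covariance_params(d=40, k=5))   # (820, 40, 240)
```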
61 The Factor Analysis Model. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$; the columns of V are the LOADINGS and w holds the FACTORS. Often used to learn the distribution of data when we have insufficient data. Often used in psychometrics. Underlying model: the actual systematic variations in the data are totally explained by a small number of factors; FA uncovers these factors.
62 FA, PCA etc. $x = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Note: the distinction between PCA and FA is only in the assumptions about e. FA looks a lot like PCA with noise. FA can also be performed with incomplete data.
63 FA, PCA etc. PCA: the error is always at 90 degrees to the bases in V. FA: the error may be at any angle. PCA is used mainly to find principal directions that capture most of the variance; the bases in V will be orthogonal to one another. FA tries to capture most of the covariance.
64 FA: A very successful use. Voice biometrics: speaker recognition. Given: only a small amount of training data from a speaker to learn its model, which is used to verify the speaker later. Problem: immense variation in the ways people can speak. Less than a minute of training data is totally insufficient!
65 Speaker Recognition. Speaker identification: whose voice is this? Speaker verification: is this Bob's voice? Speaker diarization (segmentation and clustering): where are the speaker changes, and which segments are from the same speaker? [Figure: a recording segmented between Speaker A and Speaker B.]
66 Modeling Sequence of Features: Gaussian Mixture Models. For most recognition tasks, we need to model the distribution of feature vector sequences. In practice, we often use Gaussian Mixture Models (GMMs). [Figure: signal space (frequency in Hz vs. time in sec) from MANY training utterances mapped to feature vectors (vec/sec) in feature space, modeled by a GMM.]
67 Why GMMs? [Figure: vowel classification; PCA.]
68 Speaker Verification. A model represents the distribution of cepstral vectors for the speaker. A second model represents everyone else (potential impostors). The cepstra computed from a test recording are scored against both models. Accept the speaker if the speaker model scores higher.
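A minimal sketch of the two-model scoring rule, assuming single 1-D Gaussians stand in for the speaker GMM and the everyone-else model, and a decision threshold of 0 on the average log-likelihood ratio (that is, accept if the speaker model scores higher). All numbers and names are hypothetical.

```python
import math

def gauss_logpdf(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def verify(frames, speaker, background):
    """Average per-frame log-likelihood ratio; accept if the speaker model scores higher."""
    llr = sum(gauss_logpdf(x, *speaker) - gauss_logpdf(x, *background) for x in frames)
    llr /= len(frames)
    return llr, llr > 0.0

speaker_model = (1.0, 0.5)        # hypothetical (mean, variance) for one cepstral coefficient
background_model = (0.0, 1.0)     # hypothetical model of everyone else
genuine = verify([0.9, 1.2, 0.7, 1.4], speaker_model, background_model)
impostor = verify([-0.8, 0.1, -1.5, 0.3], speaker_model, background_model)
print(genuine[1], impostor[1])    # True False
```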
69 GMM for speaker verification. We enroll a given speaker by adapting the UBM using the speaker's input speech (Reynolds 2000). [Figure: speaker Jim's model adapted from the UBM; a test utterance is scored: Yes/No?]
70 Speaker Verification. Problem: one typically has only a few seconds or minutes of training data from the speaker, so it is hard to estimate a speaker model. Test data may be spoken differently, come over a different channel, or be recorded in noise: it won't really match.
71 Modeling Sequence of Features: Gaussian Mixture Models (continued). Stacking the means of the GMM components gives the supervector $[\mu_1; \mu_2; \ldots; \mu_k]$; this supervector is the feature that represents the recording.
72 Training. Supervectors $[\mu_1; \mu_2; \ldots; \mu_k]$ are obtained for each training speaker by adapting a Universal Background Model (UBM) trained on large amounts of data, since each speaker provides too little data to train a GMM by maximum likelihood.
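One common way to adapt the UBM is relevance-MAP adaptation of the component means, as in Reynolds et al. (2000); the sketch below assumes that technique, a 1-D two-component UBM, and a relevance factor of 16. All values and names are illustrative. For 1-D features the adapted means themselves form the supervector; with D-dimensional features their mean vectors would be concatenated.

```python
import math

def gauss_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """Relevance-MAP adaptation of GMM means; the adapted means form the supervector."""
    k = len(means)
    n = [0.0] * k                     # zeroth-order statistics (soft counts per component)
    f = [0.0] * k                     # first-order statistics
    for x in frames:
        p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(p)
        for j in range(k):
            gamma = p[j] / total      # posterior of component j for this frame
            n[j] += gamma
            f[j] += gamma * x
    adapted = []
    for j in range(k):
        alpha = n[j] / (n[j] + relevance)             # more data -> trust the speaker more
        ex = f[j] / n[j] if n[j] > 0 else means[j]    # speaker's mean for component j
        adapted.append(alpha * ex + (1 - alpha) * means[j])
    return adapted

ubm_weights, ubm_means, ubm_vars = [0.5, 0.5], [-2.0, 2.0], [1.0, 1.0]   # hypothetical 1-D UBM
frames = [2.3, 2.4, 2.5, 2.6, 2.7]                                       # hypothetical enrollment data
supervector = map_adapt_means(frames, ubm_weights, ubm_means, ubm_vars)
print([round(m, 3) for m in supervector])
```

With only five frames the component near the data moves part of the way toward 2.5, and the unused component stays at its UBM value, which is the behaviour that makes adaptation useful when speaker data is scarce.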
73 Training the Factor Analyzer. The supervectors are assumed to be the output of a linear Gaussian process: $[\mu_1; \ldots; \mu_k] = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Use FA to estimate V. The w are the factors that cause variations; the real information is in the factor w.
74 I-vector: Total Variability Space
75 I-Vector: Factor analysis as a feature extractor. The speaker- and channel-dependent supervector is $M = m + Tw$, where m is the speaker- and channel-independent supervector, T is rectangular and low rank (the total variability matrix), and w is a standard Normal random vector (the total factors: an intermediate vector, or i-vector). [Figure: MFCC frames → supervector M → factor analysis → i-vector.]
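A simplified sketch of the i-vector as the posterior mean of w, assuming the supervector M is observed directly with diagonal residual covariance Σ, so that $w = (I + T^\top \Sigma^{-1} T)^{-1} T^\top \Sigma^{-1} (M - m)$ (equivalent to the E step $T^\top (TT^\top + \Sigma)^{-1}(M - m)$ of slide 18). A real i-vector extractor instead weights this by per-component Baum-Welch statistics collected against the UBM. T, m, K = 2 and the helper names are hypothetical.

```python
def inv2(a):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def ivector(M, m, T, sigma):
    """Posterior mean of w for M = m + T w + eps, w ~ N(0, I), eps ~ N(0, diag(sigma)), K = 2."""
    D = len(M)
    centered = [M[d] - m[d] for d in range(D)]
    prec = [[(1.0 if a == b else 0.0) + sum(T[d][a] * T[d][b] / sigma[d] for d in range(D))
             for b in range(2)] for a in range(2)]                        # I + T^T Sigma^{-1} T
    proj = [sum(T[d][a] * centered[d] / sigma[d] for d in range(D)) for a in range(2)]
    cov = inv2(prec)                                                     # posterior covariance of w
    return [sum(cov[a][b] * proj[b] for b in range(2)) for a in range(2)]

T = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]   # hypothetical 4 x 2 total variability matrix
m = [1.0, 1.0, 1.0, 1.0]                              # hypothetical UBM supervector
w_true = [0.5, -1.0]
M = [m[d] + sum(T[d][k] * w_true[k] for k in range(2)) for d in range(4)]
w_sharp = ivector(M, m, T, sigma=[1e-4] * 4)   # little residual noise: close to w_true
w_noisy = ivector(M, m, T, sigma=[1.0] * 4)    # more noise: pulled toward the prior mean 0
print(w_sharp, w_noisy)
```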
76 Training models for a speaker. $[\mu_1; \ldots; \mu_k] = Vw + e$, $w \sim N(0, I)$, $e \sim N(0, E)$. Use Linear Discriminant Analysis to maximize the discrimination between the speakers.
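A simplified sketch of the LDA idea for just two speakers in two dimensions: the Fisher direction $S_w^{-1}(m_1 - m_0)$, where $S_w$ is the summed within-speaker scatter. The lecture applies LDA across many speakers; the two-class case, the toy 2-D vectors and the function names are assumptions made for illustration.

```python
def mean(vectors):
    n = len(vectors)
    return [sum(v[j] for v in vectors) / n for j in range(len(vectors[0]))]

def scatter(vectors, mu):
    """Within-class scatter matrix: sum of (v - mu)(v - mu)^T over the class (2-D)."""
    return [[sum((v[a] - mu[a]) * (v[b] - mu[b]) for v in vectors) for b in range(2)]
            for a in range(2)]

def fisher_direction(class0, class1):
    """Two-class Fisher LDA direction S_w^{-1} (m1 - m0) for 2-D data."""
    m0, m1 = mean(class0), mean(class1)
    s0, s1 = scatter(class0, m0), scatter(class1, m1)
    (p, q), (r, s) = [[s0[a][b] + s1[a][b] for b in range(2)] for a in range(2)]
    det = p * s - q * r
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [(s * diff[0] - q * diff[1]) / det, (-r * diff[0] + p * diff[1]) / det]

speaker_a = [[0.0, 0.2], [1.0, 1.1], [2.0, 2.2], [3.0, 2.9]]    # hypothetical 2-D vectors
speaker_b = [[1.0, -0.8], [2.0, 0.1], [3.0, 1.2], [4.0, 2.0]]
w = fisher_direction(speaker_a, speaker_b)
proj_a = [w[0] * x + w[1] * y for x, y in speaker_a]
proj_b = [w[0] * x + w[1] * y for x, y in speaker_b]
print(max(proj_a) < min(proj_b))   # True: the projection separates the two speakers
```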
77 Data Visualization Based on a Graph. The cosine similarity performs well for speaker recognition. Data visualization uses the Graph Exploration System (GUESS). Each segment is represented as a node with connections (edges) to its nearest neighbors (3 NN used). The NN are computed using a blind system (with and without channel normalization). Applied to 5438 utterances from the NIST SRE10 core, covering multiple telephone and microphone channels. Absolute locations of nodes are not important; the relative locations of nodes to one another are: the visualization clusters nodes that are highly connected together. Metadata (speaker ID, channel info) is not used in the layout. Colors and shapes of nodes are used to highlight interesting phenomena.
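A minimal sketch of how such a graph's edges could be built: link each vector to its k most cosine-similar neighbors (k = 3, as on the slide). The toy 3-D vectors, which form two clusters of four, and the function names are hypothetical; the graph layout itself (done by GUESS) is not shown.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def knn_graph(vectors, k=3):
    """Undirected edge set linking each vector to its k most cosine-similar neighbours."""
    edges = set()
    for i, a in enumerate(vectors):
        sims = sorted(((cosine(a, b), j) for j, b in enumerate(vectors) if j != i), reverse=True)
        for _, j in sims[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

vectors = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [0.9, 0.1, 0.1], [1.0, 0.05, 0.05],   # cluster A
           [0.1, 1.0, 0.0], [0.0, 1.0, 0.1], [0.1, 0.9, 0.1], [0.05, 1.0, 0.05]]   # cluster B
edges = knn_graph(vectors)
print(sorted(edges))   # every edge stays inside its own cluster
```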
78 Female data with intersession compensation. [Figure: graph visualization; colors represent speakers.]
79 Female data with no intersession compensation. [Figure: graph visualization; colors represent speakers.]
80–83 Female data with no intersession compensation. [Figure, built up over four slides: the same graph with nodes marked by channel (cell phone, landline, and several microphone channels such as Mic_CH02, Mic_CH04 and Mic_CH08), high/low/normal levels, and recording room (e.g. LDC); the later builds highlight the microphone (MIC) clusters.]
84 Female data with intersession compensation. [Figure: the same channel legend as slides 80–83.]
85 Male data with intersession compensation. [Figure: graph visualization; colors represent speakers.]
86 Male data with no intersession compensation. [Figure: graph visualization; colors represent speakers.]
87–89 Male data with no intersession compensation. [Figure, built up over three slides: nodes marked by channel, level and recording room, as for the female data.]
90 Male data with no intersession compensation. [Figure: the same channel legend.]
91 Speaker representation: i-vectors → clustering. [Figure: several i-vectors being clustered.]
92 Speaker clustering. [Figure]
93 PCA Visualization. [Figure]
94 520-412
More informationLecture 6: Introduction to Linear Regression
Lecture 6: Introducton to Lnear Regresson An Manchakul amancha@jhsph.edu 24 Aprl 27 Lnear regresson: man dea Lnear regresson can be used to study an outcome as a lnear functon of a predctor Example: 6
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far So far Supervsed machne learnng Lnear models Non-lnear models Unsupervsed machne learnng Generc scaffoldng So far
More information2016 Wiley. Study Session 2: Ethical and Professional Standards Application
6 Wley Study Sesson : Ethcal and Professonal Standards Applcaton LESSON : CORRECTION ANALYSIS Readng 9: Correlaton and Regresson LOS 9a: Calculate and nterpret a sample covarance and a sample correlaton
More informationLecture 12: Classification
Lecture : Classfcaton g Dscrmnant functons g The optmal Bayes classfer g Quadratc classfers g Eucldean and Mahalanobs metrcs g K Nearest Neghbor Classfers Intellgent Sensor Systems Rcardo Guterrez-Osuna
More informationImage classification. Given the bag-of-features representations of images from different classes, how do we learn a model for distinguishing i them?
Image classfcaton Gven te bag-of-features representatons of mages from dfferent classes ow do we learn a model for dstngusng tem? Classfers Learn a decson rule assgnng bag-offeatures representatons of
More informationIntro to Visual Recognition
CS 2770: Computer Vson Intro to Vsual Recognton Prof. Adrana Kovashka Unversty of Pttsburgh February 13, 2018 Plan for today What s recognton? a.k.a. classfcaton, categorzaton Support vector machnes Separable
More informationClustering & Unsupervised Learning
Clusterng & Unsupervsed Learnng Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 2012 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationBasically, if you have a dummy dependent variable you will be estimating a probability.
ECON 497: Lecture Notes 13 Page 1 of 1 Metropoltan State Unversty ECON 497: Research and Forecastng Lecture Notes 13 Dummy Dependent Varable Technques Studenmund Chapter 13 Bascally, f you have a dummy
More informationChapter 7 Generalized and Weighted Least Squares Estimation. In this method, the deviation between the observed and expected values of
Chapter 7 Generalzed and Weghted Least Squares Estmaton The usual lnear regresson model assumes that all the random error components are dentcally and ndependently dstrbuted wth constant varance. When
More informationj) = 1 (note sigma notation) ii. Continuous random variable (e.g. Normal distribution) 1. density function: f ( x) 0 and f ( x) dx = 1
Random varables Measure of central tendences and varablty (means and varances) Jont densty functons and ndependence Measures of assocaton (covarance and correlaton) Interestng result Condtonal dstrbutons
More informationSupport Vector Machines
Support Vector Machnes Konstantn Tretyakov (kt@ut.ee) MTAT.03.227 Machne Learnng So far Supervsed machne learnng Lnear models Least squares regresson Fsher s dscrmnant, Perceptron, Logstc model Non-lnear
More informationHomework 9 STAT 530/J530 November 22 nd, 2005
Homework 9 STAT 530/J530 November 22 nd, 2005 Instructor: Bran Habng 1) Dstrbuton Q-Q plot Boxplot Heavy Taled Lght Taled Normal Skewed Rght Department of Statstcs LeConte 203 ch-square dstrbuton, Telephone:
More informationTHE ROYAL STATISTICAL SOCIETY 2006 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE
THE ROYAL STATISTICAL SOCIETY 6 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER I STATISTICAL THEORY The Socety provdes these solutons to assst canddates preparng for the eamnatons n future years and for
More informationTensor Subspace Analysis
Tensor Subspace Analyss Xaofe He 1 Deng Ca Partha Nyog 1 1 Department of Computer Scence, Unversty of Chcago {xaofe, nyog}@cs.uchcago.edu Department of Computer Scence, Unversty of Illnos at Urbana-Champagn
More informationClassification as a Regression Problem
Target varable y C C, C,, ; Classfcaton as a Regresson Problem { }, 3 L C K To treat classfcaton as a regresson problem we should transform the target y nto numercal values; The choce of numercal class
More informationClassification. Representing data: Hypothesis (classifier) Lecture 2, September 14, Reading: Eric CMU,
Machne Learnng 10-701/15-781, 781, Fall 2011 Nonparametrc methods Erc Xng Lecture 2, September 14, 2011 Readng: 1 Classfcaton Representng data: Hypothess (classfer) 2 1 Clusterng 3 Supervsed vs. Unsupervsed
More informationFace Recognition CS 663
Face Recognton CS 663 Importance of face recognton The most common way for humans to recognze each other Study of the process of face recognton has applcatons n () securty/survellance/authentcaton, ()
More informationDr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur
Analyss of Varance and Desgn of Experment-I MODULE VII LECTURE - 3 ANALYSIS OF COVARIANCE Dr Shalabh Department of Mathematcs and Statstcs Indan Insttute of Technology Kanpur Any scentfc experment s performed
More informationSupport Vector Machines CS434
Support Vector Machnes CS434 Lnear Separators Many lnear separators exst that perfectly classfy all tranng examples Whch of the lnear separators s the best? Intuton of Margn Consder ponts A, B, and C We
More informationHopfield networks and Boltzmann machines. Geoffrey Hinton et al. Presented by Tambet Matiisen
Hopfeld networks and Boltzmann machnes Geoffrey Hnton et al. Presented by Tambet Matsen 18.11.2014 Hopfeld network Bnary unts Symmetrcal connectons http://www.nnwj.de/hopfeld-net.html Energy functon The
More informationA Robust Method for Calculating the Correlation Coefficient
A Robust Method for Calculatng the Correlaton Coeffcent E.B. Nven and C. V. Deutsch Relatonshps between prmary and secondary data are frequently quantfed usng the correlaton coeffcent; however, the tradtonal
More informationPrimer on High-Order Moment Estimators
Prmer on Hgh-Order Moment Estmators Ton M. Whted July 2007 The Errors-n-Varables Model We wll start wth the classcal EIV for one msmeasured regressor. The general case s n Erckson and Whted Econometrc
More informationThe Ordinary Least Squares (OLS) Estimator
The Ordnary Least Squares (OLS) Estmator 1 Regresson Analyss Regresson Analyss: a statstcal technque for nvestgatng and modelng the relatonshp between varables. Applcatons: Engneerng, the physcal and chemcal
More informationChapter 9: Statistical Inference and the Relationship between Two Variables
Chapter 9: Statstcal Inference and the Relatonshp between Two Varables Key Words The Regresson Model The Sample Regresson Equaton The Pearson Correlaton Coeffcent Learnng Outcomes After studyng ths chapter,
More informationC4B Machine Learning Answers II. = σ(z) (1 σ(z)) 1 1 e z. e z = σ(1 σ) (1 + e z )
C4B Machne Learnng Answers II.(a) Show that for the logstc sgmod functon dσ(z) dz = σ(z) ( σ(z)) A. Zsserman, Hlary Term 20 Start from the defnton of σ(z) Note that Then σ(z) = σ = dσ(z) dz = + e z e z
More informationEcon Statistical Properties of the OLS estimator. Sanjaya DeSilva
Econ 39 - Statstcal Propertes of the OLS estmator Sanjaya DeSlva September, 008 1 Overvew Recall that the true regresson model s Y = β 0 + β 1 X + u (1) Applyng the OLS method to a sample of data, we estmate
More informationKernel Methods and SVMs Extension
Kernel Methods and SVMs Extenson The purpose of ths document s to revew materal covered n Machne Learnng 1 Supervsed Learnng regardng support vector machnes (SVMs). Ths document also provdes a general
More informationBayesian Planning of Hit-Miss Inspection Tests
Bayesan Plannng of Ht-Mss Inspecton Tests Yew-Meng Koh a and Wllam Q Meeker a a Center for Nondestructve Evaluaton, Department of Statstcs, Iowa State Unversty, Ames, Iowa 5000 Abstract Although some useful
More informationA kernel method for canonical correlation analysis
A kernel method for canoncal correlaton analyss Shotaro Akaho AIST Neuroscence Research Insttute, Central 2, - Umezono, Tsukuba, Ibarak 3058568, Japan s.akaho@ast.go.jp http://staff.ast.go.jp/s.akaho/
More informationBoostrapaggregating (Bagging)
Boostrapaggregatng (Baggng) An ensemble meta-algorthm desgned to mprove the stablty and accuracy of machne learnng algorthms Can be used n both regresson and classfcaton Reduces varance and helps to avod
More informationNonlinear Classifiers II
Nonlnear Classfers II Nonlnear Classfers: Introducton Classfers Supervsed Classfers Lnear Classfers Perceptron Least Squares Methods Lnear Support Vector Machne Nonlnear Classfers Part I: Mult Layer Neural
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More information