Unsupervised Learning, K-means and Derivative Algorithms. Virginia de Sa (desa at cogsci)
1 Unsupervised Learning, K-means and Derivative Algorithms 1 Virginia de Sa, desa at cogsci
2 Unsupervised Learning 2 No target data required. Extract structure from the data: density estimates, cluster memberships, or a reduced-dimensional representation.
3 Unsupervised algorithms are often forms of Hebbian Learning 3 Hebbian learning refers to modifying the strength of a connection according to a function of the input and output activity (often simply their product). It is based on a rule specified by the Canadian psychologist Donald Hebb in his 1949 book The Organization of Behavior: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased" (Hebb, 1949).
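As a concrete illustration of the product form of the rule (an assumed example, not from the slides), here is a minimal NumPy sketch of one Hebbian step for a single linear unit; the learning rate eta and the linear output y = w.x are choices made for the example:

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step: dw = eta * y * x (output times input)."""
    y = w @ x                 # output activity of a linear unit (assumed)
    return w + eta * y * x    # strengthen in proportion to input * output
```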
4 Data Compression 4 We might want to compress data from high-dimensional spaces for several reasons: to enable us (and also machine learning algorithms) to better see relationships, and for more efficient storage and transmission of information (gzip, jpg). We want to do this while preserving as much of the useful information as possible. (Of course, how "useful" is determined is critical.) Clustering and PCA are different methods of dimensionality reduction.
5 PCA and Clustering 5 PCA represents a point using a smaller number of dimensions; the directions kept are the directions of greatest variance in the data. Clustering represents a point using prototype points.
6 K-means 6 A simple but effective clustering algorithm that partitions the data into K disjoint sets (clusters). Iterative batch algorithm: start with an initial guess of the K centers $\mu^{(j)}$; let $S^{(j)}$ be the set of all points closest to $\mu^{(j)}$; update $\mu^{(j)} = \frac{1}{N_j} \sum_{n \in S^{(j)}} x^{(n)}$; repeat until there is no change in the means.
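A minimal NumPy sketch of the batch algorithm above; the random initialization from the data points and the handling of empty clusters (keeping the old center) are assumptions, not specified on the slide:

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    """Batch K-means: assign each point to its closest center, then re-average."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)].copy()  # initial guess of K centers
    for _ in range(n_iter):
        # S(j): index of the closest center for every point, via squared distances
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # mu(j) = (1/N_j) * sum of points in S(j); keep old center if S(j) is empty
        new_mu = np.array([X[labels == j].mean(axis=0) if (labels == j).any() else mu[j]
                           for j in range(K)])
        if np.allclose(new_mu, mu):  # stop when the means no longer change
            break
        mu = new_mu
    return mu, labels
```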
[Slides 7-16: K-means iterations illustrated on example data.]
17 K-means 17 A simple but effective clustering algorithm that partitions the data into K disjoint sets (clusters). Iterative batch algorithm: start with an initial guess of the K centers $\mu^{(j)}$; let $S^{(j)}$ be the set of all points closest to $\mu^{(j)}$; update $\mu^{(j)} = \frac{1}{N_j} \sum_{n \in S^{(j)}} x^{(n)}$ until no change in the means.
18 Stochastic K-means = Competitive Learning 18 Find the weight $w^{(j)}$ that minimizes $\|w^{(j)} - x^{(n)}\|$ (the weight closest to the pattern) and move it closer to the pattern: $\Delta w^{(j)} = \eta(t)\,(x^{(n)} - w^{(j)})$. Decrease the learning rate $\eta(t)$ with time. [Figure: network with weight matrix W over inputs x1-x4.]
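A sketch of one online (competitive) step; the particular decay schedule $\eta_0 / (1 + t/\tau)$ is an illustrative assumption, not from the slide:

```python
import numpy as np

def competitive_step(W, x, t, eta0=0.5, tau=100.0):
    """One stochastic K-means step: move only the winning weight toward x."""
    eta = eta0 / (1.0 + t / tau)                 # assumed schedule for eta(t)
    j = ((W - x) ** 2).sum(axis=1).argmin()      # winner: weight closest to the pattern
    W[j] += eta * (x - W[j])                     # dw(j) = eta(t) * (x - w(j))
    return W
```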
[Slides 19-36: competitive learning illustrated step by step on example data.]
37 Kohonen Feature Mapping 37 Update the neighbours (in the output topography) as well as the winner. If $y^*$ refers to the winning output neuron, then we update the weights as $\Delta w^{(k)} = \eta(t)\,\lambda(\|y^{(k)} - y^*\|, t)\,(x - w^{(k)})$, where the window function $\lambda$ decreases with time.
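A sketch of one Kohonen step using a Gaussian window over grid distance; the exponential decay schedules for $\eta(t)$ and the window width $\sigma(t)$ are assumptions chosen for the example:

```python
import numpy as np

def som_step(W, grid, x, t, eta0=0.5, sigma0=2.0, tau=200.0):
    """One SOM step: update the winner and its output-topography neighbours.

    W    : (n_units, d) weight vectors
    grid : (n_units, 2) coordinates of each unit in the output topography
    """
    eta = eta0 * np.exp(-t / tau)                  # assumed decay for learning rate
    sigma = sigma0 * np.exp(-t / tau)              # assumed decay for window width
    winner = ((W - x) ** 2).sum(axis=1).argmin()
    d2 = ((grid - grid[winner]) ** 2).sum(axis=1)  # grid distance to the winner
    lam = np.exp(-d2 / (2 * sigma ** 2))           # Gaussian window, shrinks with time
    W += eta * lam[:, None] * (x - W)              # dw(k) = eta * lambda(k) * (x - w(k))
    return W
```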
[Slides 38-42: Kohonen feature mapping illustrated with examples; the figures appear to be from Richard O. Duda, Peter E. Hart, and David G. Stork, Pattern Classification (captions garbled in transcription).]
43 More examples 43
44 Some SOM applets 44 applet from rfhs8012.fh-regensburg.de/~saj39122/jfroehl/diplom/e-sample.html (the URLs of two further applets did not survive the transcription)
45 Let's look at a visual cortex example 45 Obermayer1990.pdf
46 Neural Gas: Learn the Topology 46 fritzke/fuzzypaper/node6.html
47 Aside: related supervised algorithms (Kohonen's Learning Vector Quantization) 47 Supervised methods for moving cluster centers (they make use of the given class labels). Can have more than one center per class. Move the centers to reduce the number of misclassified patterns. There are various flavours; LVQ2.1 minimizes the number of misclassified patterns.
48 LVQ2.1 Learning rule 48 Let $w^{(i)}$ and $w^{(j)}$ be the two closest codebook vectors. Only if exactly one of $w^{(i)}$ and $w^{(j)}$ belongs to the correct class, and $\min(\|x - w^{(i)}\| / \|x - w^{(j)}\|,\ \|x - w^{(j)}\| / \|x - w^{(i)}\|) < s$ ($x$ lies within a window of the border region), do the following (the rules below assume $w^{(i)}$ is from the correct class; switch the rules if not): $w^{(i)} \leftarrow w^{(i)} + \epsilon\,(x - w^{(i)})$, $w^{(j)} \leftarrow w^{(j)} - \epsilon\,(x - w^{(j)})$.
49 Improved LVQ2.1 Learning rule 49 Let $w^{(i)}$ and $w^{(j)}$ be the two closest codebook vectors. Only if exactly one of $w^{(i)}$ and $w^{(j)}$ belongs to the correct class, and $\min(\|x - w^{(i)}\| / \|x - w^{(j)}\|,\ \|x - w^{(j)}\| / \|x - w^{(i)}\|) < s(t)$ ($x$ lies within a window of the border region that decreases with time), do we apply the following (the rules below assume $w^{(i)}$ is from the correct class; switch the rules if not): $w^{(i)} \leftarrow w^{(i)} + \epsilon\,\frac{x - w^{(i)}}{\|x - w^{(i)}\|}$, $w^{(j)} \leftarrow w^{(j)} - \epsilon\,\frac{x - w^{(j)}}{\|x - w^{(j)}\|}$.
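A sketch covering both variants of the rule; here the window parameter s is held fixed rather than decreased with time, and the specific values of eps and s are illustrative assumptions:

```python
import numpy as np

def lvq21_step(W, labels, x, y, eps=0.05, s=0.3, normalized=False):
    """One LVQ2.1 step; set normalized=True for the improved (normalized) rule."""
    d = np.sqrt(((W - x) ** 2).sum(axis=1))
    i, j = np.argsort(d)[:2]                        # the two closest codebook vectors
    if (labels[i] == y) == (labels[j] == y):        # exactly one must be the correct class
        return W
    if labels[j] == y:                              # make w(i) the correct-class vector
        i, j = j, i
    if min(d[i] / d[j], d[j] / d[i]) >= s:          # x must lie in the border-region window
        return W
    step_i = (x - W[i]) / d[i] if normalized else (x - W[i])
    step_j = (x - W[j]) / d[j] if normalized else (x - W[j])
    W[i] += eps * step_i                            # pull the correct class closer
    W[j] -= eps * step_j                            # push the wrong class away
    return W
```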
50 LVQ2.1 in 2-D 50 $w^{(i)} \leftarrow w^{(i)} + \epsilon\,\frac{x - w^{(i)}}{\|x - w^{(i)}\|}$, $w^{(j)} \leftarrow w^{(j)} - \epsilon\,\frac{x - w^{(j)}}{\|x - w^{(j)}\|}$, where $w^{(i)}$ is from the correct class and $w^{(j)}$ from an incorrect class. [Figure: panels a-e showing codebook vectors moving in the $(x_1, x_2)$ plane.]
51 LVQ in 1-D 51 [Figure: class-conditional densities $P(C_A)p(x|C_A)$ and $P(C_B)p(x|C_B)$ along $x$, with the class A / class B decision regions and the forces on the boundary (force to the left vs. force to the right), compared for LVQ 2.1 and LVQ 2.0.]
52 LVQ in 1-D, Separable Distributions 52 [Figure: the same comparison of LVQ 2.1 and LVQ 2.0 for separable class-conditional densities $P(C_A)p(x|C_A)$ and $P(C_B)p(x|C_B)$.]
53 Problem with K-means 53 What will happen here?
54 Solution 54 Model the clusters as Gaussians, learn the covariance ellipses from the data, and use the probabilities given by the Gaussian densities to determine membership.
55 Mixture of Gaussians (MOG) = A softer k-means 55 Model the data as coming from a mixture of Gaussians, where you don't know which Gaussian generated which data point. Each Gaussian cluster has an associated proportion or prior probability $\pi_k$: $p(x) = \sum_{k=1}^{c} \pi_k p_k(x)$. In the mixture-of-Gaussians case $p_k(x) \sim N(\mu^{(k)}, \Sigma_k)$: $p_k(x) = \frac{1}{|2\pi\Sigma_k|^{0.5}} \exp\!\left(-\frac{(x - \mu^{(k)})^T \Sigma_k^{-1} (x - \mu^{(k)})}{2}\right)$. Mixture models can be generalized beyond Gaussians.
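A sketch of evaluating the mixture density, leaning on scipy.stats.multivariate_normal for the Gaussian component densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k * N(x; mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))
```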
56 MOG Solution 56 Normalize the probabilities to determine the responsibility of each cluster for each data point (soft responsibility): $r_k(x^{(n)}) = \frac{\pi_k p_k(x^{(n)})}{\sum_i \pi_i p_i(x^{(n)})}$. Now solve, similarly to the k-means solution: recompute the mean, covariance, and overall weighting for each cluster, with each data point contributing weight according to its responsibility, and then iterate as in k-means. $\mu^{(k)} = \frac{\sum_n r_k(x^{(n)})\, x^{(n)}}{\sum_n r_k(x^{(n)})}$, $\Sigma_k = \frac{\sum_n r_k(x^{(n)})\,(x^{(n)} - \mu^{(k)})(x^{(n)} - \mu^{(k)})^T}{\sum_n r_k(x^{(n)})}$, $\pi_k = \frac{\sum_n r_k(x^{(n)})}{\sum_i \sum_n r_i(x^{(n)})} = \frac{1}{N}\sum_n r_k(x^{(n)})$.
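One full iteration of the updates above (responsibilities, then weighted mean, covariance, and proportion), as a minimal sketch; it reuses scipy's Gaussian pdf and includes no safeguards against collapsing components:

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_em_step(X, pis, mus, Sigmas):
    """One soft (E + M) iteration for a mixture of Gaussians."""
    N, K = len(X), len(pis)
    # E-step: r_k(x_n) = pi_k p_k(x_n) / sum_i pi_i p_i(x_n)
    R = np.column_stack([pi * multivariate_normal.pdf(X, mean=mu, cov=S)
                         for pi, mu, S in zip(pis, mus, Sigmas)])
    R /= R.sum(axis=1, keepdims=True)
    # M-step: each point contributes weight r_k(x_n) to cluster k
    Nk = R.sum(axis=0)
    mus = (R.T @ X) / Nk[:, None]
    Sigmas = [(R[:, k, None] * (X - mus[k])).T @ (X - mus[k]) / Nk[k]
              for k in range(K)]
    pis = Nk / N
    return pis, mus, Sigmas
```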
57 Issues with MOG 57 Quite sensitive to initial conditions (applet); it's a good idea to initialize with k-means. There are a large number of parameters. We can reduce the number of parameters by a) constraining the Gaussians to have diagonal covariance matrices, or b) constraining the Gaussians to share the same covariance matrix.
[Slides 58-65: EM fitting a mixture of Gaussians, shown after the 1st, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations. Figures from Andrew W. Moore's "Clustering with Gaussian Mixtures" tutorial slides, Copyright 2001, 2004, Andrew W. Moore.]