Unsupervised Learning, K-means and Derivative algorithms. Virginia de Sa (desa at cogsci)


1 Unsupervised Learning, K-means and Derivative algorithms. Virginia de Sa (desa at cogsci)

2 Unsupervised Learning: No target data required. Extract structure (density estimates, cluster memberships, or a reduced-dimensional representation) from the data.

3 Unsupervised algorithms are often forms of Hebbian learning: Hebbian learning refers to modifying the strength of a connection according to a function of the input and output activity (often simply their product). It is based on a rule stated by the Canadian psychologist Donald Hebb in his 1949 book The Organization of Behavior: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased" (Hebb 1949). (Figure on slide.)
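
The rule "a function of the input and output activity (often simply the product)" can be written as Δw_i = η x_i y. A minimal sketch of that update (NumPy assumed; the linear output unit, learning rate, and toy data are illustrative choices, not from the slides):

```python
import numpy as np

def hebbian_update(w, x, eta=0.01):
    """One Hebbian step: change each weight by the product of its
    input activity and the unit's output activity."""
    y = w @ x                  # output activity (linear unit assumed)
    return w + eta * x * y     # delta w_i = eta * x_i * y

# toy usage: repeatedly present random input patterns
rng = np.random.default_rng(0)
w = 0.1 * rng.standard_normal(3)
for _ in range(100):
    w = hebbian_update(w, rng.standard_normal(3))
```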

4 Data Compression: We might want to compress data from high-dimensional spaces for several reasons: to enable us (and also machine learning algorithms) to better see relationships, and for more efficient storage and transmission of information (gzip, jpg). We want to do this while preserving as much of the useful information as possible (of course, how "useful" is determined is critical). Clustering and PCA are different methods of dimensionality reduction.

5 PCA and Clustering: PCA represents a point using a smaller number of dimensions; the directions used are the directions of greatest variance in the data. Clustering represents a point using prototype points.
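
To make the contrast concrete, here is a small sketch of the two styles of compression (NumPy assumed; the function names are illustrative): coordinates along the top-variance directions versus the index of the nearest prototype.

```python
import numpy as np

def pca_compress(X, k):
    """Code each row of X by its coordinates along the k directions
    of greatest variance (the top principal components)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T              # n x k real-valued codes

def prototype_compress(X, prototypes):
    """Code each row of X by the index of its nearest prototype point."""
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)          # n integer codes
```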

6 K-means: a simple but effective clustering algorithm that partitions the data into K disjoint sets (clusters). It is an iterative batch algorithm: start with an initial guess of k centers; let S^(j) be the set of all points closest to μ^(j); update μ^(j) = (1/N_j) Σ_{n ∈ S^(j)} x^(n), where N_j is the number of points in S^(j); repeat until there is no change in the means.

7-16 K-means (figure-only slides)

17 K-means: a simple but effective clustering algorithm that partitions the data into K disjoint sets (clusters). It is an iterative batch algorithm: start with an initial guess of k centers; let S^(j) be the set of all points closest to μ^(j); update μ^(j) = (1/N_j) Σ_{n ∈ S^(j)} x^(n), where N_j is the number of points in S^(j); repeat until there is no change in the means.
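
A minimal sketch of this batch algorithm (NumPy assumed; initializing the centers from randomly chosen data points is one common choice and is not specified on the slide):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Batch K-means: assign every point to its closest center, then
    move each center to the mean of the points assigned to it."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)].copy()  # initial guess of k centers
    for _ in range(n_iter):
        # S^(j): indices of the points whose closest center is mu^(j)
        assign = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=-1).argmin(axis=1)
        new_mu = np.array([X[assign == j].mean(axis=0) if np.any(assign == j) else mu[j]
                           for j in range(k)])
        if np.allclose(new_mu, mu):   # stop when the means no longer change
            break
        mu = new_mu
    return mu, assign
```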

18 Stochastic K-means = Competitive Learning: Find the weight vector w^(j) that minimizes ||w^(j) − x^(n)|| (the weight closest to the pattern) and move it closer to the pattern: Δw^(j) = η(t)(x^(n) − w^(j)). Decrease the learning rate η(t) with time. (Figure: network with weight matrix W and inputs x1, x2, x3, x4.)
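
A sketch of this online version (NumPy assumed; the 1/t learning-rate schedule is an illustrative choice, the slide only says the rate decreases with time):

```python
import numpy as np

def competitive_learning(X, k, n_epochs=10, seed=0):
    """Online (stochastic) K-means: move only the winning weight vector
    toward each pattern, with a learning rate that decays over time."""
    rng = np.random.default_rng(seed)
    W = X[rng.choice(len(X), size=k, replace=False)].copy()
    t = 0
    for _ in range(n_epochs):
        for n in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / t                                # eta(t) decreases with time
            j = ((W - X[n]) ** 2).sum(axis=1).argmin()   # winner: weight closest to x^(n)
            W[j] += eta * (X[n] - W[j])                  # delta w^(j) = eta(t)(x^(n) - w^(j))
    return W
```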

19-36 Competitive Learning (figure-only slides)

37 Kohonen Feature Mapping: Update the neighbours (in the output topography) as well as the winner. If y* refers to the winning output neuron, then we update the weights as Δw^(k) = η(t) λ(|y^(k) − y*|, t)(x − w^(k)); the window (neighbourhood) function λ decreases with time.
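
A sketch of this update for a 1-D chain of output units (NumPy assumed; the Gaussian window and the particular decay schedules are illustrative choices, the slide only says both decrease with time):

```python
import numpy as np

def som_1d(X, n_units, n_epochs=20, seed=0):
    """Kohonen map with a 1-D chain of output units: update the winner and
    its topographic neighbours; the neighbourhood window shrinks over time."""
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((n_units, X.shape[1]))
    grid = np.arange(n_units)                   # output-space coordinates y^(k)
    for epoch in range(n_epochs):
        frac = 1.0 - epoch / n_epochs
        eta = 0.01 + 0.5 * frac                 # eta(t) decays toward a small value
        sigma = max(0.5, 0.5 * n_units * frac)  # window width decays
        for n in rng.permutation(len(X)):
            winner = ((W - X[n]) ** 2).sum(axis=1).argmin()
            # lambda(|y^(k) - y*|, t): Gaussian in grid distance from the winner
            lam = np.exp(-((grid - winner) ** 2) / (2.0 * sigma ** 2))
            W += eta * lam[:, None] * (X[n] - W)
    return W
```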

38-42 Kohonen Feature Mapping (figure-only slides)

43 More examples

44 Some SOM applets: applet from rfhs8012.fh-regensburg.de/saj39122/jfroehl/diplom/e-sample.html, plus two further applets (links missing).

45 Let's look at a visual cortex example: Obermayer1990.pdf

46 Neural Gas: Learn the Topology (fritzke/fuzzypaper/node6.html)

47 Aside: related supervised algorithms (Kohonen's Learning Vector Quantization). Supervised methods for moving cluster centers (making use of the given class labels). Can have more than one center per class. Move the centers to reduce the number of misclassified patterns. Various flavours exist; LVQ2.1 minimizes the number of misclassified patterns.

48 LVQ2.1 learning rule: Let w^(i) and w^(j) be the two closest codebook vectors. Only if exactly one of w^(i) and w^(j) belongs to the correct class, and min(||x − w^(i)||/||x − w^(j)||, ||x − w^(j)||/||x − w^(i)||) > s (x lies within a window of the border region), do the following (the rules below assume w^(i) is from the correct class; switch them if not): w^(i) ← w^(i) + ε(x − w^(i)), w^(j) ← w^(j) − ε(x − w^(j)).
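
A sketch of one such update step (NumPy assumed; ε and the window threshold s are illustrative values, and the hypothetical `labels` array holds the class of each codebook vector):

```python
import numpy as np

def lvq21_step(W, labels, x, y, eps=0.05, s=0.6):
    """One LVQ2.1 update: act only when the two closest codebook vectors
    come from different classes (exactly one correct) and x falls in the
    window around the border between them."""
    d = np.sqrt(((W - x) ** 2).sum(axis=1))
    i, j = np.argsort(d)[:2]                      # the two closest codebook vectors
    if (labels[i] == y) == (labels[j] == y):      # need exactly one from the correct class
        return W
    if min(d[i] / d[j], d[j] / d[i]) <= s:        # outside the border window: no update
        return W
    if labels[j] == y:                            # relabel so that w^(i) is the correct one
        i, j = j, i
    W[i] += eps * (x - W[i])                      # pull the correct vector toward x
    W[j] -= eps * (x - W[j])                      # push the incorrect vector away from x
    return W
```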

49 Improved LVQ2.1 learning rule: Let w^(i) and w^(j) be the two closest codebook vectors. Only if exactly one of w^(i) and w^(j) belongs to the correct class, and min(||x − w^(i)||/||x − w^(j)||, ||x − w^(j)||/||x − w^(i)||) > s(t) (x lies within a window of the border region that decreases with time), do we apply the following (the rules below assume w^(i) is from the correct class; switch them if not): w^(i) ← w^(i) + ε (x − w^(i))/||x − w^(i)||, w^(j) ← w^(j) − ε (x − w^(j))/||x − w^(j)||.
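
The normalized (fixed-length) updates of the improved rule, as a drop-in replacement for the two update lines in the sketch above (a sketch only; the shrinking window s(t) is left to the caller):

```python
import numpy as np

def lvq21_normalized_updates(W, i, j, x, eps=0.05):
    """Improved step sizes: move the correct vector w^(i) toward x and the
    incorrect vector w^(j) away from x by a fixed length eps, i.e. along
    the unit vectors (x - w)/||x - w||."""
    W[i] += eps * (x - W[i]) / np.linalg.norm(x - W[i])
    W[j] -= eps * (x - W[j]) / np.linalg.norm(x - W[j])
    return W
```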

50 LVQ2.1 in 2-D: w^(i) ← w^(i) + ε (x − w^(i))/||x − w^(i)||, w^(j) ← w^(j) − ε (x − w^(j))/||x − w^(j)||, where w^(i) is from the correct class and w^(j) from an incorrect class. (Figure: panels a-e, axes x1 and x2.)

51 LVQ in 1-D. (Figure: P(C_A)p(x|C_A) and P(C_B)p(x|C_B) along x, the Class A / Class B decision regions, the force to the left and force to the right, and curves labelled P_LVQ2.1 and P_LVQ2.0.)

52 LVQ in 1-D, separable distributions. (Figure: the same quantities as above for separable class densities.)

53 Problem with K-means: What will happen here?

54 Solution: Model the clusters as Gaussians, learn the covariance ellipses from the data, and use the probabilities associated with the Gaussian density to determine membership.

55 Mixture of Gaussians (MOG) = a softer k-means: Model the data as coming from a mixture of Gaussians, where you don't know which Gaussian generated which data point. Each Gaussian cluster has an associated proportion or prior probability π_k, so p(x) = Σ_{k=1}^{c} π_k p_k(x). In the mixture-of-Gaussians case p_k(x) ~ N(μ^(k), Σ_k), i.e. p_k(x) = |2πΣ_k|^(−1/2) exp(−(1/2)(x − μ^(k))^T Σ_k^(−1) (x − μ^(k))). Mixture models can be generalized.
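
A sketch of evaluating this mixture density (SciPy's multivariate normal is assumed to be available; the two-component example values are illustrative):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_density(x, pis, mus, Sigmas):
    """p(x) = sum_k pi_k N(x; mu^(k), Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=S)
               for pi, mu, S in zip(pis, mus, Sigmas))

# example: a two-component mixture in 2-D
pis = [0.3, 0.7]
mus = [np.zeros(2), np.array([3.0, 0.0])]
Sigmas = [np.eye(2), np.diag([2.0, 0.5])]
print(mog_density(np.array([1.0, 0.0]), pis, mus, Sigmas))
```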

56 MOG solution: Normalize the probabilities to determine the responsibility of each cluster for each data point (soft responsibility): r_k(x^(n)) = π_k p_k(x^(n)) / Σ_i π_i p_i(x^(n)). Now solve, similarly to the k-means solution: recompute the mean, covariance, and overall weighting of each cluster, with each data point contributing weight according to its responsibility, then iterate as in k-means: μ^(k) = Σ_n r_k(x^(n)) x^(n) / Σ_n r_k(x^(n)), Σ_k = Σ_n r_k(x^(n)) (x^(n) − μ^(k))(x^(n) − μ^(k))^T / Σ_n r_k(x^(n)), π_k = Σ_n r_k(x^(n)) / Σ_i Σ_n r_i(x^(n)).
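
One iteration of these updates might look like the following sketch (NumPy and SciPy assumed; the responsibilities are kept in an N x K matrix R):

```python
import numpy as np
from scipy.stats import multivariate_normal

def mog_em_step(X, pis, mus, Sigmas):
    """One soft-assignment iteration: compute responsibilities r_k(x^(n)),
    then recompute means, covariances, and mixing proportions with each
    point weighted by its responsibility."""
    N, K = len(X), len(pis)
    # E-step: r_k(x^(n)) = pi_k p_k(x^(n)) / sum_i pi_i p_i(x^(n))
    R = np.column_stack([pis[k] * multivariate_normal.pdf(X, mean=mus[k], cov=Sigmas[k])
                         for k in range(K)])
    R /= R.sum(axis=1, keepdims=True)
    # M-step: responsibility-weighted means, covariances, and proportions
    Nk = R.sum(axis=0)
    mus = (R.T @ X) / Nk[:, None]
    Sigmas = np.array([(R[:, k, None] * (X - mus[k])).T @ (X - mus[k]) / Nk[k]
                       for k in range(K)])
    pis = Nk / N
    return pis, mus, Sigmas, R
```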

57-58 Issues with MOG: Quite sensitive to initial conditions (applet); it's a good idea to initialize with k-means. There are a large number of parameters. We can reduce the number of parameters by (a) constraining the Gaussians to have diagonal covariance matrices, or (b) constraining the Gaussians to have the same covariance matrix.
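
A sketch of the two constraints, applied to covariances like those produced by the M-step above (the function names are illustrative):

```python
import numpy as np

def constrain_diagonal(Sigmas):
    """(a) keep only each cluster's variances (zero the off-diagonal terms)."""
    return np.array([np.diag(np.diag(S)) for S in Sigmas])

def constrain_shared(Sigmas, Nk):
    """(b) tie all clusters to one covariance: their responsibility-weighted average."""
    Sigmas = np.asarray(Sigmas)
    shared = np.einsum('k,kij->ij', Nk / Nk.sum(), Sigmas)
    return np.array([shared] * len(Sigmas))
```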

59 Note to other teachers and users of these slides: Andrew would be delighted if you found this material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message and the following link to the source repository of Andrew's tutorials: awm/tu. Comments and corrections gratefully received.

60-66 After the first, 2nd, 3rd, 4th, 5th, 6th, and 20th iterations (figures; Copyright 2001, 2004, Andrew W. Moore, Clustering with Gaussian Mixtures, slides 41-47).
