Clustering: Mixture Models

1 Clustering: Mixture Models. Machine Learning B, Seyoung Kim. Many of these slides are derived from Tom Mitchell, Ziv Bar-Joseph, and Eric Xing. Thanks!

2 Problem with K-means

3 Hard Assignment of Samples into Three Clusters. (Table: Individuals 1-10, each hard-assigned to exactly one of Cluster 1, Cluster 2, or Cluster 3.)

4 Probabilistic Soft-Clustering of Samples into Three Clusters. (Table: probability of each of Individuals 1-10 belonging to Cluster 1, Cluster 2, and Cluster 3.) Each sample can be assigned to more than one cluster with a certain probability. For each sample, the probabilities for all clusters should sum to 1 (i.e., each row should sum to 1). Each cluster is explained by a cluster center variable (i.e., cluster mean).
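A minimal sketch of such a soft-assignment table as a row-stochastic matrix (the probability values here are made up for illustration, not taken from the slides):

```python
import numpy as np

# Hypothetical soft-assignment probabilities for three individuals
# over three clusters; rows are samples, columns are clusters.
R = np.array([
    [0.80, 0.15, 0.05],  # mostly cluster 1
    [0.10, 0.70, 0.20],  # mostly cluster 2
    [0.33, 0.33, 0.34],  # ambiguous between all three
])

# Each row is a probability distribution over clusters and must sum to 1.
assert np.allclose(R.sum(axis=1), 1.0)
```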

5 Probability Model for Data P(X)?

6 Mixture Model. A density model p(x) may be multi-modal. Multi-modal: how do we model this? Unimodal: Gaussian.

7 Mixture Model. We may be able to model it as a mixture of uni-modal distributions (e.g., Gaussians). Each mode may correspond to a different sub-population (e.g., male and female).

8 Learning Mixture Models from Data. Given data generated from a multi-modal distribution, can we find a representation of the multi-modal distribution as a mixture of uni-modal distributions?

9 Gaussian Mixture Models (GMMs). Consider a mixture of K Gaussian components: $p(x) = \sum_k p(x \mid z = k)\, p(z = k) = \sum_k \pi_k\, N(x \mid \mu_k, \sigma_k)$, where $\pi_k$ is the mixture proportion and $N(x \mid \mu_k, \sigma_k)$ is the mixture component.

10 Gaussian Mixture Models (GMMs). Consider a mixture of K Gaussian components: $p(x) = \sum_k p(x \mid z = k)\, p(z = k) = \sum_k \pi_k\, N(x \mid \mu_k, \sigma_k)$. This probability model describes how each data point x can be generated. Step 1: Flip a K-sided die (with probability $\pi_k$ for the k-th side) to select a cluster c. Step 2: Generate the value of the data point from $N(\mu_c, \sigma_c)$.
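A minimal sketch of this two-step generative process in code (the parameter values are illustrative assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
pi = np.array([0.5, 0.3, 0.2])     # mixture proportions (the K-sided die)
mu = np.array([-4.0, 0.0, 5.0])    # cluster means
sigma = np.array([1.0, 0.5, 2.0])  # cluster standard deviations

def sample_gmm(n):
    # Step 1: flip the K-sided die to select a cluster for each point.
    z = rng.choice(len(pi), size=n, p=pi)
    # Step 2: generate each point's value from the selected cluster's Gaussian.
    x = rng.normal(mu[z], sigma[z])
    return x, z

x, z = sample_gmm(1000)
```

A histogram of `x` would show the three modes, one per sub-population.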

11 Gaussian Mixture Models (GMMs). Consider a mixture of K Gaussian components: $p(x) = \sum_k p(x \mid z = k)\, p(z = k) = \sum_k \pi_k\, N(x \mid \mu_k, \sigma_k)$. Parameters for K clusters: $\theta = \{\mu_k, \sigma_k, \pi_k : k = 1, \ldots, K\}$.

12 Learning mixture models. Latent variable model: data are only partially observed! $x_i$: observed sample data. $z_i = \{z_{i1}, \ldots, z_{iK}\}$: unobserved cluster labels (each element is 0 or 1, and only one of them is 1). MLE estimate: What if all data $(x_i, z_i)$ are observed? Maximize the data log likelihood for $(x_i, z_i)$ based on $p(x_i, z_i)$. Easy to optimize! In practice, only the $x_i$'s are observed: maximize the data log likelihood for $(x_i)$ based on $p(x_i)$. Difficult to optimize! Instead, maximize the expected data log likelihood for $(x_i, z_i)$ based on $p(x_i, z_i)$: the Expectation-Maximization (EM) algorithm.

13 Learning mixture models: fully observed data. In the fully observed i.i.d. setting, assuming the cluster labels $z_i$ were observed, the log likelihood decomposes into a sum of local terms: $\ell_c(\theta; D) = \sum_i \log p(x_i, z_i \mid \theta) = \sum_i \log p(z_i \mid \theta) + \sum_i \log p(x_i \mid z_i, \theta)$. The first term depends on $\pi_k$; the second depends on $\mu_k, \sigma_k$. The optimization problems for $\pi_k$ and for $\mu_k, \sigma_k$ are decoupled, and a closed-form solution for the MLE exists.

14 MLE for GMM with fully observed data! If we are doing MLE for completely observed data, the data log-likelihood is $\ell(\theta; D) = \sum_i \log p(z_i, x_i) = \sum_i \log p(z_i \mid \pi)\, p(x_i \mid z_i, \mu, \sigma) = \sum_i \sum_k z_{ik} \log \pi_k + \sum_i \sum_k z_{ik} \log N(x_i; \mu_k, \sigma) = \sum_i \sum_k z_{ik} \log \pi_k - \sum_i \sum_k z_{ik} \frac{1}{2\sigma^2}(x_i - \mu_k)^2 + C$. What if we do not know $z$?
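The closed-form solution referenced above is easy to write down. A minimal sketch, assuming 1-D data `x` with observed integer labels `z` (the function name is mine; it assumes every cluster has at least one sample):

```python
import numpy as np

def mle_fully_observed(x, z, K):
    # With z observed, the MLE decouples into per-cluster counts and moments.
    pi = np.array([np.mean(z == k) for k in range(K)])    # mixture proportions
    mu = np.array([x[z == k].mean() for k in range(K)])   # cluster means
    sigma = np.array([x[z == k].std() for k in range(K)]) # MLE std (divides by N_k)
    return pi, mu, sigma
```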

15 Learning mixture models. In the fully observed i.i.d. setting, assuming the cluster labels $z_i$ were observed, the log likelihood decomposes into a sum of local terms, $\ell_c(\theta; D) = \sum_i \log p(x_i, z_i \mid \theta)$, with one term depending on $\pi_k$ and the other on $\mu_k, \sigma_k$. With latent variables for the cluster labels, $\ell(\theta; D) = \sum_i \log p(x_i \mid \theta) = \sum_i \log \sum_z p(x_i, z \mid \theta) = \sum_i \log \sum_z p(z \mid \theta)\, p(x_i \mid z, \theta)$, and all the parameters become coupled together via marginalization. Are they equally difficult?

16 Theory underlying EM. Recall that according to MLE, we intend to learn the model parameters that would have maximized the likelihood of the data. But we do not observe $z$, so computing $\ell(\theta; D) = \sum_i \log \sum_z p(x_i, z \mid \theta) = \sum_i \log \sum_z p(z \mid \theta)\, p(x_i \mid z, \theta)$ is difficult! Optimizing the log-likelihood for MLE is difficult! What shall we do?

17 Complete vs. Expected Complete Log Likelihoods. The complete log likelihood: $\ell_c(\theta; D) = \sum_i \log p(x_i, z_i \mid \theta) = \sum_i \sum_k z_{ik} \left[\log \pi_k + \log N(x_i \mid \mu_k, \sigma_k)\right]$. The expected complete log likelihood: $\langle \ell_c(\theta; D) \rangle = \sum_i \sum_k \langle z_{ik} \rangle \left[\log \pi_k + \log N(x_i \mid \mu_k, \sigma_k)\right]$, where $\langle z_{ik} \rangle = p(z_{ik} = 1 \mid x_i, \theta)$. The first factor in each bracket depends on $\pi_k$; the second depends on $\mu_k, \sigma_k$.

18 Complete vs. Expected Complete Log Likelihoods. The complete log likelihood uses the observed $z_{ik}$; the expected complete log likelihood replaces each unobserved $z_{ik}$ with its posterior expectation $\langle z_{ik} \rangle$. EM optimizes the expected complete log likelihood.

19 EM Algorithm. Maximization (M)-step: find the mixture parameters. Expectation (E)-step: re-assign samples $x_i$ to clusters; impute the unobserved values $z_i$. Iterate until convergence.

20 K-Means Clustering Algorithm. Find the cluster means. Re-assign samples $x_i$ to clusters: $z_i = \arg\min_k \|x_i - \mu_k\|_2^2$. Iterate until convergence.
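A minimal K-means sketch implementing the two steps above (illustrative only; for simplicity it assumes 2-D input `X` of shape (n, d) and that no cluster ever goes empty):

```python
import numpy as np

def kmeans(X, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=K, replace=False)]  # init means at random samples
    for _ in range(n_iter):
        # Assignment step: send each sample to its nearest mean (argmin distance).
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step: recompute each mean from its assigned samples.
        mu = np.array([X[z == k].mean(axis=0) for k in range(K)])
    return mu, z
```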

21 The Expectation-Maximization (EM) Algorithm. Start: "Guess" the centroid $\mu_k$ and covariance $\Sigma_k$ of each of the K clusters. Loop:

22 The Expectation-Maximization (EM) Algorithm: a soft k-means. E step: compute the responsibilities $\tau_{ik}^{(t)} = p(z_{ik} = 1 \mid x_i, \theta^{(t)}) = \frac{\pi_k^{(t)} N(x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)})}{\sum_j \pi_j^{(t)} N(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)})}$. M step: $\pi_k^{(t+1)} = \frac{\sum_i \tau_{ik}^{(t)}}{N}$, $\mu_k^{(t+1)} = \frac{\sum_i \tau_{ik}^{(t)} x_i}{\sum_i \tau_{ik}^{(t)}}$, $\Sigma_k^{(t+1)} = \frac{\sum_i \tau_{ik}^{(t)} (x_i - \mu_k^{(t+1)})(x_i - \mu_k^{(t+1)})^T}{\sum_i \tau_{ik}^{(t)}}$.
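A minimal 1-D EM sketch implementing the E and M steps above (illustrative; it assumes scipy is available and that no component's weight collapses to zero):

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, K, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                   # uniform initial proportions
    mu = rng.choice(x, size=K, replace=False)  # initial means at random samples
    sigma = np.full(K, x.std())                # initial stds from the data
    for _ in range(n_iter):
        # E step: responsibilities tau[i, k] = p(z_ik = 1 | x_i, theta).
        tau = pi * norm.pdf(x[:, None], loc=mu, scale=sigma)
        tau /= tau.sum(axis=1, keepdims=True)
        # M step: re-estimate parameters from the soft counts.
        Nk = tau.sum(axis=0)
        pi = Nk / len(x)
        mu = (tau * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((tau * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return pi, mu, sigma
```

Replacing the soft `tau` with a one-hot argmax recovers exactly the K-means updates of the previous slide.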

23 Compare: K-means. The EM algorithm for mixtures of Gaussians is like a "soft version" of the K-means algorithm. In the K-means E-step we do hard assignment: $\tau_{ik} = 1$ if $k = \arg\min_j \|x_i - \mu_j\|_2^2$, and $\tau_{ik} = 0$ otherwise. In the K-means M-step we update the means as the weighted sum of the data, but now the weights are 0 or 1: $\mu_k = \frac{\sum_i \tau_{ik} x_i}{\sum_i \tau_{ik}}$.

24 Expected Complete Log Likelihood Lower-bounds the Log Likelihood. For any distribution $q(z)$, define the expected complete log likelihood: $\langle \ell_c(\theta) \rangle_q = \sum_i \sum_z q(z \mid x_i) \log p(x_i, z \mid \theta)$. Does maximizing this surrogate yield a maximizer of the likelihood? By Jensen's inequality, $\ell(\theta; D) = \sum_i \log \sum_z p(x_i, z \mid \theta) = \sum_i \log \sum_z q(z \mid x_i) \frac{p(x_i, z \mid \theta)}{q(z \mid x_i)} \ge \sum_i \sum_z q(z \mid x_i) \log \frac{p(x_i, z \mid \theta)}{q(z \mid x_i)}$.
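One step the slide leaves implicit: the bound becomes an equality when q is the exact posterior, which is precisely what the E-step computes. Written out in LaTeX:

```latex
% Choosing q(z) = p(z \mid x_i, \theta) makes the ratio constant in z:
\sum_z p(z \mid x_i, \theta) \log \frac{p(x_i, z \mid \theta)}{p(z \mid x_i, \theta)}
  = \sum_z p(z \mid x_i, \theta) \log p(x_i \mid \theta)
  = \log p(x_i \mid \theta),
% so with this q the surrogate is a tight lower bound on the log likelihood.
```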

25 Closing notes: convergence, seed choice, quality of clusters, how many clusters.

26 Convergence. Why should the K-means algorithm ever reach a fixed point, i.e., a state in which clusters don't change? K-means is a special case of a general procedure, the Expectation-Maximization (EM) algorithm. Both are known to converge. The number of iterations could be large.

27 Seed Choice. Results can vary based on random seed selection. Some seeds can result in convergence to sub-optimal clusterings. Select good seeds using a heuristic (e.g., the doc least similar to any existing mean); try out multiple starting points (very important!); initialize with the results of another method.
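A minimal sketch of the "least similar to any existing mean" heuristic mentioned above (farthest-first traversal; the function name and details are my own illustration):

```python
import numpy as np

def farthest_first_seeds(X, K, seed=0):
    rng = np.random.default_rng(seed)
    seeds = [X[rng.integers(len(X))]]  # first seed: a uniformly random sample
    for _ in range(K - 1):
        # Distance from every point to its nearest existing seed...
        d2 = np.min([((X - s) ** 2).sum(axis=1) for s in seeds], axis=0)
        # ...and the next seed is the point farthest from all of them.
        seeds.append(X[d2.argmax()])
    return np.array(seeds)
```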

28 What Is a Good Clustering? Internal criterion: a good clustering will produce high-quality clusters in which the intra-class (that is, intra-cluster) similarity is high and the inter-class similarity is low. The measured quality of a clustering depends on both the object representation and the similarity measure used. External criteria for clustering quality: quality is measured by the ability to discover some or all of the hidden patterns or latent classes in gold-standard data; this assesses a clustering with respect to ground truth.

29 How Many Clusters? Number of clusters K is given: partition docs into a predetermined number of clusters. Finding the right number of clusters is part of the problem: given objects, partition them into an appropriate number of subsets. E.g., for query results, the ideal value of K is not known up front, though the UI may impose limits. Tradeoff between having more clusters (better focus within each cluster) and having too many clusters. Nonparametric Bayesian inference.

30 Cross validation. We can also use cross validation to determine the correct number of classes. Recall that a GMM is a generative model. We can compute the likelihood of the held-out data to determine which model (number of clusters) is more accurate.
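A minimal sketch of this model-selection procedure using scikit-learn's GaussianMixture (assumed available here; `score` returns the mean held-out log-likelihood per sample):

```python
from sklearn.mixture import GaussianMixture
from sklearn.model_selection import train_test_split

def choose_num_clusters(X, k_values=range(1, 11)):
    X_train, X_val = train_test_split(X, test_size=0.3, random_state=0)
    scores = {}
    for k in k_values:
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X_train)
        scores[k] = gmm.score(X_val)  # mean log-likelihood of the held-out data
    return max(scores, key=scores.get), scores
```

Held-out likelihood typically rises and then flattens or falls as K grows past the true number of components, which is what makes it usable for selecting K.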

31 Cross validation

32 Gaussian mixture clustering

33 Clustering methods: Comparison. Hierarchical: running time naively O(N^3); requires a similarity/distance measure; no input parameters; clusters are subjective (only a tree is returned). K-means: fastest (each iteration is linear); strong assumptions; input parameter K (number of clusters); exactly K clusters. GMM: fast (each iteration is linear); strongest assumptions; input parameter K (number of clusters); exactly K clusters.

34 What you should know about Mixture Models. Gaussian mixture models: a probabilistic extension of K-means for soft-clustering. EM algorithm for learning by assuming data are only partially observed; cluster labels are treated as the unobserved part of the data. EM algorithm for learning from partly unobserved data: MLE of $\theta$: $\hat{\theta} = \arg\max_\theta \log p(X, Z \mid \theta)$; EM estimate: $\hat{\theta} = \arg\max_\theta E_{Z \mid X, \theta}[\log p(X, Z \mid \theta)]$, where $X$ is the observed part of the data and $Z$ is unobserved.
