Three classification models Discriminant Model: learn the decision boundary directly and apply it to determine the class of each data point

Size: px
Start display at page:

Download "Three classification models Discriminant Model: learn the decision boundary directly and apply it to determine the class of each data point"

Transcription

1 Review of Last Wee Three classificatio models Discrimiat Model: lear the decisio boudary directly ad aly it to determie the class of each data oit Discrimiative Model: lear PY directly Geerative Model: Lear PY through PY ad PY. The oit robability PY ad margial robability P ca also be leared. SML vs. o statistical ML Obective fuctio for SML: MLE or MAP Obective fuctio for ML: MSE maimiig the margi or others. Usuervised Learig Clusterig grou data together EM learig give icomletedata data based o MLE

2 Revisit Bayesia: Prior vs. Smoothig Techique

3 Estimatig the Lielihood Assumig we radomly observe coi tosses to fid head occurs H times what is the frequecy of head for this coi? Assumig the robability is the accordig to MLE we wat to otimie log H H After erformig derivatives o we ca lear that the MLE solutio of is H/ However the H/ model suffers a maor drawbac that usee evets will receive ero robability If we toss a coi 6 times ad fid ero heads H/ model tells us the robability of head is 0 However usee obects should receive a tiy robability rather tha ero give the fact that we ow they do eist. Smoothig: a techique to assig o ero robability to usee obects Add oe esmoothig: assumig eeyt everythig goccurs at least oce. Uder this assumtio the frequecy of W becomes H+/+2 because both head ad tail occurs oce. This is a commoly alied techiques for gram LagaugeModel g Learig

4 Add oe Smoothig vs. Bayesia Prior Last wee after class I received a from a studet i this class Shao Chua Wag sayig that right after the class he roved that t add oe smoothig ca be iterreted as a MAP solutio for coi toss. Recall that the MLE solutio aims at otimiig the lielihood robability H H ad that maosterior robability malielihood robability* Prior robability Assumig the rior robability is set to be roortioal to to rohibit assigig a very large or very small value to the the osterior robability becomes H+ H+ otimiig the osterior w.r.t will obtai H+/+2. It ca also be roved that add lambda smoothig ca be regarded as a MAP give slightly differet rior. Shao Chua s fidig i fact tells us that this two basic smoothig techiques is simly a secial case for Bayesia learig.

5 Usuervised Learig II

6 Clusterig Prof. Shou de Li CSIE/GIM TU 2009/2/9 6

7 What is clusterig? Clusterig is the artitioig ii i of a data set ito subsets clusters so that the data i each subset ideally share some commo trait. Differece betwee clusterig ad classificatio Clusterig: divide iut ito artitios without label. It s usuervised. Classificatio: classify iuts ito Y labeled classes suervised We will itroduce two famous clusterig algorithms K meas clusterig EM clusterig for Gaussia Miture Model 2009/2/9 7

8 Stes of K meas clusterig. Radomly select oits as cluster ceter. 2. Assig the rest of the oits to the cluster of its closest ceter 3. Re calculatig the mea oit of each cluster. 4. Costructig a ew artitio by associatig each oit with the cluster whose cetroid is the closest. 5. Go bac to /2/9 8

9 K Meas Eamle K2 It is A Greedy based Traiig!! Pic seeds Reassig clusters Comute cetroids Reasssig clusters Comute cetroids Reassig clusters Coverged! 2009/2/9 9

10 Meas Clusterig sometimes failed Failure Cases: Viterbi or greedy Traiig got stuc to subotimal easily If a ode is equally close to several clusters it ca cause roblems sice we ca oly assig it to oe class. Ca we do better? Yes usig EM clusterig 2009/2/9 0

11 EM clusterig for Gaussia Mitures Problem Suose you measure a sigle cotiuous variable ibl i a large samle of observatios. Suose the samle cosists of severalclusters of Gaussia observatios with differet meas ad variaces. Our ob is to determie the value of the 3 arameters: The mea ad variace for cluster The mea ad variace for cluster 2 The mea ad variace for cluster The samlig robability for cluster 2009/2/9

12 Multivariate Gaussia Distributio Sigle value Gaussia Distributio: 2 σ e 2 /2 2σ 2σ 2 Multivariate Gaussia Distributio: e D/2 / T 2009/2/9 2

13 Gaussia Miture Distributio GMD A liear combiatio of several Gaussia Distributios It ca model almost ay cotiuous desity give sufficiet umber of Gaussias. K Ν K rior lielihood 2009/2/9 3

14 MLE for GMD Suose we have a dataset D of observatios { 2 } ad we wish to model this data usig a miture of Gaussias. The thelog lielihood fuctio is give by l l l K Ν K Ν 2009/2/9 4

15 MLE solutio for MLE solutio for K l l K Ν l K Ν Ν Ν P c Solve the above equatio: ad Ca be iterreted as the effective umber of oits assiged to cluster The weighted mea of all the oits i the dataset i which the weight for data oit is give by the osterior robability that belogs to a cluster c 2009/2/9 5

16 MLE solutio for If we fid the ero artial derivatives of we will lear the MLE solutio for is T 2009/2/9 6

17 MLE solutio for MLE solutio for Sice there is a costrait that the sum of is we aly Lagragemultilier to maimie aly Lagrage multilier to maimie l + K λ The derivatives of the above w.r.t. equal 0 gives K K + Ν Ν + Ν Ν λ λ λ 0 0 K + + Ν λ 0 0 Ν 2009/2/9 7

18 How to Produce the MLE solutios? How to Produce the MLE solutios? MLE solutio for ad caot be easily obtaied i.e. o close form solutio sice cotais ad o close form solutio sice cotais ad Sice ca be geerated by ad ; ad ad ca be geerated by. We ca treat as a hidde variable ad aly EM algorithm to iteratively lear the arameters. T Ν K Ν 2009/2/9 8

19 EM for Gaussia Mitures Clusterig Goal: Maimie i the Lielihood fuctio w.r.t. the arameters ad Stes Iitialie the arameters ad ad evaluate the iitial value of the log lielihood l. E ste: geerate the osterior robabilities usig curret arameters Ν Ν M ste. Re estimate the arameters usig the curret osterior robabilities see revious age. Checthe log lielihood give curret arameters to see if they coverge. If ot retur to E ste. 2009/2/9 9 K

20 Grahic Eamle of EM Clusterig 2009/2/9 20

21 EM Framewor ad EM Theory

22 Goal The goal of EM algorithm is to fid maimum lielihood solutios for models that cotai latet or missig variables. We rereset the iut data as ad the latet variables as. The arameters of the model is rereseted as. {} is called the comlete data ad is called the icomlete data. Assumig the osterior distributio P ca be geerated tdgive ad is ow. The log lielihood fuctio to maimie is l. Sice is uow we ca rereset l as l Problem: the summatio over the latet variables aears iside the logarithm. Therefore eve P belogs to Gaussia P will still be hard to comute. 22

23 The sirit of EM. Create a iitial iti model dl 0. The iitialiatio ca be arbitrarily radomly or with a small set of traiig eamles. 2. Use the eistig model to obtai aother model ew such that l ew > l 3. Reeat the above ste util reachig a local maimum. 4. Challege: How ca we guarateed to fid a better model after each iteratio give the hidde variable eists? ew As : arg ma l 23

24 EM Theorem If we ca fid a ew that guaratee ew l > l the the same ew will also satisfy the coditio l ew > l If EM theorem is true the we ca try to fid ew arg ma l The such ew will lead to better P How ca we rove EM theorem? If we ca rove the equatio below the we are doe ew l l ew l l 24

25 l Proof of EM Theorem /2 ew l ew l l Sice P P* P l Pl P l P l P ew l P {l P ew l P ew } {l P l P } Aly o both eds: [l ew l ] l ew l ew {l l } {l ew l If we ca rove <0 the we are doe. } 25

26 Proof of EM Theorem 2/2 ew {l } < The roof used Jese s Iequality P t y i P t yi log t P t yi More geerally if ad q are robability distributios log q

27 Otimiatio Process of EM ew arg ma l Iitial ste: radomly choose 0 E ste: usig a eistig arameter to estimate P M ste: fid ew that maimie Q l Chec the covergece of Q or if ot satisfied the set ew ad go bac to E ste. 27

28 EM: Why P i the E ste? EM: Why P i the E ste? Let s go bac to lplp lp l l l l l l l l l q q q q q l l l q q q q Ideedet of q Larger tha 0 equal hs Whe settig ql ca Larger tha 0 equal hs whe q cause The we fid ew that maimie l l q q The we fid that maimie this ew will also mae >0 l l 28

29 Global otimal l l 0l

30 Geeralied EM GEM l ew l ew l l Sice the above is always true. I the M ste we do t really eed to fid a ew that otimies l If we ca guaratee that ew always does a better ob tha i l the we are guaratee to reach a local otimal 30

31 Ideal vs. Available Data Aligmet MT: Problem for Machie Traslatio PE E oisy Chael PFE F Ideal: e e 2 e 3.. solvable by SL f f 2 f 3. Available: e e 2 e 3.. eed EM f f 2 f /2/9 3

32 E: Eglish Frech Aligmet Data: the house la maiso house maiso Aligmets are missig!! Theory: Eglish words are traslated first the ermuted. Parameters: Plathe maisothe lahouse maisohouse 2009/2/9 32

33 Model to lear: Plathe? E: EMTraiig o MT Pmaisothe? Possible assigmets: Plahouse? Pmaisohouse? a iitialie uiformly: Clathe0*/8+*/8/8 Plathe/2 E ste a /8 M ste Cmaisothe */8+0*/8/8 Pmaisothe/2 b /8 MLE Clahouse*/8+0*/8/8 Plahouse/2 Cmaisohouse*/8+2*/83/8 Pmaisohouse/2 lathe3/4 Pa7/256 maisothe/4 Pb47/256 lahouse/8 ormalie ormalie maisohouse7/8 lathe/2 Clathe9/32 E ste M ste Pa3/32 maisothe/2 Cmaisothe3/32 Pb9/32 lahouse/4 Clahouse3/ /2/9 maisohouse3/4 33 Cmaisohouse2/32 b

34 Data ad Model Comlete data Icomlete model arameters uow icomlete data comlete model Suervised learig argma Pmdata m Data geeratio argmapdm d comlete model comlete data icomlete data icomlete model EM comlete data & model argmapicomlete datam m 2009/2/9 34

35 Recommed Readig Patter Recogitio ad Machie Learig Bisho Chater 2 Chater 9 "Bayesia Iferece with Tears Kevi Kight Se 2009

Expectation-Maximization Algorithm.

Expectation-Maximization Algorithm. Expectatio-Maximizatio Algorithm. Petr Pošík Czech Techical Uiversity i Prague Faculty of Electrical Egieerig Dept. of Cyberetics MLE 2 Likelihood.........................................................................................................

More information

Machine Learning. Logistic Regression -- generative verses discriminative classifier. Le Song /15-781, Spring 2008

Machine Learning. Logistic Regression -- generative verses discriminative classifier. Le Song /15-781, Spring 2008 Machie Learig 070/578 Srig 008 Logistic Regressio geerative verses discrimiative classifier Le Sog Lecture 5 Setember 4 0 Based o slides from Eric Xig CMU Readig: Cha. 3..34 CB Geerative vs. Discrimiative

More information

Outline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019

Outline. CSCI-567: Machine Learning (Spring 2019) Outline. Prof. Victor Adamchik. Mar. 26, 2019 Outlie CSCI-567: Machie Learig Sprig 209 Gaussia mixture models Prof. Victor Adamchik 2 Desity estimatio U of Souther Califoria Mar. 26, 209 3 Naive Bayes Revisited March 26, 209 / 57 March 26, 209 2 /

More information

Hidden Variables, the EM Algorithm, and Mixtures of Gaussians

Hidden Variables, the EM Algorithm, and Mixtures of Gaussians 02/22/11 Hidde Variables, the EM Algorithm, ad Mitures of Gaussias Comuter Visio CS 143, Brow James Hays May slides from Derek Hoiem Today s Class Eamles of Missig Data Problems Detectig outliers Backgroud

More information

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar.

Clustering. CM226: Machine Learning for Bioinformatics. Fall Sriram Sankararaman Acknowledgments: Fei Sha, Ameet Talwalkar. Clusterig CM226: Machie Learig for Bioiformatics. Fall 216 Sriram Sakararama Ackowledgmets: Fei Sha, Ameet Talwalkar Clusterig 1 / 42 Admiistratio HW 1 due o Moday. Email/post o CCLE if you have questios.

More information

Basics of Inference. Lecture 21: Bayesian Inference. Review - Example - Defective Parts, cont. Review - Example - Defective Parts

Basics of Inference. Lecture 21: Bayesian Inference. Review - Example - Defective Parts, cont. Review - Example - Defective Parts Basics of Iferece Lecture 21: Sta230 / Mth230 Coli Rudel Aril 16, 2014 U util this oit i the class you have almost exclusively bee reseted with roblems where we are usig a robability model where the model

More information

Hypothesis Testing. H 0 : θ 1 1. H a : θ 1 1 (but > 0... required in distribution) Simple Hypothesis - only checks 1 value

Hypothesis Testing. H 0 : θ 1 1. H a : θ 1 1 (but > 0... required in distribution) Simple Hypothesis - only checks 1 value Hyothesis estig ME's are oit estimates of arameters/coefficiets really have a distributio Basic Cocet - develo regio i which we accet the hyothesis ad oe where we reject it H - reresets all ossible values

More information

Tomoki Toda. Augmented Human Communication Laboratory Graduate School of Information Science

Tomoki Toda. Augmented Human Communication Laboratory Graduate School of Information Science Seuetial Data Modelig d class Basics of seuetial data odelig ooki oda Augeted Hua Couicatio Laboratory Graduate School of Iforatio Sciece Basic Aroaches How to efficietly odel joit robability of high diesioal

More information

Mixtures of Gaussians and the EM Algorithm

Mixtures of Gaussians and the EM Algorithm Mixtures of Gaussias ad the EM Algorithm CSE 6363 Machie Learig Vassilis Athitsos Computer Sciece ad Egieerig Departmet Uiversity of Texas at Arligto 1 Gaussias A popular way to estimate probability desity

More information

Elementary manipulations of probabilities

Elementary manipulations of probabilities Elemetary maipulatios of probabilities Set probability of multi-valued r.v. {=Odd} = +3+5 = /6+/6+/6 = ½ X X,, X i j X i j Multi-variat distributio: Joit probability: X true true X X,, X X i j i j X X

More information

Axis Aligned Ellipsoid

Axis Aligned Ellipsoid Machie Learig for Data Sciece CS 4786) Lecture 6,7 & 8: Ellipsoidal Clusterig, Gaussia Mixture Models ad Geeral Mixture Models The text i black outlies high level ideas. The text i blue provides simple

More information

Pattern Classification

Pattern Classification Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher

More information

To make comparisons for two populations, consider whether the samples are independent or dependent.

To make comparisons for two populations, consider whether the samples are independent or dependent. Sociology 54 Testig for differeces betwee two samle meas Cocetually, comarig meas from two differet samles is the same as what we ve doe i oe-samle tests, ecet that ow the hyotheses focus o the arameters

More information

Statistical Pattern Recognition

Statistical Pattern Recognition Statistical Patter Recogitio Classificatio: No-Parametric Modelig Hamid R. Rabiee Jafar Muhammadi Sprig 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Ageda Parametric Modelig No-Parametric Modelig

More information

Clustering: Mixture Models

Clustering: Mixture Models Clusterig: Mixture Models Machie Learig 10-601B Seyoug Kim May of these slides are derived from Tom Mitchell, Ziv- Bar Joseph, ad Eric Xig. Thaks! Problem with K- meas Hard Assigmet of Samples ito Three

More information

The Expectation-Maximization (EM) Algorithm

The Expectation-Maximization (EM) Algorithm The Expectatio-Maximizatio (EM) Algorithm Readig Assigmets T. Mitchell, Machie Learig, McGraw-Hill, 997 (sectio 6.2, hard copy). S. Gog et al. Dyamic Visio: From Images to Face Recogitio, Imperial College

More information

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015 ECE 8527: Itroductio to Machie Learig ad Patter Recogitio Midterm # 1 Vaishali Ami Fall, 2015 tue39624@temple.edu Problem No. 1: Cosider a two-class discrete distributio problem: ω 1 :{[0,0], [2,0], [2,2],

More information

Expectation maximization

Expectation maximization Motivatio Expectatio maximizatio Subhrasu Maji CMSCI 689: Machie Learig 14 April 015 Suppose you are builig a aive Bayes spam classifier. After your are oe your boss tells you that there is o moey to label

More information

Recursive Updating Fixed Parameter

Recursive Updating Fixed Parameter Recursive Udatig Fixed Parameter So far we ve cosidered the sceario where we collect a buch of data ad the use that (ad the rior PDF) to comute the coditioal PDF from which we ca the get the MAP, or ay

More information

15-780: Graduate Artificial Intelligence. Density estimation

15-780: Graduate Artificial Intelligence. Density estimation 5-780: Graduate Artificial Itelligece Desity estimatio Coditioal Probability Tables (CPT) But where do we get them? P(B)=.05 B P(E)=. E P(A B,E) )=.95 P(A B, E) =.85 P(A B,E) )=.5 P(A B, E) =.05 A P(J

More information

Pattern Classification

Pattern Classification Patter Classificatio All materials i these slides were tae from Patter Classificatio (d ed) by R. O. Duda, P. E. Hart ad D. G. Stor, Joh Wiley & Sos, 000 with the permissio of the authors ad the publisher

More information

Lecture 7: Linear Classification Methods

Lecture 7: Linear Classification Methods Homeork Homeork Lecture 7: Liear lassificatio Methods Fial rojects? Grous Toics Proosal eek 5 Lecture is oster sessio, Jacobs Hall Lobb, sacks Fial reort 5 Jue. What is liear classificatio? lassificatio

More information

Vector Quantization: a Limiting Case of EM

Vector Quantization: a Limiting Case of EM . Itroductio & defiitios Assume that you are give a data set X = { x j }, j { 2,,, }, of d -dimesioal vectors. The vector quatizatio (VQ) problem requires that we fid a set of prototype vectors Z = { z

More information

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5 CS434a/54a: Patter Recogitio Prof. Olga Veksler Lecture 5 Today Itroductio to parameter estimatio Two methods for parameter estimatio Maimum Likelihood Estimatio Bayesia Estimatio Itroducto Bayesia Decisio

More information

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance Hypothesis Testig Empirically evaluatig accuracy of hypotheses: importat activity i ML. Three questios: Give observed accuracy over a sample set, how well does this estimate apply over additioal samples?

More information

Math 113 Exam 3 Practice

Math 113 Exam 3 Practice Math Exam Practice Exam will cover.-.9. This sheet has three sectios. The first sectio will remid you about techiques ad formulas that you should kow. The secod gives a umber of practice questios for you

More information

Confidence Intervals

Confidence Intervals Cofidece Itervals Berli Che Deartmet of Comuter Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Referece: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chater 5 & Teachig Material Itroductio

More information

Chapter 8: Estimating with Confidence

Chapter 8: Estimating with Confidence Chapter 8: Estimatig with Cofidece Sectio 8.2 The Practice of Statistics, 4 th editio For AP* STARNES, YATES, MOORE Chapter 8 Estimatig with Cofidece 8.1 Cofidece Itervals: The Basics 8.2 8.3 Estimatig

More information

ECE534, Spring 2018: Final Exam

ECE534, Spring 2018: Final Exam ECE534, Srig 2018: Fial Exam Problem 1 Let X N (0, 1) ad Y N (0, 1) be ideedet radom variables. variables V = X + Y ad W = X 2Y. Defie the radom (a) Are V, W joitly Gaussia? Justify your aswer. (b) Comute

More information

Grouping 2: Spectral and Agglomerative Clustering. CS 510 Lecture #16 April 2 nd, 2014

Grouping 2: Spectral and Agglomerative Clustering. CS 510 Lecture #16 April 2 nd, 2014 Groupig 2: Spectral ad Agglomerative Clusterig CS 510 Lecture #16 April 2 d, 2014 Groupig (review) Goal: Detect local image features (SIFT) Describe image patches aroud features SIFT, SURF, HoG, LBP, Group

More information

10-701/ Machine Learning Mid-term Exam Solution

10-701/ Machine Learning Mid-term Exam Solution 0-70/5-78 Machie Learig Mid-term Exam Solutio Your Name: Your Adrew ID: True or False (Give oe setece explaatio) (20%). (F) For a cotiuous radom variable x ad its probability distributio fuctio p(x), it

More information

8 : Learning Partially Observed GM: the EM algorithm

8 : Learning Partially Observed GM: the EM algorithm 10-708: Probabilistic Graphical Models, Sprig 2015 8 : Learig Partially Observed GM: the EM algorithm Lecturer: Eric P. Xig Scribes: Auric Qiao, Hao Zhag, Big Liu 1 Itroductio Two fudametal questios i

More information

Probabilistic Unsupervised Learning

Probabilistic Unsupervised Learning HT2015: SC4 Statistical Data Miig ad Machie Learig Dio Sejdiovic Departmet of Statistics Oxford http://www.stats.ox.ac.u/~sejdiov/sdmml.html Probabilistic Methods Algorithmic approach: Data Probabilistic

More information

Putnam Training Exercise Counting, Probability, Pigeonhole Principle (Answers)

Putnam Training Exercise Counting, Probability, Pigeonhole Principle (Answers) Putam Traiig Exercise Coutig, Probability, Pigeohole Pricile (Aswers) November 24th, 2015 1. Fid the umber of iteger o-egative solutios to the followig Diohatie equatio: x 1 + x 2 + x 3 + x 4 + x 5 = 17.

More information

tests 17.1 Simple versus compound

tests 17.1 Simple versus compound PAS204: Lecture 17. tests UMP ad asymtotic I this lecture, we will idetify UMP tests, wherever they exist, for comarig a simle ull hyothesis with a comoud alterative. We also look at costructig tests based

More information

Algorithms for Clustering

Algorithms for Clustering CR2: Statistical Learig & Applicatios Algorithms for Clusterig Lecturer: J. Salmo Scribe: A. Alcolei Settig: give a data set X R p where is the umber of observatio ad p is the umber of features, we wat

More information

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification INF 4300 90 Itroductio to classifictio Ae Solberg ae@ifiuioo Based o Chapter -6 i Duda ad Hart: atter Classificatio 90 INF 4300 Madator proect Mai task: classificatio You must implemet a classificatio

More information

Integer Programming (IP)

Integer Programming (IP) Iteger Programmig (IP) The geeral liear mathematical programmig problem where Mied IP Problem - MIP ma c T + h Z T y A + G y + y b R p + vector of positive iteger variables y vector of positive real variables

More information

Chapter 18: Sampling Distribution Models

Chapter 18: Sampling Distribution Models Chater 18: Samlig Distributio Models This is the last bit of theory before we get back to real-world methods. Samlig Distributios: The Big Idea Take a samle ad summarize it with a statistic. Now take aother

More information

THE SOLUTION OF NONLINEAR EQUATIONS f( x ) = 0.

THE SOLUTION OF NONLINEAR EQUATIONS f( x ) = 0. THE SOLUTION OF NONLINEAR EQUATIONS f( ) = 0. Noliear Equatio Solvers Bracketig. Graphical. Aalytical Ope Methods Bisectio False Positio (Regula-Falsi) Fied poit iteratio Newto Raphso Secat The root of

More information

EE 6885 Statistical Pattern Recognition

EE 6885 Statistical Pattern Recognition EE 6885 Statistical Patter Recogitio Fall 5 Prof. Shih-Fu Chag http://www.ee.columbia.edu/~sfchag Lecture 6 (9/8/5 EE6887-Chag 6- Readig EM for Missig Features Textboo, DHS 3.9 Bayesia Parameter Estimatio

More information

PUTNAM TRAINING PROBABILITY

PUTNAM TRAINING PROBABILITY PUTNAM TRAINING PROBABILITY (Last udated: December, 207) Remark. This is a list of exercises o robability. Miguel A. Lerma Exercises. Prove that the umber of subsets of {, 2,..., } with odd cardiality

More information

( ) = is larger than. the variance of X V

( ) = is larger than. the variance of X V Stat 400, sectio 6. Methods of Poit Estimatio otes by Tim Pilachoski A oit estimate of a arameter is a sigle umber that ca be regarded as a sesible value for The selected statistic is called the oit estimator

More information

Lecture 15: Learning Theory: Concentration Inequalities

Lecture 15: Learning Theory: Concentration Inequalities STAT 425: Itroductio to Noparametric Statistics Witer 208 Lecture 5: Learig Theory: Cocetratio Iequalities Istructor: Ye-Chi Che 5. Itroductio Recall that i the lecture o classificatio, we have see that

More information

Uncertainty. Variables. assigns to each sentence numerical degree of belief between 0 and 1. uncertainty

Uncertainty. Variables. assigns to each sentence numerical degree of belief between 0 and 1. uncertainty Bayes Classificatio Ucertaity & robability Baye's rule Choosig Hypotheses- Maximum a posteriori Maximum Likelihood - Baye's cocept learig Maximum Likelihood of real valued fuctio Bayes optimal Classifier

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

CSE 527, Additional notes on MLE & EM

CSE 527, Additional notes on MLE & EM CSE 57 Lecture Notes: MLE & EM CSE 57, Additioal otes o MLE & EM Based o earlier otes by C. Grat & M. Narasimha Itroductio Last lecture we bega a examiatio of model based clusterig. This lecture will be

More information

Statistics Definition: The science of assembling, classifying, tabulating, and analyzing data or facts:

Statistics Definition: The science of assembling, classifying, tabulating, and analyzing data or facts: 8. Statistics Statistics Defiitio: The sciece of assemblig, classifyig, tabulatig, ad aalyzig data or facts: Descritive statistics The collectig, grouig ad resetig data i a way that ca be easily uderstood

More information

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1

EECS564 Estimation, Filtering, and Detection Hwk 2 Solns. Winter p θ (z) = (2θz + 1 θ), 0 z 1 EECS564 Estimatio, Filterig, ad Detectio Hwk 2 Sols. Witer 25 4. Let Z be a sigle observatio havig desity fuctio where. p (z) = (2z + ), z (a) Assumig that is a oradom parameter, fid ad plot the maximum

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

Empirical Process Theory and Oracle Inequalities

Empirical Process Theory and Oracle Inequalities Stat 928: Statistical Learig Theory Lecture: 10 Empirical Process Theory ad Oracle Iequalities Istructor: Sham Kakade 1 Risk vs Risk See Lecture 0 for a discussio o termiology. 2 The Uio Boud / Boferoi

More information

Exponential Families and Bayesian Inference

Exponential Families and Bayesian Inference Computer Visio Expoetial Families ad Bayesia Iferece Lecture Expoetial Families A expoetial family of distributios is a d-parameter family f(x; havig the followig form: f(x; = h(xe g(t T (x B(, (. where

More information

Hidden Variables, the EM Algorithm, and Mixtures of Gaussians

Hidden Variables, the EM Algorithm, and Mixtures of Gaussians Hidde Variables the EM Algorith ad Mitures of Gaussias Couter Visio Jia-Bi Huag Virgiia Tech May slides fro D. Hoie Adiistrative stuffs Fial roject roosal due Oct 7 (Thursday) Tis for fial roject Set u

More information

Intro to Learning Theory

Intro to Learning Theory Lecture 1, October 18, 2016 Itro to Learig Theory Ruth Urer 1 Machie Learig ad Learig Theory Comig soo 2 Formal Framework 21 Basic otios I our formal model for machie learig, the istaces to be classified

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

6.867 Machine learning

6.867 Machine learning 6.867 Machie learig Mid-term exam October, ( poits) Your ame ad MIT ID: Problem We are iterested here i a particular -dimesioal liear regressio problem. The dataset correspodig to this problem has examples

More information

3.1. Introduction Assumptions.

3.1. Introduction Assumptions. Sectio 3. Proofs 3.1. Itroductio. A roof is a carefully reasoed argumet which establishes that a give statemet is true. Logic is a tool for the aalysis of roofs. Each statemet withi a roof is a assumtio,

More information

Optimally Sparse SVMs

Optimally Sparse SVMs A. Proof of Lemma 3. We here prove a lower boud o the umber of support vectors to achieve geeralizatio bouds of the form which we cosider. Importatly, this result holds ot oly for liear classifiers, but

More information

day month year documentname/initials 1

day month year documentname/initials 1 ECE47-57 Patter Recogitio Lecture 0 Noaraetric Desity Estiatio -earest-eighbor (NN) Hairog Qi, Gozalez Faily Professor Electrical Egieerig ad Couter Sciece Uiversity of Teessee, Koxville htt://www.eecs.ut.edu/faculty/qi

More information

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32

Boosting. Professor Ameet Talwalkar. Professor Ameet Talwalkar CS260 Machine Learning Algorithms March 1, / 32 Boostig Professor Ameet Talwalkar Professor Ameet Talwalkar CS260 Machie Learig Algorithms March 1, 2017 1 / 32 Outlie 1 Admiistratio 2 Review of last lecture 3 Boostig Professor Ameet Talwalkar CS260

More information

Recurrence Relations

Recurrence Relations Recurrece Relatios Aalysis of recursive algorithms, such as: it factorial (it ) { if (==0) retur ; else retur ( * factorial(-)); } Let t be the umber of multiplicatios eeded to calculate factorial(). The

More information

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian

Chapter 12 EM algorithms The Expectation-Maximization (EM) algorithm is a maximum likelihood method for models that have hidden variables eg. Gaussian Chapter 2 EM algorithms The Expectatio-Maximizatio (EM) algorithm is a maximum likelihood method for models that have hidde variables eg. Gaussia Mixture Models (GMMs), Liear Dyamic Systems (LDSs) ad Hidde

More information

Inferential Statistics. Inference Process. Inferential Statistics and Probability a Holistic Approach. Inference Process.

Inferential Statistics. Inference Process. Inferential Statistics and Probability a Holistic Approach. Inference Process. Iferetial Statistics ad Probability a Holistic Approach Iferece Process Chapter 8 Poit Estimatio ad Cofidece Itervals This Course Material by Maurice Geraghty is licesed uder a Creative Commos Attributio-ShareAlike

More information

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression

Outline. Linear regression. Regularization functions. Polynomial curve fitting. Stochastic gradient descent for regression. MLE for regression REGRESSION 1 Outlie Liear regressio Regularizatio fuctios Polyomial curve fittig Stochastic gradiet descet for regressio MLE for regressio Step-wise forward regressio Regressio methods Statistical techiques

More information

New Definition of Density on Knapsack Cryptosystems

New Definition of Density on Knapsack Cryptosystems Africacryt008@Casablaca 008.06.1 New Defiitio of Desity o Kasac Crytosystems Noboru Kuihiro The Uiversity of Toyo, Jaa 1/31 Kasac Scheme rough idea Public Key: asac: a={a 1, a,, a } Ecrytio: message m=m

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter & Teachig Material.

More information

Solutions to Problem Sheet 1

Solutions to Problem Sheet 1 Solutios to Problem Sheet ) Use Theorem. to rove that loglog for all real 3. This is a versio of Theorem. with the iteger N relaced by the real. Hit Give 3 let N = [], the largest iteger. The, imortatly,

More information

Approximations and more PMFs and PDFs

Approximations and more PMFs and PDFs Approximatios ad more PMFs ad PDFs Saad Meimeh 1 Approximatio of biomial with Poisso Cosider the biomial distributio ( b(k,,p = p k (1 p k, k λ: k Assume that is large, ad p is small, but p λ at the limit.

More information

The Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model

The Bayesian Learning Framework. Back to Maximum Likelihood. Naïve Bayes. Simple Example: Coin Tosses. Given a generative model Back to Maximum Likelihood Give a geerative model f (x, y = k) =π k f k (x) Usig a geerative modellig approach, we assume a parametric form for f k (x) =f (x; k ) ad compute the MLE θ of θ =(π k, k ) k=

More information

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss ECE 90 Lecture : Complexity Regularizatio ad the Squared Loss R. Nowak 5/7/009 I the previous lectures we made use of the Cheroff/Hoeffdig bouds for our aalysis of classifier errors. Hoeffdig s iequality

More information

Hybridized Heredity In Support Vector Machine

Hybridized Heredity In Support Vector Machine Hybridized Heredity I Suort Vector Machie May 2015 Hybridized Heredity I Suort Vector Machie Timothy Idowu Yougmi Park Uiversity of Wiscosi-Madiso idowu@stat.wisc.edu yougmi@stat.wisc.edu May 2015 Abstract

More information

18. Two-sample problems for population means (σ unknown)

18. Two-sample problems for population means (σ unknown) 8. Two-samle roblems for oulatio meas (σ ukow) The Practice of Statistics i the Life Scieces Third Editio 04 W.H. Freema ad Comay Objectives (PSLS Chater 8) Comarig two meas (σ ukow) Two-samle situatios

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

NUMERICAL METHODS FOR SOLVING EQUATIONS

NUMERICAL METHODS FOR SOLVING EQUATIONS Mathematics Revisio Guides Numerical Methods for Solvig Equatios Page 1 of 11 M.K. HOME TUITION Mathematics Revisio Guides Level: GCSE Higher Tier NUMERICAL METHODS FOR SOLVING EQUATIONS Versio:. Date:

More information

Support vector machine revisited

Support vector machine revisited 6.867 Machie learig, lecture 8 (Jaakkola) 1 Lecture topics: Support vector machie ad kerels Kerel optimizatio, selectio Support vector machie revisited Our task here is to first tur the support vector

More information

COMPUTING FOURIER SERIES

COMPUTING FOURIER SERIES COMPUTING FOURIER SERIES Overview We have see i revious otes how we ca use the fact that si ad cos rereset comlete orthogoal fuctios over the iterval [-,] to allow us to determie the coefficiets of a Fourier

More information

Introductory statistics

Introductory statistics CM9S: Machie Learig for Bioiformatics Lecture - 03/3/06 Itroductory statistics Lecturer: Sriram Sakararama Scribe: Sriram Sakararama We will provide a overview of statistical iferece focussig o the key

More information

Latent Dirichlet Allocation (LDA)

Latent Dirichlet Allocation (LDA) Latet Dirichlet Allocatio LDA Brief Revie LDA Dirichlet Distributio The Model Theoretical isights Alicatios Parameter Estimatio Extesios to LDA Summary 1 We Model the text corora to : similarity/relevace

More information

Factor Analysis. Lecture 10: Factor Analysis and Principal Component Analysis. Sam Roweis

Factor Analysis. Lecture 10: Factor Analysis and Principal Component Analysis. Sam Roweis Lecture 10: Factor Aalysis ad Pricipal Compoet Aalysis Sam Roweis February 9, 2004 Whe we assume that the subspace is liear ad that the uderlyig latet variable has a Gaussia distributio we get a model

More information

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018)

Randomized Algorithms I, Spring 2018, Department of Computer Science, University of Helsinki Homework 1: Solutions (Discussed January 25, 2018) Radomized Algorithms I, Sprig 08, Departmet of Computer Sciece, Uiversity of Helsiki Homework : Solutios Discussed Jauary 5, 08). Exercise.: Cosider the followig balls-ad-bi game. We start with oe black

More information

Introduction to Artificial Intelligence CAP 4601 Summer 2013 Midterm Exam

Introduction to Artificial Intelligence CAP 4601 Summer 2013 Midterm Exam Itroductio to Artificial Itelligece CAP 601 Summer 013 Midterm Exam 1. Termiology (7 Poits). Give the followig task eviromets, eter their properties/characteristics. The properties/characteristics of the

More information

Quick Review of Probability

Quick Review of Probability Quick Review of Probability Berli Che Departmet of Computer Sciece & Iformatio Egieerig Natioal Taiwa Normal Uiversity Refereces: 1. W. Navidi. Statistics for Egieerig ad Scietists. Chapter 2 & Teachig

More information

CS284A: Representations and Algorithms in Molecular Biology

CS284A: Representations and Algorithms in Molecular Biology CS284A: Represetatios ad Algorithms i Molecular Biology Scribe Notes o Lectures 3 & 4: Motif Discovery via Eumeratio & Motif Represetatio Usig Positio Weight Matrix Joshua Gervi Based o presetatios by

More information

Introduction to Machine Learning DIS10

Introduction to Machine Learning DIS10 CS 189 Fall 017 Itroductio to Machie Learig DIS10 1 Fu with Lagrage Multipliers (a) Miimize the fuctio such that f (x,y) = x + y x + y = 3. Solutio: The Lagragia is: L(x,y,λ) = x + y + λ(x + y 3) Takig

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

Estimation for Complete Data

Estimation for Complete Data Estimatio for Complete Data complete data: there is o loss of iformatio durig study. complete idividual complete data= grouped data A complete idividual data is the oe i which the complete iformatio of

More information

The Hong Kong University of Science & Technology ISOM551 Introductory Statistics for Business Assignment 3 Suggested Solution

The Hong Kong University of Science & Technology ISOM551 Introductory Statistics for Business Assignment 3 Suggested Solution The Hog Kog Uiversity of ciece & Techology IOM55 Itroductory tatistics for Busiess Assigmet 3 uggested olutio Note All values of statistics i Q ad Q4 are obtaied by Excel. Qa. Let be the robability that

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3

Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 No-Parametric Techiques Jacob Hays Amit Pillay James DeFelice 4.1, 4.2, 4.3 Parametric vs. No-Parametric Parametric Based o Fuctios (e.g Normal Distributio) Uimodal Oly oe peak Ulikely real data cofies

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam.

This exam contains 19 pages (including this cover page) and 10 questions. A Formulae sheet is provided with the exam. Probability ad Statistics FS 07 Secod Sessio Exam 09.0.08 Time Limit: 80 Miutes Name: Studet ID: This exam cotais 9 pages (icludig this cover page) ad 0 questios. A Formulae sheet is provided with the

More information

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting Lecture 6 Chi Square Distributio (χ ) ad Least Squares Fittig Chi Square Distributio (χ ) Suppose: We have a set of measuremets {x 1, x, x }. We kow the true value of each x i (x t1, x t, x t ). We would

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Expectation and Variance of a random variable

Expectation and Variance of a random variable Chapter 11 Expectatio ad Variace of a radom variable The aim of this lecture is to defie ad itroduce mathematical Expectatio ad variace of a fuctio of discrete & cotiuous radom variables ad the distributio

More information

CSIE/GINM, NTU 2009/11/30 1

CSIE/GINM, NTU 2009/11/30 1 Itroductio ti to Machie Learig (Part (at1: Statistical Machie Learig Shou de Li CSIE/GINM, NTU sdli@csie.tu.edu.tw 009/11/30 1 Syllabus of a Itro ML course ( Machie Learig, Adrew Ng, Staford, Autum 009

More information

Math 113 Exam 3 Practice

Math 113 Exam 3 Practice Math Exam Practice Exam 4 will cover.-., 0. ad 0.. Note that eve though. was tested i exam, questios from that sectios may also be o this exam. For practice problems o., refer to the last review. This

More information

Estimating Proportions

Estimating Proportions 3/1/018 Outlie for Today Remiders about Missig Values Iterretig Cofidece Itervals Cofidece About Proortios Proortios as Iterval Variables Cofidece Itervals Cofidece Coefficiets Examles Lab Exercise ( arts

More information

Distribution of Sample Proportions

Distribution of Sample Proportions Distributio of Samle Proortios Probability ad statistics Aswers & Teacher Notes TI-Nsire Ivestigatio Studet 90 mi 7 8 9 10 11 12 Itroductio From revious activity: This activity assumes kowledge of the

More information

Solution of Final Exam : / Machine Learning

Solution of Final Exam : / Machine Learning Solutio of Fial Exam : 10-701/15-781 Machie Learig Fall 2004 Dec. 12th 2004 Your Adrew ID i capital letters: Your full ame: There are 9 questios. Some of them are easy ad some are more difficult. So, if

More information