CS 2750 Machine Learning. Lecture 5. Density estimation. CS 2750 Machine Learning. Announcements
CS 2750 Machine Learning
Lecture 5: Density estimation
Milos Hauskrecht, milos@cs.pitt.edu, 5329 Sennott Square

Announcements
Homework: due on Wednesday before the class.
Reports: hand in before the class.
Programs: submit electronically.
Collaboration on homework: you may discuss the material with your fellow students, but the report and programs should be written individually.
Outline
Density estimation: maximum likelihood (ML), maximum a posteriori (MAP), Bayesian.
Distributions: Bernoulli distribution, binomial distribution, multinomial distribution, normal distribution.

Density estimation
Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = x_i$ is a vector of attribute values.
Attributes: modeled by random variables $X = \{X_1, X_2, \ldots, X_d\}$ with continuous values or discrete values.
E.g. blood pressure with numerical values, or chest pain with discrete values [no-pain, mild, moderate, strong].
Underlying true probability distribution: $p(X)$.
Density estimation
Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = x_i$ is a vector of attribute values.
Objective: try to estimate the underlying true probability distribution over variables $X$, $p(X)$, using examples in $D$.
The true distribution $p(X)$ generates $n$ samples $D = \{D_1, D_2, \ldots, D_n\}$, from which we form the estimate $\hat{p}(X)$.
Standard (iid) assumptions: samples are independent of each other and come from the same (identical) distribution (fixed $p(X)$).

Density estimation
Types of density estimation:
Parametric: the distribution is modeled using a set of parameters $\Theta$, $p(X \mid \Theta)$. Example: mean and covariances of a multivariate normal. Estimation: find parameters $\Theta$ describing data $D$.
Non-parametric: the model of the distribution utilizes all examples in $D$, as if all examples were parameters of the distribution. Example: nearest-neighbor.
Semi-parametric.
Learning via parameter estimation
In this lecture we consider parametric density estimation.
Basic settings:
A set of random variables $X = \{X_1, X_2, \ldots, X_d\}$.
A model of the distribution over variables in $X$ with parameters $\Theta$: $\hat{p}(X \mid \Theta)$.
Data $D = \{D_1, D_2, \ldots, D_n\}$.
Objective: find parameters $\hat{\Theta}$ such that $p(X \mid \hat{\Theta})$ describes the data best.

Parameter estimation
Maximum likelihood (ML): maximize $p(D \mid \Theta, \xi)$.
Yields: one set of parameters $\Theta_{ML}$.
The target distribution is approximated as: $\hat{p}(X) = p(X \mid \Theta_{ML})$.
Bayesian parameter estimation: uses the posterior distribution over possible parameters,
$p(\Theta \mid D, \xi) = \frac{p(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{p(D \mid \xi)}$
Yields: all possible settings of $\Theta$ (and their "weights").
The target distribution is approximated as:
$\hat{p}(X) = p(X \mid D) = \int_{\Theta} p(X \mid \Theta)\, p(\Theta \mid D, \xi)\, d\Theta$
Parameter estimation
Other possible criteria:
Maximum a posteriori probability (MAP): maximize $p(\Theta \mid D, \xi)$ (mode of the posterior).
Yields: one set of parameters $\Theta_{MAP}$.
Approximation: $\hat{p}(X) = p(X \mid \Theta_{MAP})$.
Expected value of the parameter: $\hat{\Theta} = E(\Theta)$ (mean of the posterior).
The expectation is taken with regard to the posterior $p(\Theta \mid D, \xi)$.
Yields: one set of parameters.
Approximation: $\hat{p}(X) = p(X \mid \hat{\Theta})$.

Parameter estimation. Coin example.
Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$, from data.
Parameter estimation. Example.
Assume an unknown and possibly biased coin.
Probability of the head is $\theta$.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
Heads: 15, Tails: 10
What would be your estimate of the probability of a head, $\tilde{\theta} = ?$
Solution: use frequencies of occurrences to do the estimate:
$\tilde{\theta} = \frac{15}{25} = 0.6$
This is the maximum likelihood estimate of the parameter $\theta$.
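The frequency estimate for the coin data above can be reproduced with a few lines of Python. This is only an illustrative sketch; the names `FLIPS` and `frequency_estimate` are not from the lecture.

```python
from fractions import Fraction

# The 25 coin flips listed on the slide: H = head, T = tail.
FLIPS = "H H T T H H T H T H T T T H T H H H H T H H H H T".split()


def frequency_estimate(flips):
    """Return the fraction of heads in a sequence of 'H'/'T' outcomes."""
    heads = sum(1 for outcome in flips if outcome == "H")
    return Fraction(heads, len(flips))


theta_tilde = frequency_estimate(FLIPS)
# Report the head count, the tail count and the estimate as a fraction and a float.
print(FLIPS.count("H"), FLIPS.count("T"), theta_tilde, float(theta_tilde))
```

Using `Fraction` keeps the estimate exact, which makes it easy to compare against the counts by hand.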
Probability of an outcome
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Assume: we know the probability $\theta$.
Probability of an outcome of a coin flip $x_i$:
$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{(1 - x_i)}$ (Bernoulli distribution)
This combines the probability of a head and a tail so that $x_i$ picks its correct probability:
it gives $\theta$ for $x_i = 1$ and $(1 - \theta)$ for $x_i = 0$.

Probability of a sequence of outcomes
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Assume: a sequence of independent coin flips $D$ = H H T H T H, encoded as $D = 110101$.
What is the probability of observing the data sequence $D$: $P(D \mid \theta) = ?$
Probability of a sequence of outcomes
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Assume: a sequence of coin flips $D$ = H H T H T H, encoded as $D = 110101$.
What is the probability of observing the data sequence $D$?
$P(D \mid \theta) = \theta\, \theta\, (1 - \theta)\, \theta\, (1 - \theta)\, \theta$
This quantity is the likelihood of the data.
Probability of a sequence of outcomes
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Assume: a sequence of coin flips $D$ = H H T H T H, encoded as $D = 110101$.
What is the probability of observing the data sequence $D$?
$P(D \mid \theta) = \theta\, \theta\, (1 - \theta)\, \theta\, (1 - \theta)\, \theta$
It can be rewritten using the Bernoulli distribution:
$P(D \mid \theta) = \prod_{i=1}^{6} \theta^{x_i} (1 - \theta)^{(1 - x_i)}$

The goodness of fit to the data
Learning: we do not know the value of the parameter $\theta$.
Our learning goal: find the parameter $\theta$ that fits the data $D$ the best.
One solution to the "best": maximize the likelihood
$P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)}$
Intuition: the more likely the data are given the model, the better the fit.
Note: instead of an error function that measures how badly the data fit the model, we have a measure that tells us how well the data fit:
$Error(D, \theta) = -P(D \mid \theta)$
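To make the likelihood concrete, the following sketch evaluates the Bernoulli product for the six-flip sequence H H T H T H at a few candidate parameter values. The function name `bernoulli_likelihood` and the chosen candidate values are illustrative assumptions, not part of the lecture.

```python
def bernoulli_likelihood(xs, theta):
    """Likelihood of independent binary outcomes xs (1 = head, 0 = tail).

    Computes the product over i of theta**x_i * (1 - theta)**(1 - x_i).
    """
    likelihood = 1.0
    for x in xs:
        likelihood *= theta if x == 1 else (1.0 - theta)
    return likelihood


# H H T H T H encoded as 1 1 0 1 0 1 (four heads, two tails).
D = [1, 1, 0, 1, 0, 1]

# Compare a few candidate values of theta; 4/6 is the observed head frequency.
for theta in (0.3, 0.5, 4 / 6, 0.9):
    print(f"theta={theta:.3f}  P(D|theta)={bernoulli_likelihood(D, theta):.5f}")
```

Among these candidates the likelihood is largest at the observed head frequency 4/6, which matches the intuition that a better fit means more likely data.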
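Example: Bernoulli distribution
Coin example: we have a coin that can be biased.
Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that head: $x_i = 1$, tail: $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$.
Probability of an outcome $x_i$:
$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{(1 - x_i)}$ (Bernoulli distribution)

Maximum likelihood (ML) estimate
Likelihood of data:
$P(D \mid \theta, \xi) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)}$
Maximum likelihood estimate:
$\theta_{ML} = \arg\max_{\theta} P(D \mid \theta, \xi)$
Optimize the log-likelihood (the same as maximizing the likelihood):
$l(D, \theta) = \log P(D \mid \theta, \xi) = \log \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)} = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log (1 - \theta) \right] = \log \theta \sum_{i=1}^{n} x_i + \log (1 - \theta) \sum_{i=1}^{n} (1 - x_i)$
where $N_1 = \sum_{i=1}^{n} x_i$ is the number of heads seen and $N_2 = \sum_{i=1}^{n} (1 - x_i)$ is the number of tails seen.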
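Maximum likelihood (ML) estimate
Optimize the log-likelihood:
$l(D, \theta) = N_1 \log \theta + N_2 \log (1 - \theta)$
Set the derivative to zero:
$\frac{\partial l(D, \theta)}{\partial \theta} = \frac{N_1}{\theta} - \frac{N_2}{1 - \theta} = 0$
Solving for $\theta$:
$\theta = \frac{N_1}{N_1 + N_2}$
ML solution:
$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}$

Maximum likelihood estimate. Example
Assume an unknown and possibly biased coin.
Probability of the head is $\theta$.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
Heads: 15, Tails: 10
What is the ML estimate of the probability of a head and a tail?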
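The closed-form solution $N_1 / (N_1 + N_2)$ can be sanity-checked numerically by maximizing the log-likelihood over a grid of candidate values. The sketch below does this for 15 heads and 10 tails; the helper names and the grid resolution are illustrative choices, not part of the lecture.

```python
import math


def log_likelihood(theta, n_heads, n_tails):
    """Bernoulli log-likelihood l(D, theta) = N1*log(theta) + N2*log(1 - theta)."""
    return n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)


def ml_estimate(n_heads, n_tails):
    """Closed-form maximizer obtained by setting the derivative to zero."""
    return n_heads / (n_heads + n_tails)


def grid_argmax(n_heads, n_tails, steps=1000):
    """Brute-force maximizer over an evenly spaced grid strictly inside (0, 1)."""
    grid = [k / steps for k in range(1, steps)]
    return max(grid, key=lambda t: log_likelihood(t, n_heads, n_tails))


# Both numbers should agree up to the grid resolution.
print(ml_estimate(15, 10), grid_argmax(15, 10))
```

The grid search is only a check; in practice the closed form is used directly.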
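Maximum likelihood estimate. Example
Assume an unknown and possibly biased coin.
Probability of the head is $\theta$.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
Heads: 15, Tails: 10
What is the ML estimate of the probability of a head and a tail?
Head: $\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2} = \frac{15}{25} = 0.6$
Tail: $(1 - \theta_{ML}) = \frac{N_2}{N} = \frac{N_2}{N_1 + N_2} = \frac{10}{25} = 0.4$

Maximum a posteriori estimate
The maximum a posteriori estimate selects the mode of the posterior distribution:
$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D, \xi)$
The posterior combines the likelihood of the data and the prior via Bayes rule:
$p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, p(\theta \mid \xi)}{P(D \mid \xi)}$
where $P(D \mid \xi)$ is the normalizing factor,
$P(D \mid \theta, \xi) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{(1 - x_i)} = \theta^{N_1} (1 - \theta)^{N_2}$
and $p(\theta \mid \xi)$ is the prior probability on $\theta$.
How to choose the prior probability?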
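Prior distribution
Choice of prior: Beta distribution
$p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)}\, \theta^{\alpha_1 - 1} (1 - \theta)^{\alpha_2 - 1}$
$\Gamma(x)$ is the Gamma function. For integer values of $x$: $\Gamma(x + 1) = x!$
Why use the Beta distribution? The Beta distribution "fits" Bernoulli trials: they are conjugate choices.
$P(D \mid \theta, \xi) = \theta^{N_1} (1 - \theta)^{N_2}$
The posterior distribution is again a Beta distribution:
$p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)}{P(D \mid \xi)} = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$

Beta distribution
[Figure: Beta distribution densities for several settings of $\alpha$ and $\beta$.]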
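Conjugacy can be checked numerically: the product of the Bernoulli likelihood and a Beta prior should be proportional to a Beta density whose parameters are the prior parameters plus the observed counts, so their ratio should not depend on the parameter value. The following sketch assumes a Beta(5, 5) prior with 15 heads and 10 tails; the helper names are hypothetical.

```python
import math


def beta_pdf(theta, a, b):
    """Beta density: Gamma(a+b) / (Gamma(a)*Gamma(b)) * theta**(a-1) * (1-theta)**(b-1)."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * theta ** (a - 1) * (1.0 - theta) ** (b - 1)


def posterior_params(alpha1, alpha2, n_heads, n_tails):
    """Conjugate update: add the observed counts to the prior parameters."""
    return alpha1 + n_heads, alpha2 + n_tails


ALPHA1, ALPHA2, N_HEADS, N_TAILS = 5, 5, 15, 10  # assumed prior and counts
POST1, POST2 = posterior_params(ALPHA1, ALPHA2, N_HEADS, N_TAILS)


def evidence_ratio(theta):
    """(likelihood * prior) / posterior; equals P(D) and so is constant in theta."""
    unnormalized = theta ** N_HEADS * (1.0 - theta) ** N_TAILS * beta_pdf(theta, ALPHA1, ALPHA2)
    return unnormalized / beta_pdf(theta, POST1, POST2)


print((POST1, POST2))
print(evidence_ratio(0.3), evidence_ratio(0.6), evidence_ratio(0.8))
```

The three printed ratios agree up to floating-point rounding, which is what the conjugate update predicts.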
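Maximum a posteriori probability
The maximum a posteriori estimate selects the mode of the posterior distribution:
$p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)}{P(D \mid \xi)} = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2) = \frac{\Gamma(\alpha_1 + \alpha_2 + N_1 + N_2)}{\Gamma(\alpha_1 + N_1)\, \Gamma(\alpha_2 + N_2)}\, \theta^{N_1 + \alpha_1 - 1} (1 - \theta)^{N_2 + \alpha_2 - 1}$
Notice that the parameters of the prior act like counts of heads and tails (sometimes they are also referred to as prior counts).
MAP solution:
$\theta_{MAP} = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N_1 + N_2 - 2}$

MAP estimate example
Assume an unknown and possibly biased coin.
Probability of the head is $\theta$.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
Heads: 15, Tails: 10
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 5)$.
What is the MAP estimate?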
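MAP estimate example
Assume an unknown and possibly biased coin.
Probability of the head is $\theta$.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
Heads: 15, Tails: 10
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 5)$.
What is the MAP estimate?
$\theta_{MAP} = \frac{N_1 + \alpha_1 - 1}{N + \alpha_1 + \alpha_2 - 2} = \frac{19}{33}$

MAP estimate example
Note that the prior and the data fit (data likelihood) are combined.
The MAP estimate can be biased by large prior counts; it is hard to overturn them with a smaller sample size.
Data: H H T T H H T H T H T T T H T H H H H T H H H H T
Heads: 15, Tails: 10
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 5)$: $\theta_{MAP} = \frac{19}{33}$
Assume $p(\theta \mid \xi) = \mathrm{Beta}(\theta \mid 5, 20)$: $\theta_{MAP} = \frac{19}{48}$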
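The MAP estimate for the coin data can be computed directly from the mode of the Beta posterior. The sketch below uses exact fractions; the function name `map_estimate` is illustrative, and the mode formula is only valid when both posterior parameters exceed 1.

```python
from fractions import Fraction


def map_estimate(n_heads, n_tails, alpha1, alpha2):
    """Mode of Beta(alpha1 + N1, alpha2 + N2).

    Assumes alpha1 + n_heads > 1 and alpha2 + n_tails > 1, so the mode is interior.
    """
    numerator = n_heads + alpha1 - 1
    denominator = n_heads + n_tails + alpha1 + alpha2 - 2
    return Fraction(numerator, denominator)


# 15 heads and 10 tails with a Beta(5, 5) prior, as in the example above.
print(map_estimate(15, 10, 5, 5))
# A uniform Beta(1, 1) prior adds no prior counts, so MAP coincides with ML.
print(map_estimate(15, 10, 1, 1))
```

With the Beta(5, 5) prior the estimate is pulled from the ML value 3/5 toward 1/2, which illustrates how prior counts act as extra pseudo-observations.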
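Bayesian framework
Both ML and MAP estimates pick one value of the parameter.
Assume there are two different parameter settings that are close in terms of their probability values. Using only one of them may introduce a strong bias if we use it, for example, for predictions.
Bayesian parameter estimate:
Remedies the limitation of one choice.
Uses all possible parameter values, where $p(\theta \mid D, \xi) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$.
The posterior can be used to define $\hat{p}(X)$:
$\hat{p}(X) = p(X \mid D) = \int_{\Theta} p(X \mid \Theta)\, p(\Theta \mid D, \xi)\, d\Theta$

Bayesian framework
Predictive probability of an outcome $x = 1$ in the next trial:
$P(x = 1 \mid D, \xi) = \int_0^1 P(x = 1 \mid \theta, \xi)\, p(\theta \mid D, \xi)\, d\theta = \int_0^1 \theta\, p(\theta \mid D, \xi)\, d\theta = E(\theta)$
Here $p(\theta \mid D, \xi)$ is the posterior density.
This is equivalent to the expected value of the parameter, where the expectation is taken with regard to the posterior distribution $p(\theta \mid D, \xi) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$.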
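Expected value of the parameter
How to obtain the expected value? For a Beta density with parameters $\alpha_1, \alpha_2$:
$E(\theta) = \int_0^1 \theta\, \mathrm{Beta}(\theta \mid \alpha_1, \alpha_2)\, d\theta = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)} \int_0^1 \theta^{\alpha_1} (1 - \theta)^{\alpha_2 - 1}\, d\theta$
$= \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\, \Gamma(\alpha_2)} \cdot \frac{\Gamma(\alpha_1 + 1)\, \Gamma(\alpha_2)}{\Gamma(\alpha_1 + \alpha_2 + 1)} \int_0^1 \mathrm{Beta}(\theta \mid \alpha_1 + 1, \alpha_2)\, d\theta = \frac{\alpha_1}{\alpha_1 + \alpha_2}$
Note: $\Gamma(\alpha + 1) = \alpha\, \Gamma(\alpha)$; for integer values of $\alpha$, $\Gamma(\alpha + 1) = \alpha!$

Expected value of the parameter
Substituting the results for the posterior $p(\theta \mid D, \xi) = \mathrm{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$, we get
$E(\theta) = \frac{\alpha_1 + N_1}{\alpha_1 + \alpha_2 + N_1 + N_2}$
Note that the mean of the posterior is yet another "reasonable" parameter choice:
$\hat{\theta} = E(\theta)$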
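The posterior mean, which is also the Bayesian predictive probability of a head on the next flip, can be computed in closed form and cross-checked by numerical integration. The sketch below assumes 15 heads, 10 tails and a Beta(5, 5) prior; the helper names and the midpoint-rule integrator are illustrative choices.

```python
import math
from fractions import Fraction


def posterior_mean(n_heads, n_tails, alpha1, alpha2):
    """Closed-form mean of the Beta(alpha1 + N1, alpha2 + N2) posterior."""
    return Fraction(alpha1 + n_heads, alpha1 + alpha2 + n_heads + n_tails)


def beta_pdf(theta, a, b):
    """Beta density with parameters a and b."""
    coef = math.gamma(a + b) / (math.gamma(a) * math.gamma(b))
    return coef * theta ** (a - 1) * (1.0 - theta) ** (b - 1)


def numeric_posterior_mean(n_heads, n_tails, alpha1, alpha2, steps=10000):
    """Midpoint-rule approximation of the integral of theta * posterior(theta) over [0, 1]."""
    a, b = alpha1 + n_heads, alpha2 + n_tails
    width = 1.0 / steps
    midpoints = ((k + 0.5) * width for k in range(steps))
    return sum(t * beta_pdf(t, a, b) for t in midpoints) * width


exact = posterior_mean(15, 10, 5, 5)
print(exact, float(exact), numeric_posterior_mean(15, 10, 5, 5))
```

For these counts the posterior mean (4/7) lies between the prior mean 1/2 and the ML estimate 3/5, and differs slightly from the MAP estimate, since the mean and the mode of a Beta density generally do not coincide.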
More informationProbabilistic & Unsupervised Learning. Introduction and Foundations
Probablstc & Unsupervsed Learnng Introducton and Foundatons Maneesh Sahan maneesh@gatsby.ucl.ac.uk Gatsby Computatonal Neuroscence Unt, and MSc ML/CSML, Dept Computer Scence Unversty College London Term
More informationStatistical inference for generalized Pareto distribution based on progressive Type-II censored data with random removals
Internatonal Journal of Scentfc World, 2 1) 2014) 1-9 c Scence Publshng Corporaton www.scencepubco.com/ndex.php/ijsw do: 10.14419/jsw.v21.1780 Research Paper Statstcal nference for generalzed Pareto dstrbuton
More informationStatistical analysis using matlab. HY 439 Presented by: George Fortetsanakis
Statstcal analyss usng matlab HY 439 Presented by: George Fortetsanaks Roadmap Probablty dstrbutons Statstcal estmaton Fttng data to probablty dstrbutons Contnuous dstrbutons Contnuous random varable X
More information10-701/ Machine Learning, Fall 2005 Homework 3
10-701/15-781 Machne Learnng, Fall 2005 Homework 3 Out: 10/20/05 Due: begnnng of the class 11/01/05 Instructons Contact questons-10701@autonlaborg for queston Problem 1 Regresson and Cross-valdaton [40
More informationEGR 544 Communication Theory
EGR 544 Communcaton Theory. Informaton Sources Z. Alyazcoglu Electrcal and Computer Engneerng Department Cal Poly Pomona Introducton Informaton Source x n Informaton sources Analog sources Dscrete sources
More information3.1 ML and Empirical Distribution
67577 Intro. to Machne Learnng Fall semester, 2008/9 Lecture 3: Maxmum Lkelhood/ Maxmum Entropy Dualty Lecturer: Amnon Shashua Scrbe: Amnon Shashua 1 In the prevous lecture we defned the prncple of Maxmum
More informationClustering & Unsupervised Learning
Clusterng & Unsupervsed Learnng Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Wnter 2012 UCSD Statstcal Learnng Goal: Gven a relatonshp between a feature vector x and a vector y, and d data samples (x,y
More informationSDMML HT MSc Problem Sheet 4
SDMML HT 06 - MSc Problem Sheet 4. The recever operatng characterstc ROC curve plots the senstvty aganst the specfcty of a bnary classfer as the threshold for dscrmnaton s vared. Let the data space be
More informationRelevance Vector Machines Explained
October 19, 2010 Relevance Vector Machnes Explaned Trstan Fletcher www.cs.ucl.ac.uk/staff/t.fletcher/ Introducton Ths document has been wrtten n an attempt to make Tppng s [1] Relevance Vector Machnes
More informationThe Basic Idea of EM
The Basc Idea of EM Janxn Wu LAMDA Group Natonal Key Lab for Novel Software Technology Nanjng Unversty, Chna wujx2001@gmal.com June 7, 2017 Contents 1 Introducton 1 2 GMM: A workng example 2 2.1 Gaussan
More informationCS47300: Web Information Search and Management
CS47300: Web Informaton Search and Management Probablstc Retreval Models Prof. Chrs Clfton 7 September 2018 Materal adapted from course created by Dr. Luo S, now leadng Albaba research group 14 Why probabltes
More informationLogistic Classifier CISC 5800 Professor Daniel Leeds
lon 9/7/8 Logstc Classfer CISC 58 Professor Danel Leeds Classfcaton strategy: generatve vs. dscrmnatve Generatve, e.g., Bayes/Naïve Bayes: 5 5 Identfy probablty dstrbuton for each class Determne class
More informationUsing T.O.M to Estimate Parameter of distributions that have not Single Exponential Family
IOSR Journal of Mathematcs IOSR-JM) ISSN: 2278-5728. Volume 3, Issue 3 Sep-Oct. 202), PP 44-48 www.osrjournals.org Usng T.O.M to Estmate Parameter of dstrbutons that have not Sngle Exponental Famly Jubran
More informationSuites of Tests. DIEHARD TESTS (Marsaglia, 1985) See
Sutes of Tests DIEHARD TESTS (Marsagla, 985 See http://stat.fsu.edu/~geo/dehard.html NIST Test sute- 6 tests on the sequences of bts http://csrc.nst.gov/rng/ Test U0 Includes the above tests. http://www.ro.umontreal.ca/~lecuyer/
More informationDETERMINATION OF UNCERTAINTY ASSOCIATED WITH QUANTIZATION ERRORS USING THE BAYESIAN APPROACH
Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata Proceedngs, XVII IMEKO World Congress, June 7, 3, Dubrovn, Croata TC XVII IMEKO World Congress Metrology n the 3rd Mllennum June 7, 3,
More informationMultilayer neural networks
Lecture Multlayer neural networks Mlos Hauskrecht mlos@cs.ptt.edu 5329 Sennott Square Mdterm exam Mdterm Monday, March 2, 205 In-class (75 mnutes) closed book materal covered by February 25, 205 Multlayer
More information