CS 2750 Machine Learning, Lecture 5: Density estimation


Milos Hauskrecht, milos@pitt.edu, 5329 Sennott Square

Density estimation

Density estimation is an unsupervised learning problem.
Goal: learn a model that represents the relations among attributes in the data.
Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Attributes: modeled by random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$, which can be continuous or discrete valued.
Density estimation: learn an underlying probability distribution model $p(\mathbf{X}) = p(X_1, X_2, \ldots, X_d)$ from $D$.

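Density estimation

Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values.
Objective: estimate the model of the underlying probability distribution over the variables $\mathbf{X}$, $p(\mathbf{X})$, using the examples in $D$.

[Figure: the true distribution $p(\mathbf{X})$ generates the samples $D = \{D_1, \ldots, D_n\}$, from which we compute an estimate $\hat{p}(\mathbf{X})$.]

Standard i.i.d. assumptions: samples are independent of each other and come from the same (identical) distribution, a fixed $p(\mathbf{X})$. In other words, the instances are drawn independently from the same fixed distribution.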

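Density estimation

Types of density estimation:
Parametric: the distribution is modeled using a set of parameters $\Theta$, $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta)$. Example: the mean and covariances of a multivariate normal. Estimation: find the parameters $\Theta$ describing the data $D$.
Non-parametric: the model of the distribution utilizes all examples in $D$, as if all examples were parameters of the distribution. Example: nearest-neighbor methods.

Learning via parameter estimation

In this lecture we consider parametric density estimation. Basic settings:
A set of random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$.
A model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta)$. Example: a Gaussian distribution with mean and variance parameters.
Data: $D = \{D_1, D_2, \ldots, D_n\}$.
Objective: find the parameters $\Theta$ such that $p(\mathbf{X} \mid \Theta)$ fits the data $D$ the best.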

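ML parameter estimation

Model: $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta)$. Data: $D = \{D_1, D_2, \ldots, D_n\}$.
Maximum likelihood (ML): find $\Theta$ that maximizes the likelihood $P(D \mid \Theta, \xi)$, where $\xi$ denotes background knowledge, or equivalently the log-likelihood:
$$\Theta_{ML} = \arg\max_{\Theta} P(D \mid \Theta, \xi) = \arg\max_{\Theta} \log P(D \mid \Theta, \xi).$$
For independent examples the log-likelihood decomposes into a sum:
$$\log P(D \mid \Theta, \xi) = \sum_{i=1}^{n} \log p(D_i \mid \Theta, \xi).$$

Bayesian parameter estimation

The ML estimate picks just one value of the parameter. Problem: if there are two different parameter values that are close in terms of the likelihood, using only one of them may introduce a strong bias if we use it, for example, for predictions.
Bayesian parameter estimation remedies this limitation of a single choice. It uses the posterior distribution over the parameters, which covers all possible parameter values and their weights:
$$p(\Theta \mid D, \xi) = \frac{P(D \mid \Theta, \xi)\, p(\Theta \mid \xi)}{P(D \mid \xi)} \qquad \text{(posterior = likelihood} \times \text{prior / normalizing factor)}.$$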

Bayesian parameter estimation: what does it do?

The prior and the posterior cover all possible parameter values and their weights. Assume we have a model of $p(X)$ with a parameter $\theta$.
Bayesian parameter estimation: prior on the parameter $p(\theta)$ + data $D$ → posterior on the parameter $p(\theta \mid D)$.
ML estimate: data $D$ → just one value, $\theta_{ML}$.

How do we use the posterior for modeling $p(\mathbf{X})$? We average the model over all parameter values, weighted by the posterior:
$$\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid D, \xi) = \int_{\Theta} p(\mathbf{X} \mid \Theta)\, p(\Theta \mid D, \xi)\, d\Theta.$$

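Parameter estimation: other criteria

Maximum a posteriori probability (MAP): maximize $p(\Theta \mid D, \xi)$, the mode of the posterior. It yields one set of parameters, $\Theta_{MAP}$, and the approximation $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta_{MAP})$.
Expected value of the parameter: $\hat{\Theta} = E(\Theta)$, the mean of the posterior, with the expectation taken with respect to the posterior $p(\Theta \mid D, \xi)$. It also yields one set of parameters, and the approximation $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \hat{\Theta})$.

Parameter estimation: coin example

We have a coin that can be biased. Outcomes: two possible values, head or tail.
Data: $D$, a sequence of outcomes $x_i$ such that head $x_i = 1$ and tail $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Objective: we would like to estimate the probability of a head, $\hat{\theta}$, from the data.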

Parameter estimation: example

Assume an unknown and possibly biased coin whose probability of a head is $\theta$.
Data: a sequence of 25 tosses with 15 heads and 10 tails.
What would be your estimate of the probability of a head, $\tilde{\theta}$?
Solution: use the frequencies of occurrences to do the estimate:
$$\tilde{\theta} = \frac{15}{25} = 0.6.$$
This is the maximum likelihood estimate of the parameter $\theta$.

Probability of an outcome

Data: $D$, a sequence of outcomes $x_i$ such that head $x_i = 1$ and tail $x_i = 0$.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$.
Assume we know the probability $\theta$. The probability of the outcome $x_i$ of a single coin flip is given by the Bernoulli distribution:
$$P(x_i \mid \theta) = \theta^{x_i} (1 - \theta)^{1 - x_i}.$$
This expression combines the probability of a head and of a tail so that $x_i$ picks its correct probability: it gives $\theta$ for $x_i = 1$ and $(1 - \theta)$ for $x_i = 0$.

Probability of a sequence of outcomes

Assume a sequence of independent coin flips D = H H T H T H, encoded as $D = 110101$.
What is the probability of observing the data sequence $D$, $P(D \mid \theta)$?

Because the flips are independent, the probability of the sequence is the product of the per-flip probabilities:
$$P(D \mid \theta) = \theta \cdot \theta \cdot (1 - \theta) \cdot \theta \cdot (1 - \theta) \cdot \theta.$$
This is the likelihood of the data.

The likelihood can be rewritten using the Bernoulli distribution:
$$P(D \mid \theta) = \prod_{i=1}^{6} \theta^{x_i} (1 - \theta)^{1 - x_i} = \theta^{4} (1 - \theta)^{2}.$$
It measures the goodness of fit of $\theta$ to the data.

Learning: we do not know the value of the parameter $\theta$. Our learning goal is to find the parameter $\theta$ that fits the data $D$ the best. One solution to "the best": maximize the likelihood
$$P(D \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}.$$
Intuition: the more likely the data are given the model, the better the fit.
Note: instead of an error function that measures how badly the data fit the model, we have a measure that tells us how well the data fit: $\text{Error}(D, \theta) = -P(D \mid \theta)$.
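To make the likelihood concrete, here is a minimal Python sketch that evaluates $P(D \mid \theta)$ for the sequence $D = 110101$ as a product of per-flip Bernoulli probabilities. The function names `bernoulli_pmf` and `sequence_likelihood` are illustrative choices, not part of the lecture.

```python
import math

def bernoulli_pmf(x: int, theta: float) -> float:
    """P(x | theta) = theta**x * (1 - theta)**(1 - x), for x in {0, 1}."""
    return theta ** x * (1.0 - theta) ** (1 - x)

def sequence_likelihood(data: list, theta: float) -> float:
    """P(D | theta) for independent flips: the product of per-flip probabilities."""
    return math.prod(bernoulli_pmf(x, theta) for x in data)

# D = H H T H T H, encoded with head = 1 and tail = 0
D = [1, 1, 0, 1, 0, 1]

for theta in (0.5, 2 / 3, 0.9):
    # The product collapses to theta**4 * (1 - theta)**2 (4 heads, 2 tails)
    print(f"theta = {theta:.3f}: P(D | theta) = {sequence_likelihood(D, theta):.6f}")
```

Among these three values the likelihood is largest at $\theta = 2/3$, the observed fraction of heads, which is what the maximum likelihood criterion picks out.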

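Maximum likelihood (ML) estimate

Likelihood of data: $P(D \mid \theta, \xi) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}$.
Maximum likelihood estimate: $\theta_{ML} = \arg\max_{\theta} P(D \mid \theta, \xi)$.
Optimize the log-likelihood instead (the same as maximizing the likelihood, since $\log$ is monotonic):
$$l(D, \theta) = \log P(D \mid \theta, \xi) = \sum_{i=1}^{n} \left[ x_i \log \theta + (1 - x_i) \log (1 - \theta) \right] = N_1 \log \theta + N_2 \log (1 - \theta),$$
where $N_1$ is the number of heads seen and $N_2$ the number of tails seen.
Set the derivative to zero:
$$\frac{\partial l(D, \theta)}{\partial \theta} = \frac{N_1}{\theta} - \frac{N_2}{1 - \theta} = 0.$$
Solving gives the ML solution:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}.$$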

Maximum likelihood estimate: example

Assume the unknown and possibly biased coin with probability of a head $\theta$. Data: 15 heads and 10 tails in 25 tosses. What is the ML estimate of the probability of a head and of a tail?
Head: $\theta_{ML} = \frac{N_1}{N} = \frac{15}{25} = 0.6$.
Tail: $(1 - \theta_{ML}) = \frac{N_2}{N} = \frac{10}{25} = 0.4$.
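A quick numerical check of the closed form: the plain-Python sketch below maximizes $l(D, \theta) = N_1 \log \theta + N_2 \log(1 - \theta)$ over a fine grid and compares the result with $N_1 / N$ for the counts in this example. The names `log_likelihood` and `theta_ml` are illustrative.

```python
import math

def log_likelihood(theta: float, n_heads: int, n_tails: int) -> float:
    """l(D, theta) = N1 log(theta) + N2 log(1 - theta) for i.i.d. Bernoulli data."""
    return n_heads * math.log(theta) + n_tails * math.log(1.0 - theta)

def theta_ml(n_heads: int, n_tails: int) -> float:
    """Closed-form ML estimate N1 / (N1 + N2), from setting dl/dtheta = 0."""
    return n_heads / (n_heads + n_tails)

N1, N2 = 15, 10  # counts from the coin example

# Sanity check: a grid search over (0, 1) should land on the closed-form answer
grid = [k / 1000 for k in range(1, 1000)]
best = max(grid, key=lambda t: log_likelihood(t, N1, N2))

print(theta_ml(N1, N2))      # 0.6, probability of a head
print(1 - theta_ml(N1, N2))  # probability of a tail
print(best)                  # 0.6
```

The grid search is only a check; for the Bernoulli model the derivative condition already gives the exact maximizer.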

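Bayesian parameter estimation

Bayesian estimation uses distributions (prior and posterior) over all possible values of the parameter $\theta$ of the sampling distribution. For the Bernoulli model, via Bayes theorem:
$$p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, p(\theta \mid \xi)}{P(D \mid \xi)},$$
where $P(D \mid \theta, \xi)$ is the likelihood of the data, $p(\theta \mid \xi)$ is the prior probability on $\theta$, and $P(D \mid \xi)$ is a normalizing factor. We know that the likelihood is
$$P(D \mid \theta, \xi) = \theta^{N_1} (1 - \theta)^{N_2}.$$
How do we choose the prior probability?

Prior distribution

Choice of prior: the Beta distribution,
$$p(\theta \mid \xi) = \text{Beta}(\theta \mid \alpha_1, \alpha_2) = \frac{\Gamma(\alpha_1 + \alpha_2)}{\Gamma(\alpha_1)\Gamma(\alpha_2)} \theta^{\alpha_1 - 1} (1 - \theta)^{\alpha_2 - 1},$$
where $\Gamma(x)$ is the Gamma function; for integer values of $x$, $\Gamma(x) = (x - 1)!$.
Why use the Beta distribution? The Beta distribution "fits" Bernoulli sampling (the two are conjugate choices): the posterior distribution is again a Beta distribution.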

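Beta distribution

[Figure: Beta densities over $\theta \in [0, 1]$ for several settings of $\alpha_1, \alpha_2$.]

Posterior distribution

$$p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, \text{Beta}(\theta \mid \alpha_1, \alpha_2)}{P(D \mid \xi)} = \text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2).$$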

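Posterior distribution: conjugacy

$$p(\theta \mid D, \xi) = \text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2).$$
The Beta distribution is a conjugate prior to Bernoulli sampling. Notice that the parameters of the prior act like counts of heads and tails; they are sometimes referred to as prior counts.

Maximum a posteriori probability (MAP)

The maximum a posteriori estimate selects the mode of the posterior distribution, obtained via Bayes rule and represented as a Beta distribution:
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D, \xi), \qquad p(\theta \mid D, \xi) = \frac{P(D \mid \theta, \xi)\, p(\theta \mid \xi)}{P(D \mid \xi)} = \text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2).$$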

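Maximum a posteriori probability

The MAP estimate selects the mode of the posterior distribution. Assuming the conjugate Beta prior to Bernoulli sampling, the posterior is $\text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$, and its mode satisfies
$$\frac{\partial}{\partial \theta} \log p(\theta \mid D, \xi) = \frac{\alpha_1 + N_1 - 1}{\theta} - \frac{\alpha_2 + N_2 - 1}{1 - \theta} = 0.$$
Solution (valid when $\alpha_1 + N_1 > 1$ and $\alpha_2 + N_2 > 1$):
$$\theta_{MAP} = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N - 2}.$$

MAP estimate example

Assume the unknown and possibly biased coin with probability of a head $\theta$. Data: 15 heads and 10 tails. Assume the prior $p(\theta \mid \xi) = \text{Beta}(\theta \mid 5, 5)$. What is the MAP estimate?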

With the prior $\text{Beta}(\theta \mid 5, 5)$:
$$\theta_{MAP} = \frac{15 + 5 - 1}{25 + 5 + 5 - 2} = \frac{19}{33} \approx 0.576.$$

MAP estimate example (continued)

Note that the prior and the data fit (data likelihood) are combined. The MAP estimate can be biased by large prior counts, and it is hard to overturn such a prior with a smaller sample size. With the same data (15 heads, 10 tails):
Prior $\text{Beta}(\theta \mid 5, 5)$: $\theta_{MAP} = \frac{19}{33} \approx 0.576$.
Prior $\text{Beta}(\theta \mid 5, 20)$: $\theta_{MAP} = \frac{19}{48} \approx 0.396$.
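The two MAP numbers can be reproduced exactly with rational arithmetic. This is a small sketch using Python's `fractions` module; `theta_map` is an illustrative name, and it assumes $\alpha_1 + N_1 > 1$ and $\alpha_2 + N_2 > 1$ so that the posterior mode lies inside $(0, 1)$.

```python
from fractions import Fraction

def theta_map(n_heads: int, n_tails: int, a1: int, a2: int) -> Fraction:
    """Mode of Beta(a1 + N1, a2 + N2): (a1 + N1 - 1) / (a1 + a2 + N - 2)."""
    n = n_heads + n_tails
    return Fraction(a1 + n_heads - 1, a1 + a2 + n - 2)

N1, N2 = 15, 10  # coin example data
for a1, a2 in [(5, 5), (5, 20)]:
    est = theta_map(N1, N2, a1, a2)
    print(f"Beta({a1},{a2}) prior: theta_MAP = {est} ~ {float(est):.3f}")
# Beta(5,5) prior: theta_MAP = 19/33 ~ 0.576
# Beta(5,20) prior: theta_MAP = 19/48 ~ 0.396
```

With a uniform prior, $\text{Beta}(\theta \mid 1, 1)$, the same function returns $15/25 = 3/5$, the ML estimate, since the prior then adds no pseudo-counts.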

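Bayesian framework

The predictive probability of the outcome $x = 1$ in the next trial is
$$P(x = 1 \mid D, \xi) = \int_0^1 P(x = 1 \mid \theta, \xi)\, p(\theta \mid D, \xi)\, d\theta = \int_0^1 \theta\, p(\theta \mid D, \xi)\, d\theta = E(\theta),$$
with the posterior density $p(\theta \mid D, \xi) = \text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$. This is equivalent to the expected value of the parameter, where the expectation is taken with respect to the posterior distribution.

Expected value of the parameter

How do we calculate the expected value of $\theta$ under $\text{Beta}(\theta \mid \eta_1, \eta_2)$?
$$E(\theta) = \int_0^1 \theta\, \text{Beta}(\theta \mid \eta_1, \eta_2)\, d\theta = \frac{\Gamma(\eta_1 + \eta_2)}{\Gamma(\eta_1)\Gamma(\eta_2)} \int_0^1 \theta^{\eta_1} (1 - \theta)^{\eta_2 - 1}\, d\theta = \frac{\Gamma(\eta_1 + \eta_2)}{\Gamma(\eta_1)\Gamma(\eta_2)} \cdot \frac{\Gamma(\eta_1 + 1)\Gamma(\eta_2)}{\Gamma(\eta_1 + \eta_2 + 1)} = \frac{\eta_1}{\eta_1 + \eta_2},$$
using $\Gamma(x + 1) = x\,\Gamma(x)$; for integer values of $x$, $\Gamma(x) = (x - 1)!$.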
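The integral above can be checked numerically. The sketch below is plain Python: it normalizes the Beta density with `math.lgamma`, approximates $\int_0^1 \theta\, \text{Beta}(\theta \mid \eta_1, \eta_2)\, d\theta$ with a midpoint rule, and compares it with $\eta_1 / (\eta_1 + \eta_2)$. As an assumed illustration it uses the coin example's posterior under the $\text{Beta}(\theta \mid 5, 5)$ prior, i.e. $\text{Beta}(\theta \mid 20, 15)$; the function names are hypothetical.

```python
import math

def beta_pdf(theta, e1, e2):
    """Beta(theta | e1, e2) density; the Gamma-function normalizer is computed with math.lgamma."""
    log_norm = math.lgamma(e1 + e2) - math.lgamma(e1) - math.lgamma(e2)
    return math.exp(log_norm + (e1 - 1) * math.log(theta) + (e2 - 1) * math.log(1.0 - theta))

def beta_mean_numeric(e1, e2, steps=20_000):
    """Midpoint-rule approximation of E(theta) = integral_0^1 theta * Beta(theta | e1, e2) dtheta."""
    h = 1.0 / steps
    return h * sum((k + 0.5) * h * beta_pdf((k + 0.5) * h, e1, e2) for k in range(steps))

def beta_mean_closed(e1, e2):
    """Closed form eta1 / (eta1 + eta2)."""
    return e1 / (e1 + e2)

# Posterior for the coin example with prior Beta(5, 5) and N1 = 15, N2 = 10: Beta(20, 15)
e1, e2 = 5 + 15, 5 + 10
print(beta_mean_closed(e1, e2), beta_mean_numeric(e1, e2))  # both about 0.5714 (= 4/7)
```

Here the predictive probability of a head, $20/35 \approx 0.571$, sits between the prior mean 0.5 and the ML estimate 0.6.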

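Expected value of the parameter (continued)

Substituting the results for the posterior, $\text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$, we get
$$E(\theta) = \frac{\alpha_1 + N_1}{\alpha_1 + \alpha_2 + N}.$$
Note that the mean of the posterior is yet another "reasonable" parameter choice: $\hat{\theta} = E(\theta)$.

Binomial distribution

Example problem: a biased coin. Outcomes: two possible values, head or tail.
Data: a set of order-independent outcomes for $N$ trials: $N_1$, the number of heads seen, and $N_2$, the number of tails seen. Note that $N_1$ can be calculated from the trial data.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$. The probability of an outcome is given by the binomial distribution:
$$P(N_1 \mid N, \theta) = \frac{N!}{N_1!\,(N - N_1)!} \theta^{N_1} (1 - \theta)^{N - N_1}.$$
Objective: we would like to estimate the probability of a head, $\hat{\theta}$.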

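Binomial distribution (continued)

Example problem: $N$ coin flips, where each flip can have two results, head or tail.
Outcome: $N_1$, the number of heads seen, and $N_2$, the number of tails seen, in $N = N_1 + N_2$ trials.
Model: probability of a head $\theta$, probability of a tail $(1 - \theta)$. The probability of an outcome is
$$P(N_1 \mid N, \theta) = \frac{N!}{N_1!\,N_2!} \theta^{N_1} (1 - \theta)^{N_2}.$$
The binomial distribution models an order-independent sequence of Bernoulli trials: the coefficient $\frac{N!}{N_1!\,N_2!}$ counts the distinct orderings that produce the same counts.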

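Maximum likelihood (ML) estimate

Likelihood of data: $P(D \mid \theta, \xi) = \frac{N!}{N_1!\,N_2!} \theta^{N_1} (1 - \theta)^{N_2}$.
Log-likelihood:
$$l(D, \theta) = \log \frac{N!}{N_1!\,N_2!} + N_1 \log \theta + N_2 \log (1 - \theta).$$
The first term is a constant from the point of view of optimization, so the solution is the same as for the Bernoulli model with an i.i.d. sequence of examples:
$$\theta_{ML} = \frac{N_1}{N} = \frac{N_1}{N_1 + N_2}.$$

Posterior density and MAP estimate

Prior choice: $p(\theta \mid \xi) = \text{Beta}(\theta \mid \alpha_1, \alpha_2)$. Posterior density, via Bayes rule, $p(\theta \mid D, \xi) \propto P(D \mid \theta, \xi)\, p(\theta \mid \xi)$:
$$p(\theta \mid D, \xi) = \text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2).$$
MAP estimate:
$$\theta_{MAP} = \arg\max_{\theta} p(\theta \mid D, \xi) = \frac{\alpha_1 + N_1 - 1}{\alpha_1 + \alpha_2 + N - 2}.$$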

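Multinomial distribution

Example: multiple rolls of a die with 6 possible results.
Outcome: counts of occurrences of the possible outcomes in $N$ trials, $N_1, \ldots, N_k$ with $N = \sum_{i=1}^{k} N_i$.
Model parameters: $\boldsymbol{\theta} = (\theta_1, \ldots, \theta_k)$ such that $\sum_{i=1}^{k} \theta_i = 1$, where $\theta_i$ is the probability of outcome $i$.
Probability distribution:
$$P(N_1, \ldots, N_k \mid \boldsymbol{\theta}, \xi) = \frac{N!}{N_1! \cdots N_k!} \theta_1^{N_1} \cdots \theta_k^{N_k}.$$
ML estimate:
$$\theta_{i,ML} = \frac{N_i}{N},$$
where $N_i$ is the number of times outcome $i$ has been seen.

Multinomial distribution: posterior and MAP estimate

Choice of the prior: the Dirichlet distribution,
$$\text{Dir}(\boldsymbol{\theta} \mid \alpha_1, \ldots, \alpha_k) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1}.$$
The Dirichlet is the conjugate choice for multinomial sampling. Posterior density:
$$p(\boldsymbol{\theta} \mid D, \xi) = \text{Dir}(\boldsymbol{\theta} \mid \alpha_1 + N_1, \ldots, \alpha_k + N_k).$$
MAP estimate:
$$\theta_{i,MAP} = \frac{\alpha_i + N_i - 1}{\sum_{j=1}^{k} (\alpha_j + N_j) - k}.$$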
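The multinomial ML and Dirichlet MAP formulas translate directly into code. In this plain-Python sketch the die counts are hypothetical numbers made up for illustration (they are not from the lecture), and the function names are illustrative; `dirichlet_map` assumes every $\alpha_i + N_i > 1$.

```python
def multinomial_ml(counts):
    """theta_i = N_i / N."""
    n = sum(counts)
    return [c / n for c in counts]

def dirichlet_map(counts, alphas):
    """Mode of Dir(alpha_1 + N_1, ..., alpha_k + N_k); assumes every alpha_i + N_i > 1."""
    k = len(counts)
    denom = sum(a + c for a, c in zip(alphas, counts)) - k
    return [(a + c - 1) / denom for a, c in zip(alphas, counts)]

# Hypothetical counts from N = 30 rolls of a six-sided die (made up for illustration)
counts = [4, 6, 5, 3, 7, 5]
print(multinomial_ml(counts))
print(dirichlet_map(counts, [2] * 6))  # each alpha_i - 1 = 1 acts like one extra observed roll
print(dirichlet_map(counts, [1] * 6))  # a uniform prior (alpha_i = 1) gives back the ML estimate
```

As in the Beta case, $\alpha_i - 1$ behaves like a pseudo-count added to outcome $i$, which pulls rarely seen faces away from zero probability.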

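Dirichlet distribution

[Figure: Dirichlet densities $\text{Dir}(\boldsymbol{\theta} \mid \alpha_1, \alpha_2, \alpha_3)$ for $k = 3$.]

Other distributions

The same ideas can be applied to other distributions. Typically we choose distributions that behave well, so that the computations lead to nice (closed-form) solutions: the exponential family of distributions. Conjugate choices for some of the distributions from the exponential family:
Binomial: Beta.
Multinomial: Dirichlet.
Exponential: Gamma.
Poisson: Gamma.
Gaussian: Gaussian (for the mean); Inverse Gamma (for the variance of a univariate Gaussian); Wishart (for the precision, i.e., inverse covariance, matrix).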

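Gaussian (normal) distribution

Gaussian: $x \sim N(\mu, \sigma^2)$. Parameters: $\mu$, the mean; $\sigma$, the standard deviation.
Density function:
$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right].$$

[Figure: example Gaussian density plotted for $x$ from $-4$ to $4$.]

Parameter estimates

Log-likelihood:
$$l(D, \mu, \sigma) = \log \prod_{i=1}^{n} p(x_i \mid \mu, \sigma) = \sum_{i=1}^{n} \log p(x_i \mid \mu, \sigma).$$
ML estimates of the mean and variance:
$$\hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2.$$
The ML variance estimate is biased: $E(\hat{\sigma}^2) = \frac{n - 1}{n} \sigma^2 \neq \sigma^2$.
Unbiased estimate:
$$\hat{\sigma}^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \hat{\mu})^2.$$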
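The bias of the ML variance estimate is easy to see by simulation. This plain-Python sketch computes both estimates, cross-checks them against the standard library (`statistics.pvariance` divides by $n$, `statistics.variance` by $n - 1$), and then averages the ML estimate over many samples of size $n = 5$ drawn from $N(0, 1)$. The sample values, seed, and trial count are arbitrary choices for illustration.

```python
import random
import statistics

def gaussian_ml(xs):
    """Return the ML mean, the ML (divide-by-n, biased) variance and the unbiased variance."""
    n = len(xs)
    mu = sum(xs) / n
    ss = sum((x - mu) ** 2 for x in xs)  # sum of squared deviations from the sample mean
    return mu, ss / n, ss / (n - 1)

# Cross-check against the standard library
xs = [2.1, 1.9, 3.2, 2.8, 2.0]
mu, var_ml, var_unbiased = gaussian_ml(xs)
print(mu, var_ml, statistics.pvariance(xs), var_unbiased, statistics.variance(xs))

# Simulate the bias: average the ML variance over many samples of size n from N(0, 1)
rng = random.Random(0)
n, trials = 5, 20000
avg_ml = sum(gaussian_ml([rng.gauss(0.0, 1.0) for _ in range(n)])[1] for _ in range(trials)) / trials
print(avg_ml, (n - 1) / n)  # the average should be close to (n - 1) / n * sigma^2 = 0.8
```

For small $n$ the shortfall is noticeable (about 20% here); it vanishes as $n$ grows, which is why the two estimates agree for large samples.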

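Multivariate normal distribution

Multivariate normal: $\mathbf{x} \sim N(\boldsymbol{\mu}, \Sigma)$. Parameters: $\boldsymbol{\mu}$, the mean; $\Sigma$, the covariance matrix.
Density function:
$$p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right].$$

[Figure: example two-dimensional Gaussian density.]

Partitioned Gaussian distributions

Multivariate Gaussian with the variables split into two blocks:
$$\mathbf{x} = \begin{pmatrix} \mathbf{x}_a \\ \mathbf{x}_b \end{pmatrix}, \quad \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix}, \quad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{pmatrix}.$$
Precision matrix:
$$\Lambda = \Sigma^{-1} = \begin{pmatrix} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{pmatrix}.$$
What are the distributions for the marginals and conditionals?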

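Partitioned conditionals and marginals

Conditional density:
$$p(\mathbf{x}_a \mid \mathbf{x}_b) = N(\mathbf{x}_a \mid \boldsymbol{\mu}_{a|b}, \Sigma_{a|b}),$$
$$\Sigma_{a|b} = \Sigma_{aa} - \Sigma_{ab} \Sigma_{bb}^{-1} \Sigma_{ba} = \Lambda_{aa}^{-1}, \qquad \boldsymbol{\mu}_{a|b} = \boldsymbol{\mu}_a + \Sigma_{ab} \Sigma_{bb}^{-1} (\mathbf{x}_b - \boldsymbol{\mu}_b) = \boldsymbol{\mu}_a - \Lambda_{aa}^{-1} \Lambda_{ab} (\mathbf{x}_b - \boldsymbol{\mu}_b).$$
Marginal density:
$$p(\mathbf{x}_a) = \int p(\mathbf{x}_a, \mathbf{x}_b)\, d\mathbf{x}_b = N(\mathbf{x}_a \mid \boldsymbol{\mu}_a, \Sigma_{aa}).$$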
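To see that the covariance form and the precision form of the conditional agree, here is a plain-Python sketch for the simplest partition: a two-dimensional Gaussian where block $a$ is the first coordinate and block $b$ the second, so every block is a scalar. The mean vector and covariance matrix are arbitrary illustrative values, and the function names are hypothetical.

```python
def condition_bivariate(mu, cov, x_b):
    """p(x_a | x_b) for a 2-D Gaussian with scalar blocks a (index 0) and b (index 1),
    using the covariance form: mean = mu_a + S_ab / S_bb * (x_b - mu_b), var = S_aa - S_ab**2 / S_bb."""
    (mu_a, mu_b), ((s_aa, s_ab), (_, s_bb)) = mu, cov
    mean = mu_a + s_ab / s_bb * (x_b - mu_b)
    var = s_aa - s_ab ** 2 / s_bb
    return mean, var

def condition_bivariate_precision(mu, cov, x_b):
    """Same conditional via the precision form: var = 1 / L_aa, mean = mu_a - L_ab / L_aa * (x_b - mu_b)."""
    (mu_a, mu_b), ((s_aa, s_ab), (s_ba, s_bb)) = mu, cov
    det = s_aa * s_bb - s_ab * s_ba
    l_aa, l_ab = s_bb / det, -s_ab / det  # entries of Lambda = Sigma^{-1} for a 2 x 2 matrix
    return mu_a - l_ab / l_aa * (x_b - mu_b), 1.0 / l_aa

mu = (0.0, 1.0)
cov = ((2.0, 0.8), (0.8, 1.0))
print(condition_bivariate(mu, cov, x_b=2.0))            # approximately (0.8, 1.36)
print(condition_bivariate_precision(mu, cov, x_b=2.0))  # same values up to rounding
# The marginal of x_a needs no computation: N(mu_a, S_aa) = N(0.0, 2.0)
```

Observing $x_b$ shrinks the variance of $x_a$ from 2.0 to 1.36, and the positive covariance shifts the conditional mean toward the observed deviation of $x_b$.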

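Parameter estimates

Log-likelihood:
$$l(D, \boldsymbol{\mu}, \Sigma) = \log \prod_{i=1}^{n} p(\mathbf{x}_i \mid \boldsymbol{\mu}, \Sigma).$$
ML estimates of the mean and covariance:
$$\hat{\boldsymbol{\mu}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i, \qquad \hat{\Sigma} = \frac{1}{n} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T.$$
The covariance estimate is biased: $E(\hat{\Sigma}) = \frac{n - 1}{n} \Sigma \neq \Sigma$. Unbiased estimate:
$$\hat{\Sigma} = \frac{1}{n - 1} \sum_{i=1}^{n} (\mathbf{x}_i - \hat{\boldsymbol{\mu}})(\mathbf{x}_i - \hat{\boldsymbol{\mu}})^T.$$

Posterior of a multivariate normal

Assume a prior on the mean $\boldsymbol{\mu}$ that is normally distributed, $p(\boldsymbol{\mu}) = N(\boldsymbol{\mu}_0, \Sigma_0)$, with the covariance $\Sigma$ of the data known. Then
$$p(\boldsymbol{\mu} \mid D) \propto \exp\left[ -\frac{1}{2} (\boldsymbol{\mu} - \boldsymbol{\mu}_0)^T \Sigma_0^{-1} (\boldsymbol{\mu} - \boldsymbol{\mu}_0) \right] \prod_{i=1}^{n} \exp\left[ -\frac{1}{2} (\mathbf{x}_i - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x}_i - \boldsymbol{\mu}) \right],$$
so the posterior of $\boldsymbol{\mu}$ is normally distributed.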
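In one dimension the Gaussian posterior of the mean has a standard closed form: with a known data variance $\sigma^2$ and prior $N(\mu_0, \sigma_0^2)$, the posterior precision is $1/\sigma_n^2 = 1/\sigma_0^2 + n/\sigma^2$ and the posterior mean is $\mu_n = \sigma_n^2 (\mu_0/\sigma_0^2 + n\bar{x}/\sigma^2)$. The plain-Python sketch below implements that univariate case under the known-variance assumption; `posterior_of_mean` and the example numbers are illustrative.

```python
def posterior_of_mean(xs, var, mu0, var0):
    """Posterior N(mu_n, var_n) of a Gaussian mean, assuming a known data variance `var`
    and a Gaussian prior N(mu0, var0) on the mean (univariate case)."""
    n = len(xs)
    xbar = sum(xs) / n if n else 0.0
    prec_n = 1.0 / var0 + n / var                  # posterior precision = prior + data precision
    var_n = 1.0 / prec_n
    mu_n = var_n * (mu0 / var0 + n * xbar / var)   # precision-weighted average of mu0 and xbar
    return mu_n, var_n

print(posterior_of_mean([1.0, 2.0, 3.0], var=1.0, mu0=0.0, var0=1.0))  # (1.5, 0.25)
```

With no data the function returns the prior unchanged, and with a very broad prior ($\sigma_0^2 \to \infty$) the posterior mean approaches the ML estimate $\bar{x}$.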

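Posterior of a multivariate normal (continued)

The posterior of $\boldsymbol{\mu}$ is normally distributed, $p(\boldsymbol{\mu} \mid D) = N(\boldsymbol{\mu}_n, \Sigma_n)$, with
$$\Sigma_n = \left( \Sigma_0^{-1} + n \Sigma^{-1} \right)^{-1}, \qquad \boldsymbol{\mu}_n = \Sigma_n \left( \Sigma_0^{-1} \boldsymbol{\mu}_0 + n \Sigma^{-1} \bar{\mathbf{x}} \right),$$
where $\bar{\mathbf{x}} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{x}_i$.

Other distributions

Gamma distribution:
$$p(x \mid a, b) = \frac{1}{\Gamma(a)\, b^{a}} x^{a - 1} e^{-x/b} \quad \text{for } x \in [0, \infty).$$
Exponential distribution, a special case of the Gamma for $a = 1$:
$$p(x \mid b) = \frac{1}{b} e^{-x/b} \quad \text{for } x \in [0, \infty).$$
Poisson distribution:
$$p(x \mid \lambda) = \frac{\lambda^{x} e^{-\lambda}}{x!} \quad \text{for } x \in \{0, 1, 2, \ldots\}.$$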

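Sequential Bayesian parameter estimation

Under the i.i.d. assumption, the posterior can be computed incrementally for a sequence of data points, and with a conjugate prior every intermediate posterior stays in the same family as the prior. Assume we split the data $D$ into the last element $D_n$ and the rest, $D_{n-1}$. Then
$$p(\Theta \mid D) = \frac{p(D_n \mid \Theta)\, p(\Theta \mid D_{n-1})}{\int p(D_n \mid \Theta)\, p(\Theta \mid D_{n-1})\, d\Theta},$$
so the posterior after the first $n - 1$ examples, $p(\Theta \mid D_{n-1})$, acts as a new prior for the last one.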
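For the Beta-Bernoulli pair the sequential update is just bookkeeping on the two Beta parameters. This plain-Python sketch (illustrative function names) processes the flips of $D = 110101$ one at a time, starting from an assumed $\text{Beta}(\theta \mid 5, 5)$ prior, and checks that the result matches the batch posterior $\text{Beta}(\theta \mid \alpha_1 + N_1, \alpha_2 + N_2)$.

```python
def beta_update(a1, a2, x):
    """One Bayesian update of a Beta(a1, a2) prior with a single Bernoulli outcome x in {0, 1}."""
    return a1 + x, a2 + (1 - x)

def sequential_posterior(a1, a2, data):
    """Feed the data one point at a time; each posterior becomes the prior for the next point."""
    for x in data:
        a1, a2 = beta_update(a1, a2, x)
    return a1, a2

def batch_posterior(a1, a2, data):
    """Beta(a1 + N1, a2 + N2) computed from all counts at once."""
    n1 = sum(data)
    return a1 + n1, a2 + len(data) - n1

D = [1, 1, 0, 1, 0, 1]
print(sequential_posterior(5, 5, D), batch_posterior(5, 5, D))  # (9, 7) (9, 7)
```

Because the update only adds counts, the order in which the points arrive does not matter, which is what the i.i.d. assumption promises.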

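Exponential family

The exponential family comprises all probability mass/density functions that can be written in the exponential normal form
$$f(\mathbf{x} \mid \boldsymbol{\eta}) = \frac{1}{Z(\boldsymbol{\eta})} h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) \right],$$
where
$\boldsymbol{\eta}$ is a vector of natural (or canonical) parameters,
$\mathbf{t}(\mathbf{x})$ is a function referred to as a sufficient statistic,
$h(\mathbf{x})$ is a function of $\mathbf{x}$ (less important),
$Z(\boldsymbol{\eta})$ is a normalization constant (a partition function), $Z(\boldsymbol{\eta}) = \int h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) \right] d\mathbf{x}$.
Other common form:
$$f(\mathbf{x} \mid \boldsymbol{\eta}) = h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) - A(\boldsymbol{\eta}) \right], \qquad A(\boldsymbol{\eta}) = \log Z(\boldsymbol{\eta}).$$

Exponential family: examples

Bernoulli distribution:
$$p(x \mid \pi) = \pi^{x} (1 - \pi)^{1 - x} = \exp\left[ x \log \pi + (1 - x) \log (1 - \pi) \right] = \exp\left[ x \log \frac{\pi}{1 - \pi} + \log (1 - \pi) \right].$$
Exponential-family parameters: $\boldsymbol{\eta} = ?$, $\mathbf{t}(x) = ?$, $Z = ?$, $h(x) = ?$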

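Bernoulli distribution, exponential-family parameters:
$$\eta = \log \frac{\pi}{1 - \pi}, \quad t(x) = x, \quad h(x) = 1, \quad Z(\eta) = \frac{1}{1 - \pi} = 1 + e^{\eta}.$$
Note: $\pi = \frac{1}{1 + e^{-\eta}}$, so $f(x \mid \eta) = \frac{1}{1 + e^{\eta}} \exp(\eta x)$.

Univariate Gaussian distribution:
$$p(x \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right] = \frac{1}{\sqrt{2\pi}} \exp\left[ \frac{\mu}{\sigma^2} x - \frac{1}{2\sigma^2} x^2 - \frac{\mu^2}{2\sigma^2} - \log \sigma \right].$$
Exponential-family parameters: $\boldsymbol{\eta} = ?$, $\mathbf{t}(x) = ?$, $h(x) = ?$, $Z = ?$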

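Univariate Gaussian, exponential-family parameters:
$$\boldsymbol{\eta} = \begin{pmatrix} \mu/\sigma^2 \\ -1/(2\sigma^2) \end{pmatrix}, \quad \mathbf{t}(x) = \begin{pmatrix} x \\ x^2 \end{pmatrix}, \quad h(x) = \frac{1}{\sqrt{2\pi}}, \quad Z(\boldsymbol{\eta}) = \sigma \exp\left[ \frac{\mu^2}{2\sigma^2} \right],$$
$$A(\boldsymbol{\eta}) = \log Z(\boldsymbol{\eta}) = \frac{\mu^2}{2\sigma^2} + \log \sigma = -\frac{\eta_1^2}{4\eta_2} - \frac{1}{2} \log (-2\eta_2).$$

Exponential family: likelihood of i.i.d. samples

For i.i.d. samples the likelihood of the data is
$$P(D \mid \boldsymbol{\eta}) = \prod_{i=1}^{n} h(\mathbf{x}_i) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}_i) - A(\boldsymbol{\eta}) \right] = \left[ \prod_{i=1}^{n} h(\mathbf{x}_i) \right] \exp\left[ \boldsymbol{\eta}^T \sum_{i=1}^{n} \mathbf{t}(\mathbf{x}_i) - n A(\boldsymbol{\eta}) \right].$$
Important: the dimensionality of the sufficient statistic $\sum_{i} \mathbf{t}(\mathbf{x}_i)$ remains the same regardless of the number of samples.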
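The Gaussian's natural-parameter form can be verified numerically. This plain-Python sketch converts $(\mu, \sigma^2)$ to $\boldsymbol{\eta}$, evaluates $\log h(x) + \boldsymbol{\eta}^T \mathbf{t}(x) - A(\boldsymbol{\eta})$ with $\mathbf{t}(x) = (x, x^2)$, and compares it with the usual log-density. The function names and the test values $\mu = 1.5$, $\sigma^2 = 0.7$ are illustrative.

```python
import math

def to_natural(mu, var):
    """Natural parameters of N(mu, var): eta = (mu / var, -1 / (2 var))."""
    return mu / var, -1.0 / (2.0 * var)

def from_natural(eta1, eta2):
    """Inverse map: var = -1 / (2 eta2), mu = eta1 * var."""
    var = -1.0 / (2.0 * eta2)
    return eta1 * var, var

def log_partition(eta1, eta2):
    """A(eta) = -eta1**2 / (4 eta2) - 0.5 * log(-2 eta2), matching h(x) = 1 / sqrt(2 pi)."""
    return -eta1 ** 2 / (4.0 * eta2) - 0.5 * math.log(-2.0 * eta2)

def log_pdf_natural(x, eta1, eta2):
    """log h(x) + eta^T t(x) - A(eta), with t(x) = (x, x**2)."""
    return -0.5 * math.log(2.0 * math.pi) + eta1 * x + eta2 * x * x - log_partition(eta1, eta2)

def log_pdf_standard(x, mu, var):
    """Usual Gaussian log-density."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)

mu, var = 1.5, 0.7
eta = to_natural(mu, var)
for x in (-1.0, 0.3, 2.0):
    print(x, log_pdf_natural(x, *eta), log_pdf_standard(x, mu, var))  # the two columns agree
```

The check also confirms that $A(\boldsymbol{\eta})$ equals $\mu^2/(2\sigma^2) + \log \sigma$ once it is mapped back to the mean-variance parameters.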

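Exponential family: ML estimate

The log-likelihood of the data is
$$l(D, \boldsymbol{\eta}) = \sum_{i=1}^{n} \log h(\mathbf{x}_i) + \boldsymbol{\eta}^T \sum_{i=1}^{n} \mathbf{t}(\mathbf{x}_i) - n A(\boldsymbol{\eta}).$$
Optimizing the log-likelihood:
$$\nabla_{\boldsymbol{\eta}}\, l(D, \boldsymbol{\eta}) = \sum_{i=1}^{n} \mathbf{t}(\mathbf{x}_i) - n \nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = 0.$$
For the ML estimate it must hold that
$$\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{t}(\mathbf{x}_i).$$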

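Exponential family: rewriting the gradient

$$\nabla_{\boldsymbol{\eta}} A(\boldsymbol{\eta}) = \nabla_{\boldsymbol{\eta}} \log \int h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) \right] d\mathbf{x} = \frac{\int \mathbf{t}(\mathbf{x})\, h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) \right] d\mathbf{x}}{\int h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) \right] d\mathbf{x}} = \int \mathbf{t}(\mathbf{x})\, h(\mathbf{x}) \exp\left[ \boldsymbol{\eta}^T \mathbf{t}(\mathbf{x}) - A(\boldsymbol{\eta}) \right] d\mathbf{x} = E\left[ \mathbf{t}(\mathbf{x}) \right].$$
Result:
$$E\left[ \mathbf{t}(\mathbf{x}) \right] = \frac{1}{n} \sum_{i=1}^{n} \mathbf{t}(\mathbf{x}_i).$$
For the ML estimate, the parameters $\boldsymbol{\eta}$ should be adjusted so that the expectation of the statistic $\mathbf{t}(\mathbf{x})$ equals the observed sample statistic.

Moments of the distribution

For the exponential family, the derivatives of $A(\boldsymbol{\eta})$ give the cumulants of the statistic $\mathbf{t}(\mathbf{x})$: the first derivative is its mean and the second derivative is its variance. If $x$ is a component of $\mathbf{t}(\mathbf{x})$, we get the moments of the distribution by differentiating $A$ with respect to the corresponding natural parameter.
Example: Bernoulli, $p(x \mid \pi) = \exp\left[ x \log \frac{\pi}{1 - \pi} + \log (1 - \pi) \right]$, so $A(\eta) = \log \frac{1}{1 - \pi} = \log (1 + e^{\eta})$.
Derivatives:
$$\frac{\partial A(\eta)}{\partial \eta} = \frac{e^{\eta}}{1 + e^{\eta}} = \frac{1}{1 + e^{-\eta}} = \pi, \qquad \frac{\partial^2 A(\eta)}{\partial \eta^2} = \frac{e^{\eta}}{(1 + e^{\eta})^2} = \pi (1 - \pi).$$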
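Both results can be checked for the Bernoulli case in a few lines of plain Python: finite differences of $A(\eta) = \log(1 + e^{\eta})$ should recover $\pi$ and $\pi(1 - \pi)$, and the ML condition $\partial A / \partial \eta = \frac{1}{n}\sum_i x_i$ can be solved by inverting the sigmoid. The value $\pi = 0.3$, the step size, and the ten-sample data set are illustrative assumptions.

```python
import math

def A(eta):
    """Bernoulli log-partition function A(eta) = log(1 + e**eta)."""
    return math.log1p(math.exp(eta))

pi = 0.3
eta = math.log(pi / (1.0 - pi))  # natural parameter: the log-odds
h = 1e-4
dA = (A(eta + h) - A(eta - h)) / (2 * h)               # central difference, approx. pi
d2A = (A(eta + h) - 2 * A(eta) + A(eta - h)) / h ** 2  # second difference, approx. pi * (1 - pi)
print(dA, pi, d2A, pi * (1 - pi))

# ML by moment matching: choose eta so that E[t(x)] = dA/deta equals the sample mean of t(x) = x
xs = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
mean_t = sum(xs) / len(xs)
eta_ml = math.log(mean_t / (1.0 - mean_t))     # inverts dA/deta = 1 / (1 + e**(-eta))
print(eta_ml, 1.0 / (1.0 + math.exp(-eta_ml)))  # the second value recovers the sample mean
```

Mapping $\eta_{ML}$ back through the sigmoid gives $\pi_{ML} = N_1 / N$, the same answer as the direct Bernoulli derivation earlier in the lecture.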

End

Learning via parameter estimation

In this lecture we consider parametric density estimation. Basic settings: a set of random variables $\mathbf{X} = \{X_1, X_2, \ldots, X_d\}$; a model of the distribution over the variables in $\mathbf{X}$ with parameters $\Theta$, $\hat{p}(\mathbf{X}) = p(\mathbf{X} \mid \Theta)$; data $D = \{D_1, D_2, \ldots, D_n\}$.
Objective: find the parameters $\hat{\Theta}$ that fit the data the best.
What is the best set of parameters? There are various criteria one can apply here.

Parameter estimation

Maximum likelihood (ML): maximize $p(D \mid \Theta, \xi)$.
Maximum a posteriori probability (MAP): maximize $p(\Theta \mid D, \xi)$; selects the mode of the posterior.
Bayesian framework: use the posterior density $p(\Theta \mid D, \xi)$ itself; no optimization is needed. Here $p(\Theta \mid \xi)$ represents prior (background) knowledge.

Unsupervised learning

Data: $D = \{D_1, D_2, \ldots, D_n\}$, where $D_i = \mathbf{x}_i$ is a vector of attribute values, e.g., the description of a patient. There is no specific target attribute we want to predict (no output $y$).
Objective: learn (describe) relations between attributes and examples.
Types of problems:
Clustering: group together similar examples.
Density estimation: model probabilistically the population of examples.

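Beta distribution

[Figure: Beta densities over $\theta \in [0, 1]$ for several settings of $(\alpha_1, \alpha_2)$, including $(0.5, 0.5)$.]

Exponential family (alternative form)

$$f(\mathbf{x} \mid \boldsymbol{\theta}, \phi) = \exp\left[ \frac{\boldsymbol{\theta}^T \mathbf{x} - b(\boldsymbol{\theta})}{a(\phi)} + c(\mathbf{x}, \phi) \right].$$
Parameters: $\boldsymbol{\theta}$, location parameters; $\phi$, scaling parameters.
Example: the multivariate normal density, $p(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]$.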
