Machine Learning. Tutorial on Basic Probability. Lecture 2, September 15, 2006


Transcription:

Machine Learning 10-701/15-781, Fall 2006
Tutorial on Basic Probability
Eric Xing
Lecture 2, September 15, 2006
Reading: Chap. 1 & 2, CB & Chap. 5, 6, TM

What is this?
Classical AI and ML research ignored this phenomenon.
The problem (an example): you want to catch a flight at 10:00am from Pitt to SF; can you make it if you leave at 7am and take a 28X at CMU?
Partial observability (road state, other drivers' plans, etc.).
Noisy sensors (radio traffic reports).
Uncertainty in action outcomes (flat tire, etc.).
Immense complexity of modeling and predicting traffic.
Reasoning under uncertainty!

Basic Probability Concepts

A sample space $S$ is the set of all possible outcomes of a conceptual or physical, repeatable experiment. ($S$ can be finite or infinite.)
E.g., $S$ may be the set of all possible outcomes of a dice roll: $S \equiv \{1, 2, 3, 4, 5, 6\}$.
E.g., $S$ may be the set of all possible nucleotides of a DNA site: $S \equiv \{A, T, C, G\}$.
E.g., $S$ may be the set of all possible time-space positions of an aircraft on a radar screen: $S \equiv [0, R_{\max}] \times [0, 360^\circ) \times [0, +\infty)$.

An event $A$ is any subset of $S$: seeing "1" or "6" in a roll; observing a "G" at a site; UA007 in a given space-time interval.
An event space $E$ is the set of possible worlds in which the outcomes can happen: all dice rolls, reading a genome, monitoring the radar signal.

Visualizing Probability Space

A probability space is a sample space in which every $s \in S$ is assigned a number $P(s)$ such that:
$0 \le P(s) \le 1$
$\sum_{s \in S} P(s) = 1$
$P(s)$ is called the probability (or probability mass) of $s$.

[Figure: the event space of all possible worlds drawn as a region of area 1, split into worlds in which A is true (an oval) and worlds in which A is false; $P(A)$ is the area of the oval.]

Kolmogorov Axioms

All probabilities are between 0 and 1: $0 \le P(X) \le 1$.
$P(\text{true}) = 1$: regardless of the event, my outcome is true.
$P(\text{false}) = 0$: no event makes my outcome true.
The probability of a disjunction is given by $P(A \lor B) = P(A) + P(B) - P(A \land B)$.

Why use probability?

There have been attempts to develop different methodologies for uncertainty: fuzzy logic, qualitative reasoning (qualitative physics).
"Probability theory is nothing but common sense reduced to calculation" (Pierre Laplace, 1812).
In 1931, de Finetti proved that it is irrational to have beliefs that violate these axioms, in the following sense: if you bet in accordance with your beliefs, but your beliefs violate the axioms, then you can be guaranteed to lose money to an opponent whose beliefs more accurately reflect the true state of the world. (Here, "betting" and "money" are proxies for "decision making" and "utilities".)
What if you refuse to bet? This is like refusing to allow time to pass: every action (including inaction) is a bet.

Random Variable

A random variable is a function that associates a unique numerical value (a token) with every outcome of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated.)

[Figure: an outcome $\omega \in S$ mapped to the value $X(\omega)$.]

Discrete r.v.:
The outcome of a dice roll.
The outcome of reading a nucleotide at site $i$: $X_i$.
Binary event and indicator variable: seeing an "A" at a site gives $X = 1$, otherwise $X = 0$. This describes the true or false outcome of a random event.
Can we describe richer outcomes in the same way (i.e., $X = 1, 2, 3, 4$ for being A, C, G, T)? Think about what would happen if we took the expectation of $X$.
Unit-base random vector: $X_i = [X_{iA}, X_{iT}, X_{iG}, X_{iC}]'$, where $X_i = [0, 0, 1, 0]'$ means seeing a "G" at site $i$.

Continuous r.v.:
The outcome of recording the true location of an aircraft: $X_{true}$.
The outcome of observing the measured location of an aircraft: $X_{obs}$.

Discrete Prob. Distribution

In the discrete case, a probability distribution $P$ on $S$ (and hence on the domain of $X$) is an assignment of a non-negative real number $P(s)$ to each $s \in S$ (or each valid value of $x$) such that $\sum_{s \in S} P(s) = 1$ (so $0 \le P(s) \le 1$).
Intuitively, $P(s)$ corresponds to the frequency (or the likelihood) of getting $s$ in the experiments, if repeated many times.
Call $\theta_s = P(s)$ the parameters of a discrete probability distribution.

A probability distribution on a sample space is sometimes called a probability model, in particular if several different distributions are under consideration.
Write models as $M_1$, $M_2$ and probabilities as $P(X \mid M_1)$, $P(X \mid M_2)$.
E.g., $M_1$ may be the appropriate prob. dist. if $X$ is from a "fair dice", and $M_2$ is for the "loaded dice".
$M$ is usually a two-tuple of {dist. family, dist. parameters}.

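Discrete Distributions

Bernoulli distribution, $\mathrm{Ber}(p)$:
$$P(x) = \begin{cases} 1 - p & \text{for } x = 0 \\ p & \text{for } x = 1 \end{cases} \quad\Rightarrow\quad P(x) = p^{x}(1 - p)^{1 - x}$$

Multinomial distribution, $\mathrm{Mult}(1, \theta)$.
Multinomial (indicator) variable: $X = [X_1, \ldots, X_6]'$, where $X_j \in \{0, 1\}$ and $\sum_{j \in [1, \ldots, 6]} X_j = 1$; $X_j = 1$ w.p. $\theta_j$, with $\sum_{j \in [1, \ldots, 6]} \theta_j = 1$.
$$p(x(j)) = P(\{X_j = 1, \text{ where } j \text{ indexes the dice face}\}) = \theta_j = \prod_k \theta_k^{x_k}$$
For a nucleotide site the same product reads $\theta_A^{x_A} \theta_C^{x_C} \theta_G^{x_G} \theta_T^{x_T}$.

Multinomial distribution, $\mathrm{Mult}(n, \theta)$.
Count variable: $X = [x_1, \ldots, x_K]'$, where $\sum_j x_j = n$.
$$p(x) = \frac{n!}{x_1!\, x_2! \cdots x_K!}\, \theta_1^{x_1} \theta_2^{x_2} \cdots \theta_K^{x_K}$$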
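The two probability mass functions above can be evaluated directly. The following is a minimal sketch, not part of the lecture; the function names `bernoulli_pmf` and `multinomial_pmf` are hypothetical, and only the Python standard library is assumed.

```python
from math import factorial, prod


def bernoulli_pmf(x, p):
    """P(x) = p^x (1 - p)^(1 - x) for x in {0, 1}."""
    return p**x * (1 - p) ** (1 - x)


def multinomial_pmf(counts, theta):
    """P(x) = n! / (x_1! ... x_K!) * prod_k theta_k^x_k, with n = sum(counts)."""
    coefficient = factorial(sum(counts))
    for count in counts:
        coefficient //= factorial(count)  # stays an integer at every step
    return coefficient * prod(t**c for t, c in zip(theta, counts))


# One roll of a fair die that shows the third face: Mult(1, theta).
one_roll = multinomial_pmf([0, 0, 1, 0, 0, 0], [1 / 6] * 6)
# Two tosses of a fair coin with one head and one tail: Mult(2, theta).
two_tosses = multinomial_pmf([1, 1], [0.5, 0.5])
```

With a single trial the multinomial coefficient is 1, so the value reduces to the single $\theta_j$ of the face that came up, as in the indicator-variable form.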

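Continuous Prob. Distribution

A continuous random variable $X$ can assume any value in an interval on the real line or in a region of a high-dimensional space.
$X$ usually corresponds to a real-valued measurement of some property, e.g., length, position, ...
It is not possible to talk about the probability of the random variable assuming a particular value: $P(X = x) = 0$.
Instead, we talk about the probability of the random variable assuming a value within a given interval, or half interval:
$P(X \in [x_1, x_2])$, $\quad P(X < x) = P(X \in (-\infty, x))$
Arbitrary Boolean combinations of such basic propositions are also allowed.

Continuous Prob. Distribution

The probability of the random variable assuming a value within some given interval from $x_1$ to $x_2$ is defined to be the area under the graph of the probability density function between $x_1$ and $x_2$.
Probability mass: $P(X \in [x_1, x_2]) = \int_{x_1}^{x_2} p(x)\,dx$; note that $\int_{-\infty}^{+\infty} p(x)\,dx = 1$.
Cumulative distribution function (CDF): $P(x) = P(X < x) = \int_{-\infty}^{x} p(x')\,dx'$
Probability density function (PDF): $p(x) = \frac{d}{dx} P(x)$, with $\int_{-\infty}^{+\infty} p(x)\,dx = 1$ and $p(x) \ge 0$ for all $x$.

[Figure: car flow on Liberty Bridge (cooked up!)]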

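What is the intuitive meaning of $p(x)$?

If $p(x_1) = a$ and $p(x_2) = b$, then when a value $X$ is sampled from the distribution with density $p(x)$, you are $a/b$ times as likely to find that $X$ is "very close to" $x_1$ than that $X$ is "very close to" $x_2$.
That is:
$$\lim_{h \to 0} \frac{P(x_1 - h < X < x_1 + h)}{P(x_2 - h < X < x_2 + h)} = \lim_{h \to 0} \frac{\int_{x_1 - h}^{x_1 + h} p(x)\,dx}{\int_{x_2 - h}^{x_2 + h} p(x)\,dx} = \frac{p(x_1) \cdot 2h}{p(x_2) \cdot 2h} = \frac{a}{b}$$

Continuous Distributions

Uniform Probability Density Function:
$$p(x) = \begin{cases} 1/(b - a) & \text{for } a \le x \le b \\ 0 & \text{elsewhere} \end{cases}$$

Normal (Gaussian) Probability Density Function:
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
The distribution is symmetric, and is often illustrated as a bell-shaped curve.
Two parameters, $\mu$ (mean) and $\sigma$ (standard deviation), determine the location and shape of the distribution.
The highest point on the normal curve is at the mean, which is also the median and mode.
The mean can be any numerical value: negative, zero, or positive.

Exponential Probability Distribution:
density: $p(x) = \frac{1}{\mu} e^{-x/\mu}$, $\quad$ CDF: $P(x \le x_0) = 1 - e^{-x_0/\mu}$

[Figure: exponential density $f(x)$ plotted against the time between successive arrivals (mins.), with a shaded area of .4866 to the left of a threshold.]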
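The statement that probability is area under the density can be checked numerically for the exponential distribution. This is a sketch under stated assumptions: the mean of 3 and the threshold of 2 are illustrative choices (they happen to give an area of about 0.4866), and `riemann_area` is a hypothetical helper, not something from the lecture.

```python
from math import exp


def exponential_pdf(x, mean):
    """Density (1/mean) * exp(-x/mean) for x >= 0, zero otherwise."""
    return exp(-x / mean) / mean if x >= 0 else 0.0


def exponential_cdf(x0, mean):
    """Closed-form CDF 1 - exp(-x0/mean) for x0 >= 0."""
    return 1.0 - exp(-x0 / mean) if x0 >= 0 else 0.0


def riemann_area(pdf, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the area under pdf on [lo, hi]."""
    width = (hi - lo) / steps
    return sum(pdf(lo + (k + 0.5) * width) for k in range(steps)) * width


# Assumed illustrative values: mean 3, threshold 2.
area = riemann_area(lambda x: exponential_pdf(x, 3.0), 0.0, 2.0)
closed_form = exponential_cdf(2.0, 3.0)
```

The numerical area and the closed-form CDF agree to many decimal places, which is the defining relationship between a PDF and its CDF.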

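Statistical Characterizations

Expectation: the centre of mass, mean value, first moment:
$$E(X) = \begin{cases} \sum_{i \in S} x_i\, p(x_i) & \text{discrete} \\ \int_{-\infty}^{\infty} x\, p(x)\,dx & \text{continuous} \end{cases}$$
Sample mean: $\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i$

Variance: the spread:
$$Var(X) = \begin{cases} \sum_{i \in S} [x_i - E(X)]^2\, p(x_i) & \text{discrete} \\ \int_{-\infty}^{\infty} [x - E(X)]^2\, p(x)\,dx & \text{continuous} \end{cases}$$
Sample variance: $\frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x})^2$ (or with $N - 1$ in the denominator for the unbiased version).

Gaussian (Normal) density in 1D

If $X \sim N(\mu, \sigma^2)$, the probability density function (pdf) of $X$ is defined as
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(x - \mu)^2 / 2\sigma^2}$$
We will often use the precision $\lambda = 1/\sigma^2$ instead of the variance $\sigma^2$.
Here is how we plot the pdf in MATLAB:
    xs = -3:0.01:3;
    plot(xs, normpdf(xs, mu, sigma))
Note that a density evaluated at a point can be bigger than 1!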
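The discrete expectation and variance formulas, and the remark that a density can exceed 1, can be illustrated in a few lines. This is a minimal sketch; the fair die and the choice $\sigma = 0.1$ are assumed examples, and the function names are hypothetical.

```python
from math import exp, pi, sqrt


def expectation(values, probs):
    """E(X) = sum_i x_i p(x_i)."""
    return sum(x * p for x, p in zip(values, probs))


def variance(values, probs):
    """Var(X) = sum_i (x_i - E(X))^2 p(x_i)."""
    mean = expectation(values, probs)
    return sum((x - mean) ** 2 * p for x, p in zip(values, probs))


def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sqrt(2 * pi) * sigma)


faces = [1, 2, 3, 4, 5, 6]
fair = [1 / 6] * 6
die_mean = expectation(faces, fair)      # 3.5
die_variance = variance(faces, fair)     # 35/12
peak_density = normal_pdf(0.0, 0.0, 0.1) # a density value, not a probability
```

With a small standard deviation the peak density is about 3.99, which is fine because only the integral of the density has to equal 1.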

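Gaussian CDF

If $Z \sim N(0, 1)$, the cumulative distribution function is defined as
$$\Phi(x) = \int_{-\infty}^{x} p(z)\,dz = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-z^2/2}\,dz$$
This has no closed-form expression, but is built in to most software packages (e.g., normcdf in the MATLAB stats toolbox).

Use of the cdf

If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma \sim N(0, 1)$.
How much mass is contained inside the $[-1.96\sigma, 1.96\sigma]$ interval around the mean?
$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right) = \Phi\left(\frac{b - \mu}{\sigma}\right) - \Phi\left(\frac{a - \mu}{\sigma}\right)$$
Since $P(Z \le -1.96) = \text{normcdf}(-1.96) = 0.025$, we have
$$P(-1.96\sigma < X - \mu < 1.96\sigma) = 1 - 2 \times 0.025 = 0.95$$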
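The interval computation above can be reproduced without a statistics toolbox, using the standard identity $\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}(z/\sqrt{2})\right)$. This is a sketch; `std_normal_cdf` and `interval_mass` are hypothetical names, and the parameters in the example call are arbitrary.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Phi(z) for Z ~ N(0, 1), expressed through the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def interval_mass(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), by standardizing the endpoints."""
    return std_normal_cdf((b - mu) / sigma) - std_normal_cdf((a - mu) / sigma)


# Mass within 1.96 standard deviations of the mean (mu and sigma are arbitrary here).
mu, sigma = 10.0, 2.0
mass = interval_mass(mu - 1.96 * sigma, mu + 1.96 * sigma, mu, sigma)
```

Because the endpoints are standardized first, the result is about 0.95 for any choice of `mu` and `sigma`.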

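Central limit theorem

If $(X_1, X_2, \ldots, X_n)$ are i.i.d. (we will come back to this point shortly) continuous random variables with finite variance, define
$$\bar{X} = f(X_1, X_2, \ldots, X_n) = \frac{1}{n} \sum_{i=1}^{n} X_i$$
As $n \to \infty$, $p(\bar{X})$ approaches a Gaussian with mean $E[X_i]$ and variance $Var[X_i]/n$.
This is somewhat of a justification for why assuming Gaussian noise is common.

Elementary manipulations of probabilities

Set probability of a multi-valued r.v.:
$P(\{x = \text{Odd}\}) = P(1) + P(3) + P(5) = 1/6 + 1/6 + 1/6 = 1/2$
$$P(X = x_1 \lor X = x_2 \lor \cdots \lor X = x_i) = \sum_{j=1}^{i} P(X = x_j)$$

Multi-variate distribution:
Joint probability: $P(X = \text{true} \land Y = \text{true})$
$$P(Y \land (X = x_1 \lor X = x_2 \lor \cdots \lor X = x_i)) = \sum_{j=1}^{i} P(Y \land X = x_j)$$
Marginal probability: $P(Y) = \sum_{j \in S} P(Y \land X = x_j)$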
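The central limit theorem can be illustrated with a small simulation. This is a hedged sketch, not a proof: it assumes Uniform(0, 1) samples (mean 1/2, variance 1/12), a fixed seed for repeatability, and a hypothetical helper `sample_means`.

```python
import random
from statistics import fmean, pvariance


def sample_means(n, repeats, seed=0):
    """Return `repeats` averages, each taken over n Uniform(0, 1) draws."""
    rng = random.Random(seed)
    return [fmean(rng.random() for _ in range(n)) for _ in range(repeats)]


means = sample_means(n=50, repeats=2000)
observed_mean = fmean(means)          # should be near E[X_i] = 0.5
observed_variance = pvariance(means)  # should be near Var[X_i] / n = (1/12) / 50
```

The averages cluster around 0.5 with a spread that shrinks like $1/n$; a histogram of `means` would show the familiar bell shape even though each individual draw is uniform.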

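Conditional Probability

$P(X \mid Y)$ = fraction of worlds in which $Y$ is true that also have $X$ true.
H = "having a headache", F = "coming down with Flu"
$P(H) = 1/10$, $P(F) = 1/40$, $P(H \mid F) = 1/2$
$P(H \mid F)$ = fraction of flu-inflicted worlds in which you have a headache $= P(H \land F) / P(F)$
Definition: $P(X \mid Y) = \dfrac{P(X \land Y)}{P(Y)}$
Corollary (the Chain Rule): $P(X \land Y) = P(X \mid Y)\, P(Y)$

Probabilistic Inference

H = "having a headache", F = "coming down with Flu"
$P(H) = 1/10$, $P(F) = 1/40$, $P(H \mid F) = 1/2$
One day you wake up with a headache. You come up with the following reasoning: "since 50% of flus are associated with headaches, I must have a 50-50 chance of coming down with flu".
Is this reasoning correct?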

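Probabilistic Inference

H = "having a headache", F = "coming down with Flu"
$P(H) = 1/10$, $P(F) = 1/40$, $P(H \mid F) = 1/2$
The problem: $P(F \mid H) = ?$

The Bayes Rule

What we have just done leads to the following general expression:
$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X)}$$
This is Bayes Rule.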
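Plugging the headache and flu figures into Bayes Rule answers the question posed above. The sketch below assumes the slide's figures are $P(H) = 1/10$, $P(F) = 1/40$ and $P(H \mid F) = 1/2$; `bayes_posterior` is a hypothetical helper name. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction


def bayes_posterior(likelihood, prior, evidence):
    """P(Y | X) = P(X | Y) P(Y) / P(X)."""
    return likelihood * prior / evidence


p_headache = Fraction(1, 10)            # P(H), assumed reading of the slide
p_flu = Fraction(1, 40)                 # P(F), assumed reading of the slide
p_headache_given_flu = Fraction(1, 2)   # P(H | F), assumed reading of the slide

p_flu_given_headache = bayes_posterior(p_headache_given_flu, p_flu, p_headache)
print(p_flu_given_headache)  # prints 1/8
```

So the "50-50" reasoning confuses $P(H \mid F)$ with $P(F \mid H)$: under these figures a headache raises the chance of flu from 1/40 to 1/8, not to 1/2.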

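More General Forms of Bayes Rule

$$P(Y \mid X) = \frac{P(X \mid Y)\, P(Y)}{P(X \mid Y)\, P(Y) + P(X \mid \lnot Y)\, P(\lnot Y)}$$
$$P(Y \mid X \land Z) = \frac{P(X \mid Y \land Z)\, P(Y \land Z)}{P(X \land Z)} = \frac{P(X \mid Y \land Z)\, P(Y \land Z)}{P(X \mid Y \land Z)\, P(Y \land Z) + P(X \mid \lnot Y \land Z)\, P(\lnot Y \land Z)}$$
$$P(Y = y_i \mid X) = \frac{P(X \mid Y = y_i)\, P(Y = y_i)}{\sum_{j \in S} P(X \mid Y = y_j)\, P(Y = y_j)}$$
E.g., $P(\text{Flu} \mid \text{Headache} \land \text{DrankBeer})$.

Prior Distribution

Suppose that our propositions about the possible worlds have a "causal flow".
[Figure: causal-flow diagram over the propositions.]
Prior or unconditional probabilities of propositions, e.g., $P(\text{Flu} = \text{true})$ and $P(\text{DrinkBeer} = \text{true})$, correspond to belief prior to the arrival of any new evidence.
A probability distribution gives values for all possible assignments: $P(\text{DrinkBeer}) = [0.01, 0.09, 0.1, 0.8]$ (normalized, i.e., sums to 1).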

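Joint Probability

A joint probability distribution for a set of RVs gives the probability of every atomic event (sample point).
$P(\text{Flu}, \text{DrinkBeer})$ = a $2 \times 2$ matrix of values:

            B        ¬B
    F     0.005     0.02
    ¬F    0.195     0.78

$P(\text{Flu}, \text{DrinkBeer}, \text{Headache}) = ?$
Every question about a domain can be answered by the joint distribution, as we will see later.

Posterior conditional probability

Conditional or posterior (see later) probabilities, e.g., $P(\text{Flu} \mid \text{Headache}) = 0.178$, mean: given that Headache is all I know. They do NOT mean "if Headache then 17.8% chance of Flu".
Representation of conditional distributions: $P(\text{Flu} \mid \text{Headache})$ = 2-element vector of 2-element vectors.
If we know more, e.g., DrinkBeer is also given, then we have $P(\text{Flu} \mid \text{Headache}, \text{DrinkBeer}) = 0.070$. This effect is known as explaining away!
$P(\text{Flu} \mid \text{Headache}, \text{Flu}) = 1$
Note: the less or more certain belief remains valid after more evidence arrives, but is not always useful.
New evidence may be irrelevant, allowing simplification, e.g., $P(\text{Flu} \mid \text{Headache}, \text{StealerWin}) = P(\text{Flu} \mid \text{Headache})$.
This kind of inference, sanctioned by domain knowledge, is crucial.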

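Inference by enumeration

Start with a Joint Distribution. Building a Joint Distribution of $M = 3$ variables:
Make a truth table listing all combinations of values of your variables (if there are $M$ Boolean variables then the table will have $2^M$ rows).
For each combination of values, say how probable it is.
Normalized, i.e., sums to 1.

[Table: the joint probability of each of the $2^3 = 8$ assignments to Flu, DrinkBeer and Headache.]

Inference with the Joint

Once you have the JD you can ask for the probability of any event $E$ by summing the atomic events (rows) consistent with your query:
$$P(E) = \sum_{i \in E} P(\text{row}_i)$$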

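Inference with the Joint

Compute marginals: $P(\text{Flu} \land \text{Headache})$ is the sum of the rows in which both Flu and Headache are true.

Inference with the Joint

Compute marginals: $P(\text{Headache})$ is the sum of the rows in which Headache is true.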

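Inference with the Joint

Compute conditionals:
$$P(E_1 \mid E_2) = \frac{P(E_1 \land E_2)}{P(E_2)} = \frac{\sum_{i \in E_1 \cap E_2} P(\text{row}_i)}{\sum_{i \in E_2} P(\text{row}_i)}$$

Inference with the Joint

Compute conditionals:
$$P(\text{Flu} \mid \text{Headache}) = \frac{P(\text{Flu} \land \text{Headache})}{P(\text{Headache})}$$
General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables.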
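The marginal and conditional computations above amount to summing rows of the truth table. The sketch below uses an assumed, illustrative joint table over (Flu, DrinkBeer, Headache); the eight values are not taken verbatim from the slides, although they are chosen to reproduce the quoted posteriors of about 0.178 and 0.070. The helper names are hypothetical.

```python
# Assumed illustrative joint table; keys are (flu, drink_beer, headache).
JOINT = {
    (0, 0, 0): 0.4,   (0, 0, 1): 0.1,
    (0, 1, 0): 0.17,  (0, 1, 1): 0.2,
    (1, 0, 0): 0.05,  (1, 0, 1): 0.05,
    (1, 1, 0): 0.015, (1, 1, 1): 0.015,
}


def prob(event):
    """P(E): sum of the rows for which the predicate `event` holds."""
    return sum(p for row, p in JOINT.items() if event(row))


def conditional(query, evidence):
    """P(query | evidence) = P(query and evidence) / P(evidence)."""
    return prob(lambda row: query(row) and evidence(row)) / prob(evidence)


def flu(row):
    return row[0] == 1


def headache(row):
    return row[2] == 1


def headache_and_beer(row):
    return row[1] == 1 and row[2] == 1


p_headache = prob(headache)                               # marginal
p_flu_given_headache = conditional(flu, headache)         # about 0.178
p_flu_given_both = conditional(flu, headache_and_beer)    # about 0.070
```

Learning that the person also drank beer lowers the posterior probability of flu, which is the explaining-away effect in numerical form.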

Summary: Inference by enumeration

Let $X$ be all the variables. Typically, we want the posterior joint distribution of the query variables $Y$ given specific values $e$ for the evidence variables $E$.
Let the hidden variables be $H = X - Y - E$.
Then the required summation of joint entries is done by summing out the hidden variables:
$$P(Y \mid E = e) = \alpha\, P(Y, E = e) = \alpha \sum_{h} P(Y, E = e, H = h)$$
The terms in the summation are joint entries because $Y$, $E$, and $H$ together exhaust the set of random variables.
Obvious problems:
Worst-case time complexity $O(d^n)$, where $d$ is the largest arity.
Space complexity $O(d^n)$ to store the joint distribution.
How to find the numbers for $O(d^n)$ entries???

Conditional independence

Write out the full joint distribution using the chain rule:
P(Headache; Flu; Virus; DrinkBeer)
= P(Headache | Flu; Virus; DrinkBeer) P(Flu; Virus; DrinkBeer)
= P(Headache | Flu; Virus; DrinkBeer) P(Flu | Virus; DrinkBeer) P(Virus | DrinkBeer) P(DrinkBeer)
Assume independence and conditional independence:
= P(Headache | Flu; DrinkBeer) P(Flu | Virus) P(Virus) P(DrinkBeer)
I.e., how many independent parameters?
In most cases, the use of conditional independence reduces the size of the representation of the joint distribution from exponential in $n$ to linear in $n$.
Conditional independence is our most basic and robust form of knowledge about uncertain environments.
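The parameter-count question for the factored joint can be answered by counting free entries in each conditional table. This is a sketch assuming all four variables are Boolean; the helper names and the `parents` dictionary are hypothetical, with the parent sets read off the factorization P(Headache | Flu; DrinkBeer) P(Flu | Virus) P(Virus) P(DrinkBeer).

```python
def cpt_parameters(num_parents, arity=2):
    """Free parameters in one conditional table: (arity - 1) per parent configuration."""
    return (arity - 1) * arity**num_parents


def full_joint_parameters(num_vars, arity=2):
    """Free parameters in an unrestricted joint table over num_vars variables."""
    return arity**num_vars - 1


# Parent sets implied by the factored form (assumed Boolean variables).
parents = {
    "Headache": ["Flu", "DrinkBeer"],
    "Flu": ["Virus"],
    "Virus": [],
    "DrinkBeer": [],
}

factored = sum(cpt_parameters(len(p)) for p in parents.values())  # 4 + 2 + 1 + 1
unrestricted = full_joint_parameters(len(parents))                # 2**4 - 1
```

Under these assumptions the factored form needs 8 independent parameters against 15 for the unrestricted joint, and the gap widens exponentially as variables are added.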

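Rules of Independence, by examples

P(Virus | DrinkBeer) = P(Virus) iff Virus is independent of DrinkBeer.
P(Flu | Virus; DrinkBeer) = P(Flu | Virus) iff Flu is independent of DrinkBeer, given Virus.
P(Headache | Flu; Virus; DrinkBeer) = P(Headache | Flu; DrinkBeer) iff Headache is independent of Virus, given Flu and DrinkBeer.

Marginal and Conditional Independence

Recall that for events $E$ (i.e., $X = x$) and $H$ (say, $Y = y$), the conditional probability of $E$ given $H$, written as $P(E \mid H)$, is $P(E \text{ and } H) / P(H)$ (the probability that both $E$ and $H$ are true, given that $H$ is true).
$E$ and $H$ are (statistically) independent if $P(E) = P(E \mid H)$ (i.e., the probability that $E$ is true doesn't depend on whether $H$ is true), or equivalently $P(E \text{ and } H) = P(E)\, P(H)$.
$E$ and $F$ are conditionally independent given $H$ if $P(E \mid H, F) = P(E \mid H)$, or equivalently $P(E, F \mid H) = P(E \mid H)\, P(F \mid H)$.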
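The product test for independence, $P(E \text{ and } H) = P(E)\,P(H)$, is easy to run on a small joint table. The sketch below assumes a $2 \times 2$ joint over (Flu, DrinkBeer) with the values 0.005, 0.02, 0.195 and 0.78; treat these as an assumed reading of the earlier table. The function names are hypothetical.

```python
from math import isclose

# Assumed joint over (flu, drink_beer).
JOINT_FB = {
    (True, True): 0.005,
    (True, False): 0.02,
    (False, True): 0.195,
    (False, False): 0.78,
}


def marginal(joint, axis, value):
    """Sum out the other variable: P(variable at `axis` == value)."""
    return sum(p for key, p in joint.items() if key[axis] == value)


def is_independent(joint, tol=1e-9):
    """True if every cell equals the product of its two marginals."""
    return all(
        isclose(p, marginal(joint, 0, a) * marginal(joint, 1, b), abs_tol=tol)
        for (a, b), p in joint.items()
    )


p_flu = marginal(JOINT_FB, 0, True)    # 0.025
p_beer = marginal(JOINT_FB, 1, True)   # 0.2
independent = is_independent(JOINT_FB)
```

Every cell factorizes into its marginals here, so under this reading Flu and DrinkBeer are marginally independent and two numbers suffice where the full table has three free entries.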

Why knowledge of Independence is useful

Lower complexity (time, space, search, ...).
Motivates efficient inference for all kinds of queries. Stay tuned!!
Structured knowledge about the domain: easy to learn (both from experts and from data), easy to grow.

Where do probability distributions come from?

Idea One: Human, Domain Experts.
Idea Two: Simpler probability facts and some algebra.
Idea Three: Learn them from data!
A good chunk of this course is essentially about various ways of learning various forms of them!

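Density Estimation

A Density Estimator learns a mapping from a set of attributes to a probability.
Often known as parameter estimation if the distribution form is specified: Binomial, Gaussian, ...
Four important issues:
Nature of the data (iid, correlated, ...)
Objective function (MLE, MAP, ...)
Algorithm (simple algebra, gradient methods, EM, ...)
Evaluation scheme (likelihood on test data, predictability, consistency, ...)

Parameter Learning from iid data

Goal: estimate distribution parameters $\theta$ from a dataset of $N$ independent, identically distributed (iid), fully observed training cases $D = \{x_1, \ldots, x_N\}$.
Maximum likelihood estimation (MLE):
1. One of the most common estimators.
2. With the iid and full-observability assumptions, write $L(\theta)$ as the likelihood of the data:
$$L(\theta) = P(x_1, x_2, \ldots, x_N; \theta) = P(x_1; \theta)\, P(x_2; \theta) \cdots P(x_N; \theta) = \prod_{i=1}^{N} P(x_i; \theta)$$
3. Pick the setting of parameters most likely to have generated the data we saw:
$$\theta^{*} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \log L(\theta)$$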

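Example 1: Bernoulli model

Data: we observed $N$ iid coin tosses, $D = \{x_1, \ldots, x_N\}$.
Representation: binary r.v. $x_n \in \{0, 1\}$.
Model:
$$P(x) = \begin{cases} 1 - \theta & \text{for } x = 0 \\ \theta & \text{for } x = 1 \end{cases} \quad\Rightarrow\quad P(x) = \theta^{x}(1 - \theta)^{1 - x}$$
How to write the likelihood of a single observation $x_i$?
$$P(x_i) = \theta^{x_i}(1 - \theta)^{1 - x_i}$$
The likelihood of dataset $D = \{x_1, \ldots, x_N\}$:
$$P(x_1, x_2, \ldots, x_N \mid \theta) = \prod_{i=1}^{N} \theta^{x_i}(1 - \theta)^{1 - x_i} = \theta^{\sum_i x_i}(1 - \theta)^{\sum_i (1 - x_i)} = \theta^{\#\text{head}}(1 - \theta)^{\#\text{tails}}$$

MLE

Objective function:
$$l(\theta; D) = \log P(D \mid \theta) = \log \theta^{n_h}(1 - \theta)^{n_t} = n_h \log \theta + (N - n_h) \log(1 - \theta)$$
We need to maximize this w.r.t. $\theta$. Take the derivative w.r.t. $\theta$:
$$\frac{\partial l}{\partial \theta} = \frac{n_h}{\theta} - \frac{N - n_h}{1 - \theta} = 0 \quad\Rightarrow\quad \hat{\theta}_{MLE} = \frac{n_h}{N} \quad\text{or}\quad \hat{\theta}_{MLE} = \frac{1}{N} \sum_i x_i$$
Frequency as sample mean.
Sufficient statistics: the counts $n_h$, where $n_h = \sum_i x_i$, are sufficient statistics of data $D$.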
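The closed-form estimate $\hat{\theta}_{MLE} = n_h / N$ can be checked against the log-likelihood itself. This is a minimal sketch; the toss sequence is made up for illustration and the function names are hypothetical.

```python
from math import log


def bernoulli_mle(xs):
    """theta_MLE = n_h / N, the fraction of ones in the data."""
    return sum(xs) / len(xs)


def log_likelihood(theta, xs):
    """l(theta; D) = n_h log(theta) + (N - n_h) log(1 - theta)."""
    n_h = sum(xs)
    n_t = len(xs) - n_h
    return n_h * log(theta) + n_t * log(1 - theta)


tosses = [1, 0, 1, 1, 0]          # made-up data: 3 heads, 2 tails
theta_hat = bernoulli_mle(tosses) # 3 / 5

# Grid check: no candidate on a coarse grid beats the closed-form estimate.
best_on_grid = max((t / 100 for t in range(1, 100)), key=lambda t: log_likelihood(t, tosses))
```

The grid search lands on the same value as the formula, which is what setting the derivative to zero predicts for a concave log-likelihood.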

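MLE for discrete (joint) distributions

More generally, it is easy to show that
$$P(\text{event}_i) = \frac{\#\text{ records in which event}_i \text{ is true}}{\text{total number of records}}$$
This is an important (but sometimes not so effective) learning algorithm!

Example 2: univariate normal

Data: we observed $N$ iid real samples, $D = \{x_1, \ldots, x_N\}$.
Model:
$$P(x) = (2\pi\sigma^2)^{-1/2} \exp\{-(x - \mu)^2 / 2\sigma^2\}$$
Log likelihood:
$$l(\theta; D) = \log P(D \mid \theta) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2$$
MLE: take derivatives and set to zero:
$$\frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_n (x_n - \mu), \qquad \frac{\partial l}{\partial \sigma^2} = -\frac{N}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_n (x_n - \mu)^2$$
$$\mu_{MLE} = \frac{1}{N} \sum_n x_n, \qquad \sigma^2_{MLE} = \frac{1}{N} \sum_n (x_n - \mu_{ML})^2$$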
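The two Gaussian estimators are just the sample mean and the average squared deviation with a $1/N$ factor. A minimal sketch follows; the data values are made up for illustration and `gaussian_mle` is a hypothetical name.

```python
def gaussian_mle(xs):
    """Return (mu_MLE, sigma2_MLE) for iid samples from a univariate normal."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n  # divides by N, not N - 1
    return mu, sigma2


samples = [1.0, 2.0, 3.0, 4.0]  # made-up data
mu_hat, sigma2_hat = gaussian_mle(samples)
```

Note the design choice baked into the formula: the maximum likelihood variance divides by $N$, so it is slightly smaller than the unbiased sample variance that divides by $N - 1$.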

Overfitting

Recall that for the Bernoulli distribution, we have
$$\hat{\theta}^{head}_{ML} = \frac{n^{head}}{n^{head} + n^{tail}}$$
What if we tossed too few times, so that we saw zero heads?
We have $\hat{\theta}^{head}_{ML} = 0$, and we will predict that the probability of seeing a head next is zero!!!
The rescue:
$$\hat{\theta}^{head}_{ML} = \frac{n^{head} + n'}{n^{head} + n^{tail} + n'}$$
where $n'$ is known as the pseudo- (imaginary) count.
But can we make this more formal?

The Bayesian Theory

The Bayesian Theory (e.g., for data $D$ and model $M$):
$$P(M \mid D) = P(D \mid M)\, P(M) / P(D)$$
The posterior equals the likelihood times the prior, up to a constant.
This allows us to capture uncertainty about the model in a principled way.

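Hierarchical Bayesian Models

$\theta$ are the parameters for the likelihood $p(x \mid \theta)$.
$\alpha$ are the parameters for the prior $p(\theta \mid \alpha)$.
We can have hyper-hyper-parameters, etc.
We stop when the choice of hyper-parameters makes no difference to the marginal likelihood; typically we make the hyper-parameters constants.
Where do we get the prior?
Intelligent guesses.
Empirical Bayes (Type-II maximum likelihood): computing point estimates of $\alpha$:
$$\hat{\vec{\alpha}}_{MLE} = \arg\max_{\vec{\alpha}}\, p(\vec{n} \mid \vec{\alpha})$$

Bayesian estimation for Bernoulli

Beta distribution:
$$P(\theta; \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, \theta^{\alpha - 1}(1 - \theta)^{\beta - 1}$$
Posterior distribution of $\theta$:
$$P(\theta \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_N \mid \theta)\, p(\theta)}{p(x_1, \ldots, x_N)} \propto \theta^{n_h}(1 - \theta)^{n_t} \times \theta^{\alpha - 1}(1 - \theta)^{\beta - 1} = \theta^{n_h + \alpha - 1}(1 - \theta)^{n_t + \beta - 1}$$
Notice the isomorphism of the posterior to the prior; such a prior is called a conjugate prior.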

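Bayesian estimation for Bernoulli, cont'd

Posterior distribution of $\theta$:
$$P(\theta \mid x_1, \ldots, x_N) = \frac{p(x_1, \ldots, x_N \mid \theta)\, p(\theta)}{p(x_1, \ldots, x_N)} \propto \theta^{n_h + \alpha - 1}(1 - \theta)^{n_t + \beta - 1}$$
Maximum a posteriori (MAP) estimation:
$$\theta_{MAP} = \arg\max_{\theta} \log P(\theta \mid x_1, \ldots, x_N)$$
Posterior mean estimation:
$$\theta_{Bayes} = \int \theta\, p(\theta \mid D)\, d\theta = C \int \theta \times \theta^{n_h + \alpha - 1}(1 - \theta)^{n_t + \beta - 1}\, d\theta = \frac{n_h + \alpha}{N + \alpha + \beta}$$
The Beta parameters can be understood as pseudo-counts.
Prior strength: $A = \alpha + \beta$. $A$ can be interpreted as the size of an imaginary data set from which we obtain the pseudo-counts.

Effect of Prior Strength

Suppose we have a uniform prior ($\alpha' = \beta' = 1/2$), and we observe $\vec{n} = (n_h = 2, n_t = 8)$.
Weak prior, $A = 2$. Posterior prediction:
$$p(x = h \mid n_h = 2, n_t = 8, \vec{\alpha} = \vec{\alpha}' \times 2) = \frac{1 + 2}{2 + 10} = 0.25$$
Strong prior, $A = 20$. Posterior prediction:
$$p(x = h \mid n_h = 2, n_t = 8, \vec{\alpha} = \vec{\alpha}' \times 20) = \frac{10 + 2}{20 + 10} = 0.40$$
However, if we have enough data, it washes away the prior. E.g., $\vec{n} = (n_h = 200, n_t = 800)$. Then the estimates under the weak and strong priors are $\frac{1 + 200}{2 + 1000}$ and $\frac{10 + 200}{20 + 1000}$, respectively, both of which are close to 0.2.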
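The prior-strength comparison can be reproduced with the posterior mean formula $(n_h + \alpha)/(N + \alpha + \beta)$. This is a sketch assuming a Beta prior whose pseudo-counts are split evenly between heads and tails; `prior_from_strength` and `posterior_mean` are hypothetical names.

```python
def prior_from_strength(strength, prior_mean=0.5):
    """Split a prior strength A = alpha + beta into pseudo-counts (alpha, beta)."""
    return strength * prior_mean, strength * (1 - prior_mean)


def posterior_mean(n_h, n_t, alpha, beta):
    """Posterior mean of theta under a Beta(alpha, beta) prior."""
    return (n_h + alpha) / (n_h + n_t + alpha + beta)


weak = prior_from_strength(2)     # (1.0, 1.0)
strong = prior_from_strength(20)  # (10.0, 10.0)

small_weak = posterior_mean(2, 8, *weak)        # (1 + 2) / (2 + 10)
small_strong = posterior_mean(2, 8, *strong)    # (10 + 2) / (20 + 10)
large_weak = posterior_mean(200, 800, *weak)    # (1 + 200) / (2 + 1000)
large_strong = posterior_mean(200, 800, *strong)  # (10 + 200) / (20 + 1000)
```

With ten tosses the two priors give 0.25 and 0.40; with a thousand tosses both estimates sit within about 0.006 of the observed frequency 0.2, which is the "washing away" of the prior.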

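Bayesian estimation for normal distribution

Normal prior:
$$P(\mu) = (2\pi\tau^2)^{-1/2} \exp\{-(\mu - \mu_0)^2 / 2\tau^2\}$$
Joint probability:
$$P(x, \mu) = (2\pi\sigma^2)^{-N/2} \exp\left\{-\frac{1}{2\sigma^2} \sum_{n=1}^{N} (x_n - \mu)^2\right\} \times (2\pi\tau^2)^{-1/2} \exp\{-(\mu - \mu_0)^2 / 2\tau^2\}$$
Posterior:
$$P(\mu \mid x) = (2\pi\tilde{\sigma}^2)^{-1/2} \exp\{-(\mu - \tilde{\mu})^2 / 2\tilde{\sigma}^2\}$$
where
$$\tilde{\mu} = \frac{N/\sigma^2}{N/\sigma^2 + 1/\tau^2}\, \bar{x} + \frac{1/\tau^2}{N/\sigma^2 + 1/\tau^2}\, \mu_0 \quad\text{and}\quad \tilde{\sigma}^2 = \left(\frac{N}{\sigma^2} + \frac{1}{\tau^2}\right)^{-1}$$
and $\bar{x}$ is the sample mean.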
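The posterior mean is a precision-weighted average of the sample mean and the prior mean, which a short function makes concrete. This is a sketch assuming a known noise variance `sigma2` and a normal prior with mean `mu0` and variance `tau2`; the function name and the example numbers are hypothetical.

```python
def normal_posterior(xs, sigma2, mu0, tau2):
    """Posterior (mean, variance) of mu for iid N(mu, sigma2) data and a N(mu0, tau2) prior."""
    n = len(xs)
    x_bar = sum(xs) / n
    data_precision = n / sigma2
    prior_precision = 1 / tau2
    total_precision = data_precision + prior_precision
    post_mean = (data_precision * x_bar + prior_precision * mu0) / total_precision
    return post_mean, 1 / total_precision


# One observation at 2.0, unit noise variance, prior N(0, 1): equal weights on data and prior.
post_mean, post_var = normal_posterior([2.0], sigma2=1.0, mu0=0.0, tau2=1.0)

# Many observations: the data precision dominates and the posterior mean approaches x_bar.
many_mean, many_var = normal_posterior([2.0] * 1000, sigma2=1.0, mu0=0.0, tau2=1.0)
```

As with the Beta prior for the Bernoulli model, more data shrinks the posterior variance and pulls the posterior mean toward the sample mean.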