15-780: Graduate Artificial Intelligence. Density estimation


1 15-780: Graduate Artificial Intelligence. Density estimation

2 Conditional Probability Tables (CPT). But where do we get them? (Burglary / earthquake / alarm network:) P(B) = .05, P(E) = .1; P(A | B, E) = .95, P(A | B, ¬E) = .85, P(A | ¬B, E) = .5, P(A | ¬B, ¬E) = .05; P(J | A) = .7, P(J | ¬A) = .05; P(M | A) = .8, P(M | ¬A) = .15

3 Density Estimation. A Density Estimator learns a mapping from a set of attributes to a probability: input data for a variable or a set of variables → Density Estimator → probability

4 Density estimation. Estimate the distribution (or conditional distribution) of a random variable. Types of variables: - Binary: coin flip, alarm - Discrete: dice, car model year - Continuous: height, weight, temperature, ...

5 Not just for Bayesian networks. Density estimators can do many good things: Can sort the records by probability, and thus spot weird records (anomaly detection). Can do inference: P(E1 | E2), e.g. medical diagnosis / robot sensors. Ingredient for Bayes networks.

6 Density estimation. Binary and discrete variables: Easy: Just count! Continuous variables: Harder (but just a bit): Fit a model

7 Learning a density estimator. P̂(x[i] = u) = (# records in which x[i] = u) / (total number of records). A trivial learning algorithm!
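
The counting estimator above is only a few lines in practice. A minimal sketch in Python; the attribute names and toy records are invented for illustration, not taken from the course dataset.

```python
from collections import Counter

def fit_counts(records, attr):
    """Estimate P(attr = u) for every value u by simple counting."""
    counts = Counter(r[attr] for r in records)
    total = len(records)
    return {u: c / total for u, c in counts.items()}

# Toy data in the spirit of the course-evaluation example.
records = [
    {"summer": True,  "evaluation": 1},
    {"summer": False, "evaluation": 1},
    {"summer": False, "evaluation": 2},
    {"summer": True,  "evaluation": 3},
]
print(fit_counts(records, "summer"))      # {True: 0.5, False: 0.5}
print(fit_counts(records, "evaluation"))  # {1: 0.5, 2: 0.25, 3: 0.25}
```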

8 Course evaluation. Example dataset with attributes Summer?, Size and Evaluation. From simple counts: P(summer) = # summer records / # records; P(Evaluation = 1) = # records with Evaluation = 1 / # records; P(Evaluation = 1 | summer) = P(Evaluation = 1 & summer) / P(summer). But why do we count?

9 Computing the joint likelihood of the data. Using the same counts, the probability of an entire dataset of R records under a model M factors over the (independent) records: P̂(dataset | M) = P̂(x1 ∧ x2 ∧ ... ∧ xR | M) = ∏_{k=1..R} P̂(xk | M). The next slide presents one of the most important ideas in probabilistic inference. It has a huge number of applications in many different and diverse problems.

10 Maximum Likelihood Principle. We can fit models by maximizing the probability of generating the observed samples: L(x1, ..., xn | Θ) = p(x1 | Θ) ⋯ p(xn | Θ). (The samples, i.e. the rows in the table, are assumed to be independent.) For a binary random variable A with P(A = 1) = q, the maximizing value is q = (# samples with A = 1) / (# samples). Why?

11 Maximum Likelihood Principle. For a binary random variable A with P(A = 1) = q, the maximizing value is q = (# samples with A = 1) / (# samples). Why? Writing n1 for the number of samples with A = 1 out of n samples, the data likelihood is P(D | M) = q^n1 (1 − q)^(n − n1), and we would like to find argmax_q q^n1 (1 − q)^(n − n1).

12 Maximum Likelihood Principle. Data likelihood: P(D | M) = q^n1 (1 − q)^(n − n1). We would like to find argmax_q q^n1 (1 − q)^(n − n1). Setting the derivative to zero: ∂/∂q [q^n1 (1 − q)^(n − n1)] = n1 q^(n1 − 1) (1 − q)^(n − n1) − (n − n1) q^n1 (1 − q)^(n − n1 − 1) = 0, which gives n1 (1 − q) = (n − n1) q, and therefore q = n1 / n.
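
An equivalent route to the same result, not spelled out on the slide, is to maximize the log-likelihood instead (writing n1 for the number of samples with A = 1 out of n):

```latex
\log P(D \mid q) = n_1 \log q + (n - n_1)\log(1 - q),
\qquad
\frac{\partial}{\partial q}\log P(D \mid q) = \frac{n_1}{q} - \frac{n - n_1}{1 - q} = 0
\;\Rightarrow\; n_1(1 - q) = (n - n_1)\,q
\;\Rightarrow\; \hat q = \frac{n_1}{n}.
```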

13 Log Probabilities. When working with products, probabilities of entire datasets often get too small. A possible solution is to use the log of probabilities, often termed log likelihood: log P̂(dataset | M) = log ∏_{k=1..R} P̂(xk | M) = Σ_{k=1..R} log P̂(xk | M). (The per-record probabilities lie between 0 and 1, so their logs are negative.)
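
A quick way to see why logs are needed: with many records the direct product underflows double precision, while the sum of logs stays representable. A small sketch, assuming 1,000 records that each get probability 0.1 under the model:

```python
import math

probs = [0.1] * 1000          # per-record probabilities under the model

# Direct product underflows to 0.0 in double precision.
product = 1.0
for p in probs:
    product *= p
print(product)                # 0.0

# Summing logs keeps the dataset log-likelihood representable.
log_likelihood = sum(math.log(p) for p in probs)
print(log_likelihood)         # about -2302.585
```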

14 Density estimation. Binary and discrete variables: Easy: Just count! Continuous variables: Harder (but just a bit): Fit a model. But what if we only have very few samples?

15 The danger of joint density estimation. P(summer & size > 20 & evaluation = 3) = 0 - no such example in our dataset (attributes: Summer?, Size, Evaluation). Now let's assume we are given a new (often called "test") dataset. If this dataset contains the line Summer = 1, Size = 30, Evaluation = 3, then the probability we would assign to the entire dataset is 0.

16 Naïve Density Estimation. The problem with the Joint Estimator is that it just mirrors the training data. We need something which generalizes more usefully. The naïve model generalizes strongly: Assume that each attribute is distributed independently of any of the other attributes.

17 Joint estimation, revisited. Assuming independence we can compute each probability independently from its own counts: P(Summer) = 0.5, and likewise P(Evaluation = 1) and P(Size > 20). How do we do on the joint? P(Summer & Evaluation = 1) = 0.09 vs. the product P(Summer)·P(Evaluation = 1) - not bad! P(Size > 20 & Evaluation = 1) = 0.3 vs. the product P(Size > 20)·P(Evaluation = 1).

18 Joint estimation, revisited. Assuming independence we can compute each probability independently from its own counts: P(Summer) = 0.5, and likewise P(Evaluation = 1) and P(Size > 20). How do we do on the joint? P(Summer & Size > 20) = 0.06 vs. the product P(Summer)·P(Size > 20). We must be careful when using the Naïve density estimator.
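
A sketch contrasting the two estimators on an invented binary dataset (not the course-evaluation table): the joint estimator reproduces the training counts exactly and assigns 0 to unseen rows, while the naïve estimator multiplies per-attribute estimates.

```python
from collections import Counter
from itertools import product

def joint_estimator(records):
    """P(full row) estimated by counting complete rows."""
    counts = Counter(map(tuple, records))
    n = len(records)
    return lambda row: counts[tuple(row)] / n

def naive_estimator(records):
    """P(full row) as a product of per-attribute estimates."""
    n = len(records)
    d = len(records[0])
    marginals = [Counter(r[j] for r in records) for j in range(d)]
    def p(row):
        prob = 1.0
        for j, v in enumerate(row):
            prob *= marginals[j][v] / n
        return prob
    return p

records = [(1, 1), (1, 0), (0, 1), (0, 1), (1, 1)]
joint, naive = joint_estimator(records), naive_estimator(records)
for row in product([0, 1], repeat=2):
    # The joint assigns 0 to the unseen row (0, 0); the naive model does not.
    print(row, round(joint(row), 3), round(naive(row), 3))
```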

19 Contrast. Joint DE: can model anything; no problem to model "C is a noisy copy of A"; but given 100 records and more than 6 Boolean attributes it will screw up badly. Naïve DE: can model only very boring distributions; "C is a noisy copy of A" is outside Naïve's scope; but given 100 records and 10,000 multivalued attributes it will be fine.

20 Dealing with small datasets. We just discussed one possibility: Naïve estimation. There is another way to deal with a small number of measurements that is often used in practice. Assume we want to compute the probability of heads in a coin flip: - What if we can only observe 3 flips? - 25% of the time a maximum likelihood estimator will assign a probability of 1 to either heads or tails.

21 Pseudo counts. - What if we can only observe 3 flips? - 25% of the time a maximum likelihood estimator will assign a probability of 1 to either heads or tails. In these cases we can use prior belief about the fairness of most coins to influence the resulting model. We assume that we have observed 10 flips with 5 tails and 5 heads. Thus p(heads) = (# heads + 5) / (# flips + 10). Advantages: 1. Never assign a probability of 0 to an event. 2. As more data accumulates we can get very close to the real distribution (the impact of the pseudo counts will diminish rapidly).

22 Pseudo counts. - What if we can only observe 3 flips? - 25% of the time a maximum likelihood estimator will assign a probability of 1 to either heads or tails. In these cases we can use prior belief about the fairness of most coins to influence the resulting model. We assume that we have observed 10 flips with 5 tails and 5 heads. Thus p(heads) = (# heads + 5) / (# flips + 10). Some distributions (for example, the Beta distribution) can incorporate pseudo counts as part of the model. Advantages: 1. Never assign a probability of 0 to an event. 2. As more data accumulates we can get very close to the real distribution (the impact of the pseudo counts will diminish rapidly).
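
A minimal sketch of the pseudo-count estimator for the coin example, using the 5 pseudo heads and 5 pseudo tails from the slide; the flip counts below are invented.

```python
def p_heads_mle(heads, flips):
    """Plain maximum-likelihood estimate."""
    return heads / flips

def p_heads_pseudo(heads, flips, pseudo_heads=5, pseudo_flips=10):
    """Estimate with pseudo counts: (heads + 5) / (flips + 10)."""
    return (heads + pseudo_heads) / (flips + pseudo_flips)

print(p_heads_mle(3, 3))          # 1.0     -- all three observed flips were heads
print(p_heads_pseudo(3, 3))       # 0.615.. -- pulled toward fairness
print(p_heads_pseudo(600, 1000))  # 0.599.. -- with more data the pseudo counts barely matter
```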

23 Density estimation. Binary and discrete variables: Easy: Just count! Continuous variables: Harder (but just a bit): Fit a model

24 Conditional Probability Tables (CPT). What do we do with continuous variables? P(S1 | D) = ? P(S2 | D) = ? (S1: sensor 1, S2: sensor 2, D: distance to wall, T: too close; P(T | D < 1) = .9)

25 Conditional Probability Tables (CPT). What do we do with continuous variables? P(S1 | D) = ? P(S2 | D) = ? (S1: sensor 1, S2: sensor 2, D: distance to wall, T: too close; P(T | D < 1) = .9)

26 Elementary Concepts. Population: the ideal group whose properties we are interested in and from which the samples are drawn, e.g., graduate students at CMU. Random sample: a set of elements drawn at random from the population, e.g., students in grad AI.

27 Elementary Concepts. Statistic: a number computed from the data, e.g., average time of sleep.

28 Sample Statistics. Sample mean: µ = (1/n) Σ_{i=1..n} x_i, where n is the number of samples. Sample variance: σ² = (1/n) Σ_{i=1..n} (x_i − µ)². Sample covariance: cov(x1, x2) = (1/n) Σ_{i=1..n} (x_{1,i} − µ1)(x_{2,i} − µ2).
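
These statistics are straightforward with NumPy; a sketch on made-up sleep and GPA numbers. The 1/n normalization matches the formulas above, and the built-in calls are shown only as a cross-check.

```python
import numpy as np

sleep = np.array([6.5, 7.0, 8.0, 5.5, 7.5, 9.0])   # hours (invented)
gpa   = np.array([3.6, 3.8, 3.9, 3.2, 3.7, 3.5])   # invented

n = len(sleep)
mean_sleep = sleep.sum() / n                                         # sample mean
var_sleep  = ((sleep - mean_sleep) ** 2).sum() / n                   # sample variance
cov_sg     = ((sleep - mean_sleep) * (gpa - gpa.mean())).sum() / n   # sample covariance

print(mean_sleep, var_sleep, cov_sg)
# Cross-check against NumPy's built-ins (bias=True / ddof=0 gives the 1/n versions).
print(np.mean(sleep), np.var(sleep), np.cov(sleep, gpa, bias=True)[0, 1])
```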

29 How much do grad students sleep? Let's try to estimate the distribution of the time graduate students spend sleeping (outside class).

30 Possible statistics. X = sleep time. Mean of X: E{X} ≈ 7.03. Variance of X: Var{X} = E{(X − E{X})²}. (Histogram of sleep hours: frequency vs. hours.)

31 Covariance: Sleep vs. GPA. Covariance of X1, X2: Cov{X1, X2} = E{(X1 − E{X1})(X2 − E{X2})}. (Scatter plot of GPA against sleep hours.)

32 Statistical Models. Statistical models attempt to characterize properties of the population of interest. For example, we might believe that repeated measurements follow a normal (Gaussian) distribution with some mean µ and variance σ², x ~ N(µ, σ²), where p(x) = (1 / (√(2π) σ)) exp(−(x − µ)² / (2σ²)) and Θ = (µ, σ²) defines the parameters (mean and variance) of the model.

33 The Parameters of Our Model. A statistical model is a collection of distributions; the parameters specify individual distributions: x ~ N(µ, σ²). We need to adjust the parameters so that the resulting distribution fits the data well.

34 The Parameters of Our Model. A statistical model is a collection of distributions; the parameters specify individual distributions: x ~ N(µ, σ²). We need to adjust the parameters so that the resulting distribution fits the data well.

35 Computing the parameters of our model. Let's assume a Gaussian distribution for our sleep data. How do we compute the parameters of the model? (Histogram of sleep hours.)

36 Maximum Likelihood Principle. We can fit statistical models by maximizing the probability of generating the observed samples: L(x1, ..., xn | Θ) = p(x1 | Θ) ⋯ p(xn | Θ) (the samples are assumed to be independent). In the Gaussian case we simply set the mean and the variance to the sample mean and the sample variance: µ = (1/n) Σ_{i=1..n} x_i, σ² = (1/n) Σ_{i=1..n} (x_i − µ)². Why? I will leave this derivation to you.
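
In code the Gaussian MLE fit is exactly the sample mean and the 1/n sample variance. A sketch on synthetic "sleep hours" data; the true parameters 7.0 and 1.5 are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=7.0, scale=1.5, size=500)   # pretend these are sleep hours

mu_hat  = x.mean()                      # MLE of the mean
var_hat = ((x - mu_hat) ** 2).mean()    # MLE of the variance (1/n, not 1/(n-1))

print(mu_hat, var_hat)                  # close to 7.0 and 1.5**2 = 2.25
```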

37 Sensor data. (Network: D, the distance to the wall, is the parent of the sensor readings S1 and S2 and of T, "too close".)

38 What value would we infer for D given S1, S2? We will write the general terms and then use the network model to simplify it. The important issue is how to work with Gaussians. Bayes rule: P(D | S1, S2) = P(S1 | D, S2) P(D | S2) / P(S1 | S2) = P(S1 | D, S2) P(S2 | D) P(D) / (P(S1 | S2) P(S2)). Using the network structure and assuming an equal prior on all values of D: argmax_D P(S1 | D) P(S2 | D) P(D) / (P(S1 | S2) P(S2)) = argmax_D P(S1 | D) P(S2 | D), where P(S1 | D) P(S2 | D) = (1 / (√(2π) σ1)) e^(−(D − S1)² / (2σ1²)) · (1 / (√(2π) σ2)) e^(−(D − S2)² / (2σ2²)).

39 Model for sensor data. Taking logs: log[e^(−(D − S1)²/(2σ1²)) e^(−(D − S2)²/(2σ2²))] = −(D − S1)²/(2σ1²) − (D − S2)²/(2σ2²). Setting the derivative with respect to D to 0: (D − S1)/σ1² + (D − S2)/σ2² = 0, so D = (S1/σ1² + S2/σ2²) / (1/σ1² + 1/σ2²), which reduces to D = (S1 + S2)/2 only if σ1 = σ2.

40 Sensor data. D = (S1 + S2)/2
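
A sketch of the inference in slides 38-40: the general answer is a precision-weighted average of the two sensor readings, and it reduces to the simple average when the variances match. The sensor values and variances below are invented.

```python
def infer_distance(s1, s2, var1, var2):
    """argmax_D P(S1|D) P(S2|D) for Gaussian sensors centered on the true distance D."""
    return (s1 / var1 + s2 / var2) / (1.0 / var1 + 1.0 / var2)

print(infer_distance(2.0, 3.0, 1.0, 1.0))  # 2.5   -> simple average when variances match
print(infer_distance(2.0, 3.0, 0.1, 1.0))  # ~2.09 -> pulled toward the more reliable sensor
```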

41 Let's go back to Naïve vs. full model. What should I use? This can be determined based on: training data size, cross validation, likelihood ratio test. Cross validation is one of the most useful tricks in model fitting.

42 Cross validation

43 Cross validation
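
Cross validation for model selection can be sketched as follows: fit a candidate model on the training folds and score it by held-out log-likelihood; whichever candidate scores higher wins. The sketch below only scores a naïve Bernoulli model on invented binary data (with a small pseudo count so held-out rows never get probability 0); a competing model, such as the joint estimator, would be scored the same way.

```python
import math
import random

def fit_naive(rows, alpha=1.0):
    """Per-attribute Bernoulli estimates with a small pseudo count; returns a log-density."""
    n, d = len(rows), len(rows[0])
    q = [(sum(r[j] for r in rows) + alpha) / (n + 2 * alpha) for j in range(d)]
    def log_p(row):
        return sum(math.log(q[j] if v else 1.0 - q[j]) for j, v in enumerate(row))
    return log_p

def cross_val_score(rows, fit, k=5):
    """Average held-out log-likelihood per record over k folds."""
    rows = rows[:]
    random.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        log_p = fit(train)
        scores.append(sum(log_p(r) for r in folds[i]) / len(folds[i]))
    return sum(scores) / k

random.seed(0)
data = [(random.random() < 0.7, random.random() < 0.3) for _ in range(200)]
print(cross_val_score(data, fit_naive))
```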

44 Multi-Variate Gaussian. A multivariate Gaussian model: x ~ N(µ, Σ), where p(x) = (1 / ((2π)^(p/2) |Σ|^(1/2))) exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)). Here µ is the mean vector and Σ is the covariance matrix: µ = (µ1, µ2), Σ = [ var(x1), cov(x1, x2) ; cov(x2, x1), var(x2) ]. The covariance matrix captures linear dependencies among the variables.
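
A sketch that fits µ and Σ by the sample mean and 1/n covariance and evaluates the density formula above directly with NumPy; the data and "true" parameters are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu, true_cov = [7.0, 3.5], [[2.0, 0.3], [0.3, 0.1]]   # invented "sleep, GPA" parameters
X = rng.multivariate_normal(true_mu, true_cov, size=1000)

mu    = X.mean(axis=0)                        # estimated mean vector
Sigma = np.cov(X, rowvar=False, bias=True)    # estimated covariance matrix (1/n)

def mvn_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density from the formula above."""
    p = len(mu)
    diff = x - mu
    norm = 1.0 / np.sqrt((2 * np.pi) ** p * np.linalg.det(Sigma))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

print(mu, Sigma)
print(mvn_pdf(np.array([7.0, 3.5]), mu, Sigma))
```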

45 Example

46 Important points. Maximum likelihood estimation (MLE). Pseudo counts. Types of distributions. Handling continuous variables.
