Gaussians. Andrew W. Moore, Professor, School of Computer Science, Carnegie Mellon University.
Slide 1. Note to other teachers and users of these slides: Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. PowerPoint originals are available. If you make use of a significant portion of these slides in your own lecture, please include this message, or the following link to the source repository of Andrew's tutorials. Comments and corrections gratefully received. Gaussians. Andrew W. Moore, Professor, School of Computer Science, Carnegie Mellon University. awm@cs.cmu.edu. Copyright Andrew W. Moore.

Slide 2. Gaussians in Data Mining: Why we should care; The entropy of a PDF; Univariate Gaussians; Multivariate Gaussians; Bayes Rule and Gaussians; Maximum Likelihood and MAP using Gaussians.
Slide 3. Why we should care: Gaussians are as natural as Orange Juice and Sunshine. We need them to understand Bayes Optimal Classifiers. We need them to understand regression. We need them to understand neural nets. We need them to understand mixture models. (You get the idea.)

Slide 4. The box distribution: p(x) = 1/w if |x| ≤ w/2, and p(x) = 0 if |x| > w/2.
Slide 5. The box distribution: p(x) = 1/w if |x| ≤ w/2, else 0, with E[X] = 0 and Var[X] = w²/12.

Slide 6. Entropy of a PDF: H[X] = −∫ p(x) log p(x) dx, using the natural log (ln, or log_e). The larger the entropy of a distribution, the harder it is to predict, the harder it is to compress, and the less "spiky" the distribution.
Slide 7. Entropy of the box distribution: H[X] = −∫ p(x) log p(x) dx = −∫ from −w/2 to w/2 of (1/w) log(1/w) dx = log w.

Slide 8. Unit-variance box distribution: if w = √12 ≈ 3.46 then Var[X] = 1 and H[X] = log √12 ≈ 1.242.
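The entropy result above (H = log w for the box) can be checked numerically; a minimal sketch, with an assumed helper name `box_entropy`, that integrates −p log p by a Riemann sum:

```python
import math

def box_entropy(w, n=10_000):
    # Differential entropy H[X] = -integral p(x) log p(x) dx for the box
    # p(x) = 1/w on [-w/2, w/2]. The integrand is constant, so the sum
    # reduces to log w, matching the slide's closed form.
    dx = w / n
    p = 1.0 / w
    return sum(-p * math.log(p) * dx for _ in range(n))

w = math.sqrt(12)                 # unit-variance box: Var = w^2/12 = 1
print(round(box_entropy(w), 3))   # 1.242
```

With w = √12 this reproduces the unit-variance entropy 1.242 quoted on Slide 8.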
Slide 9. The hat distribution: p(x) = (w − |x|)/w² if |x| ≤ w, and 0 if |x| > w, with E[X] = 0 and Var[X] = w²/6.

Slide 10. Unit-variance hat distribution: if w = √6 then Var[X] = 1 and H[X] ≈ 1.396.
Slide 11. The 2 spikes distribution: a pair of Dirac deltas, p(x) = δ(x + 1)/2 + δ(x − 1)/2, with E[X] = 0 and Var[X] = 1, but H[X] = −∫ p(x) log p(x) dx = −infinity.

Slide 12. Entropies of unit-variance distributions: Box 1.242; Hat 1.396; 2 spikes −infinity; ??? 1.419, the largest possible entropy of any unit-variance distribution.
Slide 13. Unit-variance Gaussian: p(x) = (1/√(2π)) exp(−x²/2), with E[X] = 0, Var[X] = 1, and H[X] = −∫ p(x) log p(x) dx ≈ 1.419.

Slide 14. General Gaussian: p(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)), with E[X] = μ and Var[X] = σ². (Figure: a Gaussian with μ = 100, σ = 15.)
Slide 15. General Gaussian, also known as the normal distribution or bell-shaped curve: p(x) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)), with E[X] = μ and Var[X] = σ². Shorthand: we say X ~ N(μ, σ²) to mean "X is distributed as a Gaussian with parameters μ and σ²." In the figure on the slide, X ~ N(100, 15²).

Slide 16. The Error Function. Assume X ~ N(0, 1). Define ERF(x) = P(X < x), the cumulative distribution of X: ERF(x) = ∫ from −infinity to x of (1/√(2π)) exp(−z²/2) dz.
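The cumulative distribution above can be computed from the standard library's error function; a small sketch (the helper name `gaussian_cdf` is ours, not from the slides):

```python
import math

def gaussian_cdf(x, mu=0.0, sigma=1.0):
    # P(X < x) for X ~ N(mu, sigma^2), expressed via math.erf:
    # CDF(x) = 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2))))

# The deck's later IQ example: P(X < 130) when X ~ N(100, 15^2)
print(round(gaussian_cdf(130, mu=100, sigma=15), 3))  # 0.977
```

This matches the slides' standardization trick: P(X < x) for N(μ, σ²) equals the standard-normal CDF evaluated at (x − μ)/σ.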
Slide 17. Using the Error Function. Assume X ~ N(μ, σ²). Then P(X < x) = ERF((x − μ)/σ).

Slide 18. The Central Limit Theorem. If (X_1, X_2, ..., X_n) are i.i.d. continuous random variables, then define z = f(x_1, x_2, ..., x_n) = (1/n) Σ_{i=1}^{n} x_i. As n --> infinity, p(z) ---> Gaussian with mean E[X_i] and variance Var[X_i]/n. Somewhat of a justification for assuming Gaussian noise is common.
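The CLT statement above is easy to see empirically: averages of i.i.d. uniforms concentrate around the mean with variance Var[X_i]/n. A quick simulation sketch (all names here are ours):

```python
import random
import statistics

# Average n i.i.d. Uniform(0,1) draws, many times. By the CLT the sample
# means are approximately Gaussian with mean 1/2 and variance (1/12)/n,
# since a Uniform(0,1) has mean 1/2 and variance 1/12.
random.seed(0)
n, trials = 30, 5_000
means = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]

print(round(statistics.mean(means), 2))       # ≈ 0.5
print(round(statistics.variance(means) * n, 3))  # ≈ 1/12 ≈ 0.083
```

Scaling the empirical variance of the averages by n recovers Var[X_i], exactly as the slide's variance formula predicts.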
Slide 19. Other amazing facts about Gaussians: Wouldn't you like to know? We will not examine them until we need to.

Slide 20. Bivariate Gaussians. Write the r.v. X = (X, Y)^T. Then define X ~ N(μ, Σ) to mean p(x) = (1/(2π ‖Σ‖^(1/2))) exp(−(1/2)(x − μ)^T Σ^−1 (x − μ)), where the Gaussian's parameters are μ = (μ_x, μ_y)^T and Σ = [σ_x², σ_xy; σ_xy, σ_y²], and where we insist that Σ is symmetric non-negative definite.
Slide 21. Bivariate Gaussians, continued. It turns out that E[X] = μ and Cov[X] = Σ. (Note that this is a resulting property of Gaussians, not a definition.*) *This note rates 7.4 on the pedanticness scale.

Slide 22. Evaluating p(x): Step 1. Begin with a vector x.
Slide 23. Evaluating p(x): Step 2. 1. Begin with a vector x. 2. Define δ = x − μ.

Slide 24. Evaluating p(x): Step 3. 3. Count the number of contours crossed of the ellipsoids formed by Σ^−1: D = this count = √(δ^T Σ^−1 δ), the Mahalanobis distance between x and μ. Contours are defined by √(δ^T Σ^−1 δ) = constant.
Slide 25. Evaluating p(x): Step 4. 4. Define w = exp(−D²/2). An x close to μ in squared Mahalanobis distance gets a large weight; far away gets a tiny weight.

Slide 26. Evaluating p(x): Step 5. 5. Multiply w by 1/(2π ‖Σ‖^(1/2)) to ensure ∫ p(x) dx = 1.
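The five steps above translate directly into code; a minimal 2-D sketch (function name ours) that computes δ, the squared Mahalanobis distance, the weight, and the normalizer, with ‖Σ‖ read as the determinant:

```python
import math

def bivariate_gaussian_pdf(x, mu, sigma):
    # Steps from the slides: delta = x - mu; D^2 = delta^T Sigma^-1 delta
    # (squared Mahalanobis distance); w = exp(-D^2 / 2); then normalize
    # by 1 / (2*pi*sqrt(det(Sigma))).
    (a, b), (c, d) = sigma
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]  # 2x2 inverse
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    d2 = dx * (inv[0][0] * dx + inv[0][1] * dy) \
       + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.exp(-0.5 * d2) / (2 * math.pi * math.sqrt(det))

# At the mean, D = 0 and the density equals the normalizer itself:
print(bivariate_gaussian_pdf((0, 0), (0, 0), [[1, 0], [0, 1]]))  # 1/(2*pi)
```

The contours of constant density are exactly the constant-Mahalanobis-distance ellipses described on Slide 24.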
Slide 27. Example. Observe: the mean, the principal axes, the implication of the off-diagonal covariance term, and the zone of maximum gradient of p(x). Common convention: show the contour corresponding to 2 standard deviations from the mean.

Slide 28. Example. (Contour plot of a bivariate Gaussian.)
Slide 29. Example. In this example, x and y are almost independent.

Slide 30. Example. In this example, x and x+y are clearly not independent.
Slide 31. Example. In this example too, x and x+y are clearly not independent.

Slide 32. Multivariate Gaussians. Write the r.v. X = (X_1, X_2, ..., X_m)^T. Then define X ~ N(μ, Σ) to mean p(x) = (1/((2π)^(m/2) ‖Σ‖^(1/2))) exp(−(1/2)(x − μ)^T Σ^−1 (x − μ)), where the Gaussian's parameters have μ an m-vector and Σ an m × m matrix, and where we insist that Σ is symmetric non-negative definite. Again, E[X] = μ and Cov[X] = Σ. (Note that this is a resulting property of Gaussians, not a definition.)
Slide 33. General Gaussians: Σ is a full symmetric m × m covariance matrix.

Slide 34. Axis-aligned Gaussians: σ_ij = 0 for i ≠ j, so Σ is diagonal.
Slide 35. Spherical Gaussians: σ_ii = σ² for all i, and σ_ij = 0 for i ≠ j, so Σ = σ²I.

Slide 36. Degenerate Gaussians: ‖Σ‖ = 0. What's so wrong with clipping one's toenails in public?
Slide 37. Where are we now? We've seen the formulae for Gaussians. We have an intuition of how they behave. We have some experience of reading a Gaussian's covariance matrix. Coming next: some useful tricks with Gaussians.

Slide 38. Subsets of variables. Write X = (X_1, X_2, ..., X_m)^T as X = (U; V), where U holds the first m(u) components and V holds the remaining ones. This will be our standard notation for breaking an m-dimensional distribution into subsets of variables.
Slide 39. Gaussian marginals are Gaussian (marginalize). Write X = (U; V) as before. IF X ~ N((μ_u; μ_v), [Σ_uu, Σ_uv; Σ_uv^T, Σ_vv]), THEN U is also distributed as a Gaussian: U ~ N(μ_u, Σ_uu).

Slide 40. The same fact, annotated: this fact is not immediately obvious. But it is obvious, once we know it's a Gaussian (why?).
Slide 41. Gaussian marginals are Gaussian: how would you prove this? p(u) = ∫ p(u, v) dv (snore...).

Slide 42. Linear transforms remain Gaussian (matrix multiply). Assume X is an m-dimensional Gaussian r.v., X ~ N(μ, Σ). Define Y to be a p-dimensional r.v. thusly (note p ≤ m): Y = AX, where A is a p × m matrix. Then Y ~ N(Aμ, AΣA^T). Note: the subset result is a special case of this result.
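The transform rule above says only the covariance algebra matters: Cov[AX] = AΣA^T. A small sketch (function name ours) computing that product with plain lists:

```python
def transform_cov(A, sigma):
    # Covariance of Y = A X when X ~ N(mu, Sigma): Cov[Y] = A Sigma A^T.
    # A is p x m, sigma is m x m; returns the p x p result.
    p, m = len(A), len(sigma)
    AS = [[sum(A[i][k] * sigma[k][j] for k in range(m)) for j in range(m)]
          for i in range(p)]
    return [[sum(AS[i][k] * A[j][k] for k in range(m)) for j in range(p)]
            for i in range(p)]

# Y = X1 + X2 for correlated X: Var(Y) = s11 + 2*s12 + s22
print(transform_cov([[1, 1]], [[1.0, 0.5], [0.5, 1.0]]))  # [[3.0]]
```

Taking A to be a row that selects a subset of coordinates recovers the marginalization result of Slide 39 as a special case, just as the slide notes.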
Slide 43. Adding samples of 2 independent Gaussians is Gaussian. If X ~ N(μ_x, Σ_x) and Y ~ N(μ_y, Σ_y) and X ⊥ Y, then X + Y ~ N(μ_x + μ_y, Σ_x + Σ_y). Why doesn't this hold if X and Y are dependent? Which of the below statements is true? (a) If X and Y are dependent, then X+Y is Gaussian but possibly with some other covariance. (b) If X and Y are dependent, then X+Y might be non-Gaussian.

Slide 44. Conditional of Gaussian is Gaussian (conditionalize). IF X = (U; V) ~ N((μ_u; μ_v), [Σ_uu, Σ_uv; Σ_uv^T, Σ_vv]), THEN U | V ~ N(μ_{u|v}, Σ_{u|v}), where μ_{u|v} = μ_u + Σ_uv Σ_vv^−1 (V − μ_v) and Σ_{u|v} = Σ_uu − Σ_uv Σ_vv^−1 Σ_uv^T.
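In the scalar case the conditionalize formula above reduces to simple arithmetic; a minimal sketch (function name ours):

```python
def conditional_gaussian(mu_u, mu_v, s_uu, s_uv, s_vv, v):
    # Scalar case of Slide 44's formula: U | V = v ~ N(mu, var) with
    #   mu  = mu_u + s_uv / s_vv * (v - mu_v)
    #   var = s_uu - s_uv**2 / s_vv
    mu = mu_u + s_uv / s_vv * (v - mu_v)
    var = s_uu - s_uv ** 2 / s_vv
    return mu, var

# Conditional mean is linear in v; conditional variance never exceeds s_uu:
mu, var = conditional_gaussian(0.0, 0.0, 1.0, 0.8, 1.0, 2.0)
print(mu, var)  # mu = 1.6, var ≈ 0.36
```

Note that `var` does not depend on the observed value `v` at all, which is exactly the observation the later slides make about this formula.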
Slide 45. A numeric example of conditionalizing (the specific parameter values are garbled in this transcription): a bivariate Gaussian over (w, y), with the conditional w | y ~ N(μ_w + (σ_wy/σ_y²)(y − μ_y), σ_w² − σ_wy²/σ_y²), so the conditional mean shifts linearly with the observed y.

Slide 46. The same example, now plotting the conditional densities p(w | y) for two observed values of y (one of them y = 76) alongside the marginal p(w).
Slide 47. Notes on the example: the conditional mean of w given y is a linear function of the given value of y; the conditional variance is independent of the given value of y; and the conditional variance can only be equal to or smaller than the marginal variance.

Slide 48. Gaussians and the chain rule. Let A be a constant n × m matrix. IF X | V ~ N(AV, Σ_{x|v}) and V ~ N(μ_v, Σ_vv), THEN (X; V) ~ N((Aμ_v; μ_v), [AΣ_vv A^T + Σ_{x|v}, AΣ_vv; Σ_vv A^T, Σ_vv]) (chain rule).
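The chain-rule result above assembles a joint Gaussian from a Gaussian conditional and a Gaussian prior; a scalar sketch (function name ours), where A, Σ_vv, and Σ_{x|v} are just numbers:

```python
def chain_rule_joint(A, mu_v, s_vv, s_xgv):
    # Scalar chain rule from Slide 48: if X | V ~ N(A*V, s_xgv) and
    # V ~ N(mu_v, s_vv), then (X, V) is jointly Gaussian with
    #   mean = (A*mu_v, mu_v)
    #   cov  = [[A^2*s_vv + s_xgv, A*s_vv],
    #           [A*s_vv,           s_vv ]]
    mean = (A * mu_v, mu_v)
    cov = [[A * A * s_vv + s_xgv, A * s_vv],
           [A * s_vv, s_vv]]
    return mean, cov

print(chain_rule_joint(2.0, 1.0, 1.0, 0.5))
# ((2.0, 1.0), [[4.5, 2.0], [2.0, 1.0]])
```

Combined with the conditionalize rule of Slide 44, this is enough to run the IQ calculation that closes the deck.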
Slide 49. Available Gaussian tools. Marginalize: if (U; V) ~ N((μ_u; μ_v), [Σ_uu, Σ_uv; Σ_uv^T, Σ_vv]), then U ~ N(μ_u, Σ_uu). Matrix multiply: if X ~ N(μ, Σ) and Y = AX, then Y ~ N(Aμ, AΣA^T). Add independent samples: if X ~ N(μ_x, Σ_x), Y ~ N(μ_y, Σ_y), and X ⊥ Y, then X + Y ~ N(μ_x + μ_y, Σ_x + Σ_y). Conditionalize: U | V ~ N(μ_u + Σ_uv Σ_vv^−1 (V − μ_v), Σ_uu − Σ_uv Σ_vv^−1 Σ_uv^T). Chain rule: if X | V ~ N(AV, Σ_{x|v}) and V ~ N(μ_v, Σ_vv), then (X; V) ~ N((Aμ_v; μ_v), [AΣ_vv A^T + Σ_{x|v}, AΣ_vv; Σ_vv A^T, Σ_vv]).

Slide 50. Assume: You are an intellectual snob. You have a child.
Slide 51. Intellectual snobs with children are obsessed with IQ. In the world as a whole, IQs are drawn from a Gaussian: N(100, 15²).

Slide 52. IQ tests. If you take an IQ test you'll get a score that, on average (over many tests), will be your IQ. But because of noise, on any one test the score will often be a few points lower or higher than your true IQ: SCORE | IQ ~ N(IQ, σ²) for some test-noise variance σ² (the numeric value is lost in this transcription).
Slide 53. Assume: You drag your kid off to get tested. She gets a score of 130. "Yippee," you screech, and start deciding how to casually refer to her membership of the top 2% of IQs in your Christmas newsletter. P(X < 130 | μ = 100, σ = 15) = P(X < 2 | μ = 0, σ = 1) = ERF(2) = 0.977.

Slide 54. The same slide, with your inner monologue: "Well sure, the test isn't accurate, so she might have an IQ a bit below 130 or a bit above it, but the most likely IQ given the evidence score = 130 is, of course, 130." Can we trust this reasoning?
Slide 55. Maximum Likelihood IQ. IQ ~ N(100, 15²); S | IQ ~ N(IQ, σ²); observed S = 130. IQ^mle = argmax over iq of p(s = 130 | iq). The MLE is the value of the hidden parameter that makes the observed data most likely. In this case, IQ^mle = 130.

Slide 56. The same, with a caution (N.B.): this is not the same as "the most likely value of the parameter given the observed data."
Slide 57. What we really want. IQ ~ N(100, 15²); S | IQ ~ N(IQ, σ²); S = 130. Question: what is p(IQ | S = 130)? This is called the posterior distribution of IQ.

Slide 58. Question: what is p(IQ | S = 130)? Which tool or tools (marginalize, conditionalize, matrix multiply, add independent samples, chain rule)?
Slide 59. Plan. IQ ~ N(100, 15²); S | IQ ~ N(IQ, σ²); S = 130. Question: what is p(IQ | S = 130)? Start with p(S | IQ) and p(IQ); apply the chain rule to get p(S, IQ); swap to get p(IQ, S); conditionalize to get p(IQ | S).

Slide 60. Working. Apply the chain-rule result (IF V ~ N(μ_v, Σ_vv) and X | V ~ N(AV, Σ_{x|v}), THEN (X; V) is jointly Gaussian), then the conditionalizing formula from Slide 44.
Slide 61. Your pride and joy's posterior IQ. If you did the working, you now have p(IQ | S = 130). If you have to give the most likely IQ given the score, you should give IQ^map = argmax over iq of p(iq | s = 130), where MAP means Maximum A-Posteriori.

Slide 62. What you should know: the Gaussian PDF formula, off by heart; understand the workings of the formula for a Gaussian; be able to understand the Gaussian tools described so far; have a rough idea of how you could prove them; be happy with how you could use them.
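The working sketched on Slides 59-61 can be carried out in a few lines. A sketch with assumed numbers: the prior N(100, 15²) and the score 130 are from the slides, but the test-noise standard deviation (10 here) is an assumed value, since the actual number is lost in this transcription, and the function name is ours:

```python
# Posterior for the IQ example: IQ ~ N(100, 15^2), S | IQ ~ N(IQ, noise^2).
# Chain rule + conditionalize give a Gaussian posterior whose mean is a
# precision-weighted blend of the prior mean and the observed score.
def iq_posterior(score, mu0=100.0, s0=15.0, noise=10.0):  # noise=10 assumed
    k = s0 ** 2 / (s0 ** 2 + noise ** 2)   # shrinkage toward the score
    mu_post = mu0 + k * (score - mu0)      # posterior mean = MAP estimate
    var_post = (1 - k) * s0 ** 2           # posterior variance
    return mu_post, var_post

mu_map, var_post = iq_posterior(130)
print(round(mu_map, 1))  # 120.8 -- not 130: the prior pulls the MLE back
```

This is the punchline of the deck: the MAP estimate sits between the prior mean (100) and the maximum-likelihood answer (130), so "she scored 130, therefore her most likely IQ is 130" is not quite right once the prior is taken into account.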
Support Vector Machines Machine Learning Series Jerry Jeychandra Bloh Lab Outline Main goal: To understand how support vector achines (SVMs) perfor optial classification for labelled data sets, also a
More informationPage 1. t F t m v. N s kg s. J F t SPH4U. From Newton Two New Concepts Impulse & Momentum. Agenda
SPH4U Agenda Fro Newton Two New Concepts Ipulse & oentu Ipulse Collisions: you gotta consere oentu! elastic or inelastic (energy consering or not) Inelastic collisions in one diension and in two diensions
More informationProbability and Stochastic Processes: A Friendly Introduction for Electrical and Computer Engineers Roy D. Yates and David J.
Probability and Stochastic Processes: A Friendly Introduction for Electrical and oputer Engineers Roy D. Yates and David J. Goodan Proble Solutions : Yates and Goodan,1..3 1.3.1 1.4.6 1.4.7 1.4.8 1..6
More informationPHY 171. Lecture 14. (February 16, 2012)
PHY 171 Lecture 14 (February 16, 212) In the last lecture, we looked at a quantitative connection between acroscopic and icroscopic quantities by deriving an expression for pressure based on the assuptions
More informationA nonstandard cubic equation
MATH-Jan-05-0 A nonstandard cubic euation J S Markoitch PO Box West Brattleboro, VT 050 Dated: January, 05 A nonstandard cubic euation is shown to hae an unusually econoical solution, this solution incorporates
More informationPROBABILITY OF RUIN FOR A DEPENDENT, TWO-DIMENSIONAL POISSON PROCESS. 1. Introduction
B A D A N A O P E R A C J N E D E C Z J E Nr 9 Stanisław HELPERN* PROBABLT O RUN OR A DEPENDENT, TWO-DMENSONAL POSSON PROCESS A two-diensional, dependent Poisson risk process is inestigated in the paper.
More informationNON-PARAMETRIC METHODS IN ANALYSIS OF EXPERIMENTAL DATA
NON-PARAMETRIC METHODS IN ANALYSIS OF EXPERIMENTAL DATA Rajender Parsad I.A.S.R.I., Library Aenue, New Delhi 0 0. Introduction In conentional setup the analysis of experimental data is based on the assumptions
More information(Received 2006 May 29; revised 2007 September 26; accepted 2007 September 26)
Estiation of the iniu integration tie for eterining the equialent continuous soun leel with a gien leel of uncertainty consiering soe statistical hypotheses for roa traffic S. R. e Donato a) (Receie 006
More informatione = n 1 ( ) 3 [ m 3] = n [ m 3] n
Magnetospheric Physics - Hoework Solutions, /7/4 7. Plasa definition Can a plasa be aintained at teperatures of T e K Hint: Calculate the density liit using the plasa paraeter and explain your result).
More informationESE 523 Information Theory
ESE 53 Inforation Theory Joseph A. O Sullivan Sauel C. Sachs Professor Electrical and Systes Engineering Washington University 11 Urbauer Hall 10E Green Hall 314-935-4173 (Lynda Marha Answers) jao@wustl.edu
More informationInformation Gain. Andrew W. Moore Professor School of Computer Science Carnegie Mellon University.
te to other teachers and users of these slides. Andrew would be delighted if you found this source material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them
More informationMidterm, Fall 2003
5-78 Midterm, Fall 2003 YOUR ANDREW USERID IN CAPITAL LETTERS: YOUR NAME: There are 9 questions. The ninth may be more time-consuming and is worth only three points, so do not attempt 9 unless you are
More informationSF Chemical Kinetics.
SF Cheical Kinetics. Lecture 5. Microscopic theory of cheical reaction inetics. Microscopic theories of cheical reaction inetics. basic ai is to calculate the rate constant for a cheical reaction fro first
More informationIn this chapter, we consider several graph-theoretic and probabilistic models
THREE ONE GRAPH-THEORETIC AND STATISTICAL MODELS 3.1 INTRODUCTION In this chapter, we consider several graph-theoretic and probabilistic odels for a social network, which we do under different assuptions
More informationPolygonal Designs: Existence and Construction
Polygonal Designs: Existence and Construction John Hegean Departent of Matheatics, Stanford University, Stanford, CA 9405 Jeff Langford Departent of Matheatics, Drake University, Des Moines, IA 5011 G
More informationQuadratic forms and a some matrix computations
Linear Algebra or Wireless Conications Lectre: 8 Qadratic ors and a soe atri coptations Ove Edors Departent o Electrical and Inoration echnology Lnd University it Stationary points One diension ( d d =
More informationBiostatistics Department Technical Report
Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent
More informationPhysics 11 HW #7 Solutions
hysics HW #7 Solutions Chapter 7: Focus On Concepts: 2, 6, 0, 3 robles: 8, 7, 2, 22, 32, 53, 56, 57 Focus On Concepts 7-2 (d) Moentu is a ector quantity that has a agnitude and a direction. The agnitudes
More informationESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS. A Thesis. Presented to. The Faculty of the Department of Mathematics
ESTIMATING AND FORMING CONFIDENCE INTERVALS FOR EXTREMA OF RANDOM POLYNOMIALS A Thesis Presented to The Faculty of the Departent of Matheatics San Jose State University In Partial Fulfillent of the Requireents
More informationDue to Fabrication Errors in AWGs
An Estiation of Perforance Degradation Due to Fabrication Errors in AWGs Thoas Kaalais, Thoas Sphicopoulos and Diitris Syridis The authors are with the departent of Inforatics and Telecounications of the
More informationAsymptotic Normality of an Entropy Estimator with Exponentially Decaying Bias
Asymptotic Normality of an Entropy Estimator with Exponentially Decaying Bias Zhiyi Zhang Department of Mathematics and Statistics Uniersity of North Carolina at Charlotte Charlotte, NC 28223 Abstract
More informationTHE FIFTH DIMENSION EQUATIONS
JP Journal of Mathematical Sciences Volume 7 Issues 1 & 013 Pages 41-46 013 Ishaan Publishing House This paper is aailable online at http://www.iphsci.com THE FIFTH DIMENSION EQUATIONS Niittytie 1B16 03100
More informationLesson 3: Free fall, Vectors, Motion in a plane (sections )
Lesson 3: Free fall, Vectors, Motion in a plane (sections.6-3.5) Last time we looked at position s. time and acceleration s. time graphs. Since the instantaneous elocit is lim t 0 t the (instantaneous)
More information