CS1512 Foundations of Computing Science 2. Lecture 4


CS1512 Foundations of Computing Science 2, Lecture 4: Bayes' Law; Gaussian Distributions. J R W Hunter, 2006; C J van Deemter 2007

(Revd. Thomas) Bayes' Theorem. Start from P(E1 and E2) = P(E1) * P(E2 | E1). The order of E1 and E2 is not important, so likewise P(E2 and E1) = P(E2) * P(E1 | E2). Since "E1 and E2" and "E2 and E1" are the same event, P(E2) * P(E1 | E2) = P(E1) * P(E2 | E1), and therefore P(E1 | E2) = P(E1) * P(E2 | E1) / P(E2).

Bayes' Theorem: P(E1 | E2) = P(E1) * P(E2 | E1) / P(E2). E.g., once you know P(E1) and P(E2), knowing either one of the two conditional probabilities (that is, either P(E1 | E2) or P(E2 | E1)) implies knowing the other.

Bayes' Theorem, a simple example. Draw twice without replacement from two red balls and one black: RRB. Then P(R) = 2/3, P(B) = 1/3, P(R | B) = 1 (if the black ball is drawn first, both remaining balls are red), and P(B | R) = 1/2. Now recompute P(R | B) using Bayes: P(R | B) = P(R) * P(B | R) / P(B) = (2/3 * 1/2) / (1/3) = (1/3) / (1/3) = 1 --- correct!
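As a quick check of the arithmetic (not part of the original slides), the RRB example can be reproduced exactly with Python's `fractions` module:

```python
from fractions import Fraction

# Known quantities from the RRB example (two red balls, one black).
p_r = Fraction(2, 3)           # P(R): first ball drawn is red
p_b = Fraction(1, 3)           # P(B): first ball drawn is black
p_b_given_r = Fraction(1, 2)   # P(B | R): second ball black, given first was red

# Bayes: P(R | B) = P(R) * P(B | R) / P(B)
p_r_given_b = p_r * p_b_given_r / p_b
print(p_r_given_b)  # 1
```

Using exact fractions avoids any floating-point rounding, so the result is literally 1, matching the slide.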

Bayes' Theorem: why bother? Suppose E1 is a disease (say measles) and E2 is a symptom (say spots). P(E1) is the probability of a given patient having measles (before you see them); P(E2) is the probability of a given patient having spots (for whatever reason); P(E2 | E1) is the probability that someone who has measles has spots. Bayes' theorem then allows us to calculate P(E1 | E2) = P(patient has measles given that the patient has spots).
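The measles/spots calculation can be sketched the same way. The slide gives no numbers, so the figures below are invented purely for illustration:

```python
# Hypothetical figures (not from the lecture), chosen only to illustrate Bayes:
p_measles = 0.001              # P(E1): prior probability a patient has measles
p_spots = 0.02                 # P(E2): probability a patient has spots (any cause)
p_spots_given_measles = 0.9    # P(E2 | E1): probability measles produces spots

# Bayes: P(E1 | E2) = P(E1) * P(E2 | E1) / P(E2)
p_measles_given_spots = p_measles * p_spots_given_measles / p_spots
print(round(p_measles_given_spots, 3))  # 0.045
```

Note how a rare disease stays fairly unlikely even given the symptom: the prior P(E1) matters as much as the likelihood P(E2 | E1).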

Discrete random variables. A discrete random variable is a discrete variable whose possible values are each associated with a probability. The possible values and their probabilities together form a discrete probability distribution: P(X = <some value>) = ...

Simple example: a fair coin. Let the variable X be the number of heads in one trial (toss); its values are 0 and 1. Probability distribution: P(X = 0) = 0.5, P(X = 1) = 0.5. [Bar chart: two equal bars of height 0.5 at X = 0 and X = 1.]

Example: total of two dice. Let the variable X be the total when two fair dice are tossed; its values are 2, 3, 4, ..., 11, 12. Are these values equally probable? No: different totals can be formed in different numbers of ways. Probability distribution: P(X = 2) = 1/36, P(X = 3) = 2/36 (because 3 = 1+2 = 2+1), P(X = 4) = 3/36 (because 4 = 1+3 = 3+1 = 2+2), and so on. Table of totals (row = first die, column = second die):

  + | 1  2  3  4  5  6
  1 | 2  3  4  5  6  7
  2 | 3  4  5  6  7  8
  3 | 4  5  6  7  8  9
  4 | 5  6  7  8  9 10
  5 | 6  7  8  9 10 11
  6 | 7  8  9 10 11 12

[Bar chart: the distribution rises from 1/36 at X = 2 to a peak of 6/36 at X = 7, then falls symmetrically to 1/36 at X = 12.]
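The whole two-dice distribution can be derived by brute-force enumeration of the 36 equally likely outcomes; a small sketch (not from the lecture):

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 36 equally likely (die1, die2) outcomes and count each total.
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Convert counts to exact probabilities out of 36.
dist = {t: Fraction(c, 36) for t, c in sorted(totals.items())}
print(dist[2], dist[3], dist[7])  # 1/36 1/18 1/6
```

The counts reproduce the table above: one way to make 2, two ways to make 3, up to six ways to make 7, and the probabilities sum to 1.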

Bernoulli probability distribution. A Bernoulli trial is one in which there are only two possible and mutually exclusive outcomes; call the outcomes "success" and "failure". Consider the variable X, the number of successes in one trial. Probability distribution: P(X = 0) = 1-p, P(X = 1) = p. (For a fair coin, p = 1/2; the graph shown uses p = 4/10, giving bars of height 0.6 and 0.4.)

Binomial distribution. Carry out n Bernoulli trials, and let X be the number of successes in the n trials; its possible values are 0, 1, 2, ..., n-1, n. What is P(X = r), the probability of scoring exactly r successes in n trials? Let s = success and f = failure. Then an outcome of the n trials is an n-tuple of s's and f's, e.g. <s,s,f,s,f,f,f> for n = 7. Consider the 7-tuple <f,s,f,s,f,f,f>: its probability is P(f)*P(s)*P(f)*P(s)*P(f)*P(f)*P(f), that is, P(s)^2 * P(f)^5. In general, the probability of a given tuple with r s's and n-r f's in it is P(s)^r * P(f)^(n-r).

Binomial distribution. The probability of a given tuple with r s's and n-r f's in it is P(s)^r * P(f)^(n-r). We're interested in P(X = r): the probability of all tuples together that have r s's and n-r f's in them (i.e. the sum of their probabilities). How many ways are there of picking exactly r elements out of a set of n? Combinatorics says there are n! / (r! (n-r)!) ways. (Def: n! = n * (n-1) * (n-2) * ... * 2 * 1.) Try a few values: e.g., if n = 9 and r = 1 then 9! / (1! * 8!) = 9. So we need to multiply P(s)^r * P(f)^(n-r) by this factor.

Binomial distribution. Combining what you've seen in the previous slides: carry out n Bernoulli trials, where in each trial the probability of success is P(s) and the probability of failure is P(f). The distribution of X (i.e., the number of successes) is: P(X = r) = nCr * P(s)^r * P(f)^(n-r), where nCr = n! / (r! (n-r)!).
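The formula translates directly into code. A minimal sketch (the function name `binomial_pmf` is introduced here, not from the lecture):

```python
from math import comb  # comb(n, r) computes nCr = n! / (r! (n-r)!)

def binomial_pmf(r, n, p):
    """P(X = r) for r successes in n Bernoulli trials with P(success) = p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Fair coin, n = 7 trials, r = 2 heads: 7C2 * (1/2)^7 = 21/128
print(binomial_pmf(2, 7, 0.5))  # 0.1640625
```

As a sanity check, summing P(X = r) over all r from 0 to n gives 1, as any probability distribution must.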

Binomial distribution (slightly different formulation): P(X = r) = nCr * p^r * (1-p)^(n-r). Remember CS1015: combinations and permutations.

Example of binomial distribution: probability of r heads in n = 20 trials. [Bar chart of P(X = r) for r = 0, ..., 20; the distribution peaks at r = 10, at a little under 0.18.]

Example of binomial distribution: probability of r heads in n = 100 trials. [Bar chart of P(X = r) for r = 0, ..., 100; the distribution peaks at r = 50, at about 0.08.]

Probability of being in a given range: probability of r heads in n = 100 trials, grouped into bins P(0 <= X < 5), P(5 <= X < 10), ... [Bar chart of the binned probabilities.]

Probability of being in a given range: P(55 <= X < 65) = P(55 <= X < 60) + P(60 <= X < 65) (mutually exclusive events). Add the heights of the columns. [Same binned bar chart.]
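"Adding the heights of the columns" is just summing the binomial probabilities over the range; a sketch under the slide's assumptions (n = 100 fair-coin tosses):

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for r successes in n Bernoulli trials with P(success) = p."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 100, 0.5
# Split P(55 <= X < 65) at 60, mirroring the slide's mutually exclusive events.
left = sum(binomial_pmf(r, n, p) for r in range(55, 60))    # P(55 <= X < 60)
right = sum(binomial_pmf(r, n, p) for r in range(60, 65))   # P(60 <= X < 65)
total = sum(binomial_pmf(r, n, p) for r in range(55, 65))   # P(55 <= X < 65)
print(abs((left + right) - total) < 1e-12)  # True
```

The two pieces add up to the whole range exactly because the events are mutually exclusive.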

Continuous random variable. Consider a large population of individuals, e.g. all males in the UK over 16, and a continuous attribute, e.g. height, X. Select an individual at random, so that any individual is as likely to be selected as any other, and observe his/her height. X is said to be a continuous random variable.

Continuous random variable. Suppose the heights in your sample range from 31cm (a baby) to 207cm. Height is continuous, so we can't list every height separately: we need bins. If you are satisfied with a rough division into a few dozen categories, you might decide to use bins as follows: 19.5-39.5cm: relative frequency = 3/100; 39.5-59.5cm: relative frequency = 7/100; 59.5-79.5cm: relative frequency = 23/100; etc. Relative frequencies may then be plotted as a histogram:

[Relative frequency histogram: the height of each bar gives the relative frequency of the corresponding 20cm bin, for bins starting at 19.5, 39.5, ..., 199.5cm.]

Probability density function. The probability distribution of X is the function P such that, for all a and b: P({x : a <= x <= b}) = the area under the curve between a and b. The total area under the curve is 1.0.

Increase the number of samples and decrease the width of the bins: the histogram approaches a smooth curve.

The normal or Gaussian distribution. A very common distribution: a variable is measured for a large number of nominally identical objects, and the variation is assumed to be caused by a large number of factors, each exerting a small random positive or negative influence (e.g. for height: age, diet, genes). Properties: symmetric about the mean µ; monotonically increasing where x < µ; monotonically decreasing where x > µ; unimodal (only one peak). The greater the number of trials n, the more closely the binomial distribution approximates a normal distribution. [Bell-shaped curve centred on the mean.]

Mean. The mean determines the centre of the curve. [Two curves of identical shape, one centred at mean = 10, the other at mean = 30.]

Standard deviation. The standard deviation determines the width of the curve. [Two curves with the same mean: one with std. devn. = 2 (wide, peak height about 0.2), one with std. devn. = 1 (narrow, peak height about 0.4).]

Normal distribution. Given a perfectly normal distribution, the mean and standard deviation together determine the function uniquely, and certain things can be proven about any normal distribution. This is why statisticians often assume that their sample space is approximately normally distributed, even though this is usually very difficult to be sure about! Example (without proof): in normally distributed data, the probability of lying within 2 standard deviations of the mean, on either side, is approximately 0.95.

Probability of sample lying within mean ± 2 standard deviations. Example: mean µ = 10.0, std. devn. σ = 2.0. P(X < µ - σ) = 15.866%; P(X < µ - 2σ) = 2.275%; so P(µ - 2σ < X < µ + 2σ) = (100 - 2 * 2.275)% = 95.45%.

   x    Prob. density   Cumulative
   2.0  0.00013383       0.003%
   2.4  0.000291947      0.007%
   2.8  0.000611902      0.016%
   3.2  0.001232219      0.034%
   3.6  0.002384088      0.069%
   4.0  0.004431848      0.135%
   4.4  0.007915452      0.256%
   4.8  0.013582969      0.466%
   5.2  0.02239453       0.820%
   5.6  0.035474593      1.390%
   6.0  0.053990967      2.275%
   6.4  0.078950158      3.593%
   6.8  0.110920835      5.480%
   7.2  0.149727466      8.076%
   7.6  0.194186055     11.507%
   8.0  0.241970725     15.866%
   8.4  0.289691553     21.186%
   8.8  0.333224603     27.425%
   9.2  0.36827014      34.458%
   9.6  0.391042694     42.074%
  10.0  0.39894228      50.000%
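The cumulative values in the table can be recomputed from the standard error function; a sketch (the helper `normal_cdf` is my naming, not the lecture's):

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X < x) for X ~ Normal(mu, sigma), via the error function."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

mu, sigma = 10.0, 2.0
p_below_2sd = normal_cdf(mu - 2 * sigma, mu, sigma)  # lower tail, ~0.02275
p_within_2sd = normal_cdf(mu + 2 * sigma, mu, sigma) - p_below_2sd
print(round(p_below_2sd * 100, 3), round(p_within_2sd * 100, 2))  # 2.275 95.45
```

This reproduces the table's 2.275% lower tail at x = 6.0 and the 95.45% central probability, confirming the arithmetic above.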

Probability of sample lying within mean ± 2 standard deviations. [Bell curve with the central region from µ - 2σ to µ + 2σ shaded, containing 95.45% of the probability; each tail beyond 2σ contains 2.275%.]

Non-normal distributions. Some non-normal distributions are so close to normal that it doesn't do much harm to pretend that they are normal, but some distributions are very far from normal. As a stark reminder, suppose we're considering the variable X = error when a person's age is stated, where error =def real age - age last birthday. Range(X) = 0 to 12 months. Which of these values is most probable? None: P(x) = 1/12 for every value. How would you plot this?

Uniform probability distribution, also called the rectangular distribution. For X uniform on [0, 1] the density is constant at 1.0, so P(X < y) = 1.0 * y = y. [Plot: flat density of height 1.0 over the interval from 0.0 to 1.0, with the area to the left of y shaded.]

Uniform probability distribution (continued). P(X < a) = a and P(X < b) = b, so P(a <= X < b) = b - a. [Same flat density, with the strip between a and b shaded.]
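For the uniform density on [0, 1], probabilities really are just interval lengths; a minimal sketch (function name is mine, not from the slides):

```python
def uniform_prob(a, b):
    """P(a <= X < b) for X uniform on [0, 1]: the length of the overlap."""
    a, b = max(a, 0.0), min(b, 1.0)  # clip the interval to [0, 1]
    return max(b - a, 0.0)           # empty overlap has probability 0

print(uniform_prob(0.0, 0.25))   # 0.25
print(uniform_prob(0.25, 0.75))  # 0.5
print(uniform_prob(-1.0, 2.0))   # 1.0
```

Contrast this with the normal case, where computing the same area requires the cumulative distribution function rather than simple subtraction.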

Towards inferential statistics. If this were a course on inferential statistics, we would now discuss the role of normal distributions in the statistical testing of hypotheses. The rough idea (in one representative type of situation): Your hypothesis: Brits are taller than Frenchmen. Take two samples: heights of Brits, heights of Frenchmen. Assume both populations (as a whole) are normally distributed. Compute the mean values, µb and µf, for these two samples. Suppose you find µb > µf. This is promising, but it could be an accident! Now compute the probability p of measuring these means if, in fact, µb = µf in the two populations as a whole (i.e., your finding was an accident). If p < 0.05, conclude that Brits are taller than Frenchmen.
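The idea of "how probable are these means if the populations are really equal?" can be illustrated with a permutation test, a simulation-based stand-in for the parametric test the slide alludes to. The height samples below are made up for illustration; the lecture gives no data:

```python
import random

random.seed(0)  # reproducible shuffles

# Hypothetical samples (cm), invented for this sketch.
brits = [178, 181, 175, 183, 179, 177, 182, 180]
french = [174, 176, 173, 177, 175, 172, 178, 174]

observed = sum(brits) / len(brits) - sum(french) / len(french)

# Under the null hypothesis the group labels are arbitrary, so shuffle the
# pooled data and see how often a difference this large arises by accident.
pooled = brits + french
trials, count = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = sum(pooled[:8]) / 8 - sum(pooled[8:]) / 8
    if diff >= observed:
        count += 1

p_value = count / trials
print(p_value)
```

For these well-separated samples the p-value is tiny, so at the 0.05 level we would conclude the difference is not an accident.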

Summing up this lecture: discrete random variables; Bernoulli trials and series of Bernoulli trials; the binomial distribution (probability of r successes out of n trials); continuous random variables; the normal (= Gaussian) distribution.