Lecture 10: Normal RV
Lisa Yan, July 18, 2018
Announcements
- Midterm next Tuesday. Practice midterm and solutions are out on the website.
- SCPD students: fill out the Google form by today.
- Covers up to and including Friday's lecture.
- PS4 now out (due Monday 7/30); the first six problems are useful practice for the midterm.
Fundamental properties of a RV
Semantic meaning of a random variable X:
- CDF: F(x) = P(X ≤ x)
- Discrete: PMF p(x) = P(X = x), where Σ_x p(x) = 1
- Continuous: PDF f(x), where F(x) = ∫_{−∞}^{x} f(t) dt
- Summaries: E[X], E[X²], Var(X), Std(X)
Goals for today
- Memorylessness of the exponential distribution
- Normal distribution
Laptop crashes
X ~ Exp(λ): F(x) = P(X ≤ x) = 1 − e^(−λx)
X = # hours of use until your laptop dies. On average, laptops die after 5000 hours of use. Suppose your laptop's time-to-death (in hours) is exponentially distributed, and you use your laptop 5 hours per day.
P(your laptop lasts 4 years)? (i.e., (5)(365)(4) = 7300 hours)
Solution:
Define X ~ Exp(λ), where λ = 1/E[X] = 1/5000.
P(X > 7300) = 1 − F(7300) = 1 − (1 − e^(−7300/5000)) = e^(−1.46) ≈ 0.2322
Better plan ahead, especially if you're co-terming:
P(X > 9125) = 1 − F(9125) = e^(−1.825) ≈ 0.1612  (5-year plan)
P(X > 10950) = 1 − F(10950) = e^(−2.19) ≈ 0.1119  (6-year plan)
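These tail probabilities can be checked with `scipy.stats.expon` (a sketch, not from the slides). One gotcha: scipy parameterizes the exponential by `scale` = 1/λ, i.e., the mean, not the rate.

```python
from scipy import stats

# Exponential with rate lambda = 1/5000, i.e. scale = E[X] = 5000 hours
laptop = stats.expon(scale=5000)

# P(X > t) is the survival function sf(t) = 1 - cdf(t)
for years in (4, 5, 6):
    hours = 5 * 365 * years
    print(f"P(laptop lasts {years} years) = P(X > {hours}) = {laptop.sf(hours):.4f}")
```

Running this reproduces the three slide values 0.2322, 0.1612, and 0.1119.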
Memorylessness
Let X = time until some event occurs, where X ~ Exp(λ). What is P(X > s + t | X > s)?
Solution (recall that for X ~ Exp(λ), P(X > x) = 1 − F(x) = e^(−λx)):
P(X > s + t | X > s)
  = P(X > s + t and X > s) / P(X > s)   (definition of conditional probability)
  = P(X > s + t) / P(X > s)
  = (1 − F(s + t)) / (1 − F(s))
  = e^(−λ(s+t)) / e^(−λs)
  = e^(−λt)
  = 1 − F(t) = P(X > t)
Exponential RVs are Memoryless
P(X > s + t | X > s) = P(X > t)
Memoryless: no impact from the preceding period s. After an initial period of time s, waiting another t units of time is equivalent to waiting t units of time from the start. For example, in terms of tail probabilities: P(X > t + 1 | X > 1) = P(X > t).
Appointment times
X ~ Exp(λ): F(x) = 1 − e^(−λx)
You are waiting for your friend at the movie theater. On average, your friend arrives 5 minutes late, and your friend's arrival time is exponentially distributed. If you have already waited 5 minutes and your friend hasn't shown up, what is the probability that your friend will show up within the next 5 minutes?
Solution:
Define X ~ Exp(1/5) (because E[X] = 5).
WTF (want to find): P(5 < X < 10 | X > 5) = P(X < 5)   (exponential RVs are memoryless)
  = 1 − e^(−(1/5)(5)) = 1 − e^(−1) ≈ 0.6321
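As a sanity check (my own, not from the slides), the conditional probability can be computed directly from the definition and compared against the unconditional one that memorylessness promises:

```python
from scipy import stats

friend = stats.expon(scale=5)  # E[X] = 5 minutes, so rate lambda = 1/5

# P(5 < X < 10 | X > 5), computed from the definition of conditional probability
conditional = (friend.cdf(10) - friend.cdf(5)) / friend.sf(5)

# Memorylessness says this equals the unconditional P(X < 5)
unconditional = friend.cdf(5)

print(conditional, unconditional)  # both ≈ 0.6321
```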
(silent drumroll)
Normal Random Variable
Normal RV X: X ~ N(μ, σ²)
f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)),   −∞ < x < ∞
E[X] = μ
Var(X) = σ²
Also called: the Gaussian
Gaussian symmetry
The PDF of a normal RV (aka Gaussian RV) is symmetric about its mean μ: f(μ + x) = f(μ − x).
Carl Friedrich Gauss
Carl Friedrich Gauss (1777–1855) was a remarkably influential German mathematician. He did not invent the normal distribution, but rather popularized it.
Why the normal?
- Common for natural phenomena: height, weight, etc.
- Most noise in the world is approximately normal.
- Often results from the sum of many random variables (if independent and equally weighted).
- Sample means are distributed normally, with sufficient sample sizes (more in 3 weeks).
But really, why the normal?
Goal in machine learning:
1. Collect training data
2. Learn a (probabilistic) model
3. Test on real data
Complex modeling of the training data is probably overfitting and may not generalize. Simple is best: if you know μ and σ² but nothing else, a normal distribution is a good starting point, because it maximizes entropy for a given mean and variance.
Normal PDF
f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
- PDF centered around μ, and symmetric.
- Exponential tails.
- The variance σ² manages the spread.
- The normalizing constant 1/(σ√(2π)) depends on the variance and converts to the correct units.
Normal CDF
P(a ≤ X ≤ b) = ∫_a^b (1/(σ√(2π))) e^(−(x−μ)²/(2σ²)) dx
(no closed form lol)
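Since the integral has no closed form, it is evaluated numerically. A quick sketch (my own illustration, with μ = 3 and σ = 4 chosen to match a later example): integrate the PDF directly with `scipy.integrate.quad` and compare against scipy's built-in normal CDF.

```python
import math
from scipy import stats
from scipy.integrate import quad

mu, sigma = 3.0, 4.0

def normal_pdf(x):
    """PDF of N(mu, sigma^2), written out explicitly."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

a, b = 2.0, 5.0
integral, _err = quad(normal_pdf, a, b)                      # numerical integration of the PDF
via_cdf = stats.norm(mu, sigma).cdf(b) - stats.norm(mu, sigma).cdf(a)

print(integral, via_cdf)  # both ≈ 0.2902
```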
Normal CDF
F(x) = Φ((x − μ)/σ)
where Φ, the CDF of the standard normal, has been solved for numerically. (We'll spend the next few slides getting here.)
Properties of the Normal
Let X ~ N(μ, σ²).
Symmetry: the PDF satisfies f(μ + x) = f(μ − x).
Linearity: if Y = aX + b, then Y is also normal:
  E[Y] = aμ + b,  Var(Y) = a²σ²,  so Y ~ N(aμ + b, a²σ²).
In particular, let Z = (X − μ)/σ = (1/σ)X − μ/σ, i.e., a = 1/σ and b = −μ/σ. Then
  Z ~ N(μ/σ − μ/σ, σ²/σ²) = N(0, 1)   (the unit normal)
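The linearity property can be checked empirically (my own sketch, with illustrative values a = 2, b = 1, μ = 3, σ = 4 that are not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 3.0, 4.0
a, b = 2.0, 1.0

x = rng.normal(mu, sigma, size=1_000_000)  # samples of X ~ N(3, 16)
y = a * x + b                              # Y = aX + b

# Linearity predicts E[Y] = a*mu + b = 7 and Var(Y) = a^2 * sigma^2 = 64
print(y.mean(), y.var())
```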
The Standard (Unit) Normal RV, Z
Let Z ~ N(0, 1) be the standard normal: zero mean (E[Z] = 0), unit variance (Var(Z) = 1).
Its CDF is defined as ("phi of z"):
  F_Z(z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^(−t²/2) dt
Solve the following in terms of Φ:
1. P(Z ≤ x) = P(Z < x) = Φ(x)
2. P(Z > x) = P(Z ≥ x) = 1 − Φ(x)
3. P(Z ≤ −x) = P(Z < −x) = Φ(−x) = 1 − Φ(x)
4. P(a < Z < b) = Φ(b) − Φ(a)
Normal CDF
Let X ~ N(μ, σ²), and let F_Z(z) = Φ(z) be the CDF of the unit normal Z. Prove that F(x) = Φ((x − μ)/σ).
Define Z = (X − μ)/σ, where Z ~ N(0, 1) (unit normal). Then
  F_X(x) = P(X ≤ x) = P((X − μ)/σ ≤ (x − μ)/σ) = P(Z ≤ (x − μ)/σ) = Φ((x − μ)/σ)
Φ has been solved numerically, meaning we can use Φ to find the CDF of a general normal.
The Standard Normal Table
Find Φ(0.54), i.e., Φ(z) where z = 0.54. The table gives Φ(0.54) = 0.7054.
(Old-school)
New-school Normal Table
scipy.stats.norm(mean, std).cdf(x)
Note: the second argument is the standard deviation, not the variance; you might need math.sqrt here.
New-school: CS109 Website → Demos → Normal CDF
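A minimal usage sketch of the call above, reproducing the table lookup Φ(0.54) and showing the square-root-of-variance gotcha:

```python
import math
from scipy import stats

# Standard normal: Phi(0.54) should match the table value 0.7054
print(stats.norm(0, 1).cdf(0.54))

# X ~ N(3, 16): scipy wants the standard deviation, so take sqrt of the variance
mu, var = 3, 16
print(stats.norm(mu, math.sqrt(var)).cdf(0))  # P(X <= 0) = Phi(-3/4) ≈ 0.2266
```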
Exploiting symmetry
We only need Φ(z) for positive z to compute all CDFs of a normal. By symmetry:
  Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z)
Then, because P(X ≤ x) = F_X(x) = Φ((x − μ)/σ), we can compute all probabilities of any well-defined normal using the Standard Normal Table.
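The symmetry identity is easy to spot-check numerically (my own sketch; the z values are arbitrary):

```python
from scipy import stats

Phi = stats.norm(0, 1).cdf

# Symmetry: Phi(-z) = 1 - Phi(z), so a table of positive z suffices
for z in (0.25, 0.75, 1.5, 2.9):
    print(z, Phi(-z), 1 - Phi(z))  # the last two columns match
```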
Break
Attendance: tinyurl.com/cs109summer2018
Get Your Gaussian On   ("Get Ur Freak On", Missy Elliott, 2001)
Recall: Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z)
Let X ~ N(3, 16): μ = 3, σ² = 16 (so σ = 4).
1. P(X > 0) = P((X − 3)/4 > (0 − 3)/4) = P(Z > −3/4)
   = 1 − P(Z ≤ −3/4) = 1 − Φ(−3/4) = 1 − (1 − Φ(3/4)) = Φ(3/4) ≈ 0.7734
2. P(2 < X < 5)
3. P(|X − 3| > 6)
Get Your Gaussian On (continued)
Let X ~ N(3, 16): μ = 3, σ² = 16 (so σ = 4).
1. P(X > 0) = Φ(3/4) ≈ 0.7734
2. P(2 < X < 5) = P((2 − 3)/4 < (X − 3)/4 < (5 − 3)/4)
   = P(−1/4 < Z < 1/2) = Φ(1/2) − Φ(−1/4) = Φ(1/2) − (1 − Φ(1/4))
   ≈ 0.6915 − (1 − 0.5987) ≈ 0.2902
3. P(|X − 3| > 6) = P(X < −3) + P(X > 9)
   = P(Z < (−3 − 3)/4) + P(Z > (9 − 3)/4) = Φ(−3/2) + 1 − Φ(3/2)
   = 2(1 − Φ(3/2)) = 2(1 − 0.9332) ≈ 0.1336
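All three answers can be verified directly with scipy, skipping the standardization step entirely (a check sketch, not part of the slides):

```python
from scipy import stats

X = stats.norm(3, 4)  # N(mu=3, sigma^2=16), so std = 4

p1 = X.sf(0)               # P(X > 0)
p2 = X.cdf(5) - X.cdf(2)   # P(2 < X < 5)
p3 = X.cdf(-3) + X.sf(9)   # P(|X - 3| > 6) = P(X < -3) + P(X > 9)

print(p1, p2, p3)  # ≈ 0.7734, 0.2902, 0.1336
```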
Noisy Wires
Send a voltage of 2 V or −2 V on a wire (to denote 1 and 0, respectively).
Let X = voltage sent, and R = voltage received, where R = X + N with noise N ~ N(0, 1).
Decode R: if R ≥ 0.5, the received bit is 1; else, the received bit is 0.
1. What is P(error after decoding | original bit = 1)?
2. What is P(error after decoding | original bit = 0)?
Solution (recall Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z)):
1. P(received bit ≠ 1 | bit = 1) = P(R < 0.5 | X = 2) = P(2 + N < 0.5)
   = P(N < −1.5) = Φ(−1.5) = 1 − Φ(1.5) ≈ 0.0668
2. P(received bit ≠ 0 | bit = 0) = P(R ≥ 0.5 | X = −2) = P(−2 + N ≥ 0.5)
   = P(N ≥ 2.5) = 1 − Φ(2.5) ≈ 0.0062
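The two error probabilities fall straight out of the standard normal's CDF and survival function (a check sketch, not from the slides):

```python
from scipy import stats

N = stats.norm(0, 1)  # noise on the wire

# Bit 1 is sent as +2 V; an error occurs when the received voltage 2 + N < 0.5
p_err_given_1 = N.cdf(0.5 - 2)   # P(N < -1.5)

# Bit 0 is sent as -2 V; an error occurs when the received voltage -2 + N >= 0.5
p_err_given_0 = N.sf(0.5 + 2)    # P(N >= 2.5)

print(p_err_given_1, p_err_given_0)  # ≈ 0.0668, 0.0062
```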
Normal approximates Binomial
Let X ~ Bin(n, p): E[X] = np, Var(X) = np(1 − p).
- Poisson is a good approximation when n is large (> 20) and p is small (< 0.05).
- For large n: X ≈ Y ~ N(np, np(1 − p))   (matching E[X] and Var(X))
- The normal approximation is good when Var(X) = np(1 − p) ≥ 10 (n large, p medium).
- Proof: the De Moivre–Laplace Limit Theorem.
Normal approximates Binomial
But wait... what is P(X = 55) when X is approximated as a normal? For a continuous RV Y, P(Y = 55) = 0.
Continuity correction
X ~ Bin(n, p), Y ~ N(np, np(1 − p)):
P(X = k) ≈ P(k − 1/2 < Y < k + 1/2)
P(X ≥ k) ≈ P(Y > k − 1/2)
For example:
P(X = 55) ≈ P(54.5 < Y < 55.5)
P(X ≥ 55) ≈ P(Y > 54.5)
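A quick numerical check of the continuity correction for P(X = 55), using illustrative parameters n = 100, p = 0.55 (my choice, not from the slides) so that 55 sits at the mean:

```python
import math
from scipy import stats

n, p = 100, 0.55  # illustrative; Var(X) = np(1-p) = 24.75 >= 10, so the approximation applies
X = stats.binom(n, p)
Y = stats.norm(n * p, math.sqrt(n * p * (1 - p)))

exact = X.pmf(55)                    # P(X = 55), exact binomial
approx = Y.cdf(55.5) - Y.cdf(54.5)   # P(54.5 < Y < 55.5), with continuity correction

print(exact, approx)  # very close; without the correction we'd get P(Y = 55) = 0
```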
Continuity Correction
X ~ Bin(n, p), Y ~ N(np, np(1 − p)):
Discrete (e.g., Binomial) probability question → Continuous (Normal) probability question
P(X = 6)  → P(5.5 ≤ Y ≤ 6.5)
P(X ≥ 6)  → P(Y > 5.5)
P(X > 6)  → P(Y > 6.5)
P(X < 6)  → P(Y < 5.5)
P(X ≤ 6)  → P(Y < 6.5)
Faulty Endorsements
100 people are placed on a special diet. Let X = # of people whose cholesterol levels decrease. A doctor will endorse the diet if ≥ 65 people have their cholesterol decrease.
P(doctor endorses | diet has no effect)?
Note: if the diet has no effect, then cholesterol levels are equally likely to increase or decrease.
Solution:
Define X ~ Bin(100, 0.5). WTF (want to find): P(X ≥ 65).
Can we use the normal approximation? np = 50, np(1 − p) = 25 ≥ 10, so yes.
Define Y ~ N(50, 25) (standard deviation = 5). With the continuity correction:
P(X ≥ 65) ≈ P(Y ≥ 64.5) = P((Y − 50)/5 ≥ (64.5 − 50)/5) = P(Z ≥ 2.9) = 1 − Φ(2.9) ≈ 0.0019
Using the Binomial directly: P(X ≥ 65) ≈ 0.0018
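Both numbers on this slide can be reproduced with scipy (a check sketch; `sf(64)` gives P(X > 64) = P(X ≥ 65) for the discrete binomial):

```python
import math
from scipy import stats

n, p = 100, 0.5
X = stats.binom(n, p)
Y = stats.norm(n * p, math.sqrt(n * p * (1 - p)))  # N(50, 25), std = 5

exact = X.sf(64)     # P(X >= 65), exact binomial
approx = Y.sf(64.5)  # P(Y >= 64.5), normal approximation with continuity correction

print(exact, approx)  # ≈ 0.0018, 0.0019
```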
Summary of the normal RV
Let X ~ N(μ, σ²) and Z ~ N(0, 1), where Φ is the CDF of the unit normal Z.
- Symmetry: Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z)
- Linearity: Y = aX + b is also normal, with Y ~ N(aμ + b, a²σ²).
- Calculating probabilities on X: P(X ≤ x) = F_X(x) = Φ((x − μ)/σ)
- A binomial RV X ~ Bin(n, p) can be approximated with a normal RV Y ~ N(np, np(1 − p)) if Var(X) = np(1 − p) ≥ 10; but you must use a continuity correction when approximating a discrete RV with a continuous RV.