Lecture 10: Normal RV. Lisa Yan July 18, 2018


Announcements: Midterm next Tuesday. Practice midterm and solutions are out on the website. SCPD students: fill out the Google form by today. The midterm covers up to and including Friday's lecture. PS4 is now out (due Monday 7/30); the first six problems are useful practice for the midterm.

Fundamental properties of a RV. For a random variable X, we care about P(X ≤ x), the CDF F(x), E[X], E[X²], Var(X), and Std(X). Discrete: F(x) = Σ_{y ≤ x} p(y), where p is a PMF. Continuous: F(x) = ∫_{−∞}^{x} f(t) dt, where f is a PDF.

Goals for today: memorylessness of the exponential distribution; the normal distribution.

Laptop crashes. X ~ Exp(λ): f(x) = λe^(−λx), F(x) = 1 − e^(−λx). Let X = # hours of use until your laptop dies. On average, laptops die after 5000 hours of use. Suppose your laptop's time-to-death (in hours) is exponentially distributed, and you use your laptop 5 hours per day. P(your laptop lasts 4 years)? (i.e., (5)(365)(4) = 7300 hours) Solution: Define X ~ Exp(λ), where λ = 1/E[X] = 1/5000. P(X > 7300) = 1 − F(7300) = 1 − (1 − e^(−7300/5000)) = e^(−1.46) ≈ 0.2322. Better plan ahead, especially if you're co-terming: P(X > 9125) = 1 − F(9125) = e^(−1.825) ≈ 0.1612 (5-year plan); P(X > 10950) = 1 − F(10950) = e^(−2.19) ≈ 0.1119 (6-year plan).
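The three survival probabilities above can be checked in a few lines of Python. This is a quick sketch, not part of the original slides; it uses only the standard library and the exponential tail formula P(X > t) = e^(−λt):

```python
import math

lam = 1 / 5000  # rate: E[X] = 5000 hours, so lambda = 1/5000

def p_survive(hours):
    # P(X > t) = 1 - F(t) = e^{-lambda * t} for X ~ Exp(lambda)
    return math.exp(-lam * hours)

print(round(p_survive(7300), 4))   # 4-year plan: 0.2322
print(round(p_survive(9125), 4))   # 5-year plan: 0.1612
print(round(p_survive(10950), 4))  # 6-year plan: 0.1119
```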

Memorylessness. Let X = time until some event occurs, where X ~ Exp(λ), so 1 − F(x) = e^(−λx). What is P(X > s + t | X > s)? Solution: by the definition of conditional probability, P(X > s + t | X > s) = P(X > s + t and X > s) / P(X > s) = P(X > s + t) / P(X > s) = (1 − F(s + t)) / (1 − F(s)) = e^(−λ(s+t)) / e^(−λs) = e^(−λt) = 1 − F(t) = P(X > t).

Exponential RVs are Memoryless. P(X > s + t | X > s) = P(X > t): there is no impact from the preceding period s. After an initial period of time s, waiting another t units of time is equivalent to waiting t units of time from the start. For example, P(X > t + 1 | X > 1) = P(X > t).

Appointment times. X ~ Exp(λ): F(x) = 1 − e^(−λx). You are waiting for your friend at the movie theater. On average, your friend arrives 5 minutes late, and your friend's arrival time is exponentially distributed. If you have already waited 5 minutes and your friend hasn't shown up, what is the probability that your friend will show up within the next 5 minutes? Solution: Define X ~ Exp(1/5) (because E[X] = 5). Want to find: P(5 < X < 10 | X > 5) = P(X < 5) = 1 − e^(−(1/5)(5)) = 1 − e^(−1) ≈ 0.6321. Exponential RVs are memoryless.
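The memoryless shortcut used above can be verified numerically. This sketch (not from the original slides) computes the conditional probability directly from the CDF and compares it against P(X < 5):

```python
import math

lam = 1 / 5  # friend is on average 5 minutes late: E[X] = 5

def F(x):
    # CDF of X ~ Exp(lambda)
    return 1 - math.exp(-lam * x)

# Direct computation: P(5 < X < 10 | X > 5)
p_cond = (F(10) - F(5)) / (1 - F(5))

# Memoryless shortcut: same as P(X < 5), waiting "from scratch"
p_fresh = F(5)

print(round(p_cond, 4), round(p_fresh, 4))  # both 0.6321
```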

(silent drumroll)

Normal Random Variable. Normal RV X: X ~ N(µ, σ²) has PDF f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)), −∞ < x < ∞, with E[X] = µ and Var(X) = σ². Also called: the Gaussian.

Gaussian symmetry. The PDF of a normal RV (aka Gaussian RV) is symmetric about its mean, µ.

Carl Friedrich Gauss. Carl Friedrich Gauss (1777-1855) was a remarkably influential German mathematician. He did not invent the Normal distribution, but rather popularized it.

Why the normal? It is common for natural phenomena: height, weight, etc. Most noise in the world is approximately Normal. It often results from the sum of many random variables: sample means are distributed normally, if the samples are independent and equally weighted, with sufficient sample sizes (more in 3 weeks).

But really, why the normal? Goal in machine learning: 1. Collect training data. 2. Learn a (probabilistic) model. 3. Test on real data. Complex modeling of the training data is probably overfitting and may not generalize. Simple is best: if you know µ and σ² but nothing else, a normal distribution is a good starting point, since it maximizes entropy for a given mean and variance.

Normal PDF. f(x) = (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)). The PDF is centered around µ and symmetric, with exponential tails. The normalizing constant 1/(σ√(2π)) depends on the variance and converts to the correct units; the variance manages the spread.

Normal CDF. P(a ≤ X ≤ b) = ∫_a^b (1/(σ√(2π))) e^(−(x−µ)²/(2σ²)) dx (no closed form).

Normal CDF. F(x) = Φ((x − µ)/σ), which has been solved for numerically (we'll spend the next few slides getting here).

Properties of the Normal. Let X ~ N(µ, σ²). Symmetry: the PDF satisfies f(µ − x) = f(µ + x). Linearity: if Y = aX + b, then Y is also Normal, with E[Y] = aE[X] + b = aµ + b and Var(Y) = a²Var(X) = a²σ², so Y ~ N(aµ + b, a²σ²). In particular, let Z = (X − µ)/σ = (1/σ)X − µ/σ. Then Z ~ N((1/σ)µ − µ/σ, (1/σ)²σ²) = N(0, 1): the Unit Normal.

The Standard (Unit) Normal RV, Z. Let Z ~ N(0, 1) be the standard normal: zero mean, unit variance, i.e., E[Z] = 0 and Var(Z) = 1. Then the CDF is defined as F_Z(z) = Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^(−t²/2) dt ("phi of z"). By symmetry, Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z). Solve the following in terms of Φ: 1. P(Z ≤ x) = P(Z < x) = Φ(x). 2. P(Z > x) = 1 − P(Z ≤ x) = 1 − Φ(x). 3. P(Z ≤ −x) = P(Z < −x) = Φ(−x) = 1 − Φ(x). 4. P(a < Z < b) = Φ(b) − Φ(a).

Normal CDF. Let X ~ N(µ, σ²) and let F_Z(z) = Φ(z) be the CDF of the unit normal Z. Prove that F_X(x) = Φ((x − µ)/σ). Define Z = (X − µ)/σ, where Z ~ N(0, 1) (unit normal). Then F_X(x) = P(X ≤ x) = P((X − µ)/σ ≤ (x − µ)/σ) = P(Z ≤ (x − µ)/σ) = Φ((x − µ)/σ). Φ has been solved numerically, meaning we can use Φ to find the CDF of a general normal.
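Although Φ has no closed form, the standard library's error function gives it directly via Φ(z) = (1 + erf(z/√2))/2. This sketch (an addition, not from the original slides) implements Φ and uses the standardization result just proved to get the CDF of a general normal:

```python
import math

def Phi(z):
    # Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return (1 + math.erf(z / math.sqrt(2))) / 2

def normal_cdf(x, mu, sigma):
    # CDF of X ~ N(mu, sigma^2), by standardizing: F_X(x) = Phi((x - mu) / sigma)
    return Phi((x - mu) / sigma)

print(round(Phi(0.54), 4))            # 0.7054, matching the table lookup
print(round(normal_cdf(3, 3, 4), 4))  # P(X <= mu) = 0.5 for X ~ N(3, 16)
```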

The Standard Normal Table (old-school). Find Φ(0.54), i.e., Φ(z) where z = 0.54: Φ(0.54) = 0.7054.

New-school Normal Table: scipy.stats.norm(mean, std).cdf(x). Note that this takes the standard deviation, not the variance; you might need math.sqrt here. Also new-school: CS109 Website → Demos → Normal CDF.

Exploiting symmetry. We only need Φ(z) for positive z to compute all CDFs of a normal. By symmetry: Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z). Then, because P(X ≤ x) = F_X(x) = Φ((x − µ)/σ), we can compute all probabilities of any well-defined normal by using the Standard Normal Table.

Break. Attendance: tinyurl.com/cs109summer2018

Get Your Gaussian On ("Get ur freak on," Missy Elliott, 2001). Recall Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z). Let X ~ N(3, 16): µ = 3, σ² = 16 (so σ = 4). 1. P(X > 0) = P((X − 3)/4 > (0 − 3)/4) = P(Z > −3/4) = 1 − P(Z ≤ −3/4) = 1 − Φ(−3/4) = 1 − (1 − Φ(3/4)) = Φ(3/4) ≈ 0.7734. 2. P(2 < X < 5)? 3. P(|X − 3| > 6)?

Get Your Gaussian On (continued). Let X ~ N(3, 16): µ = 3, σ² = 16 (so σ = 4). 1. P(X > 0) = Φ(3/4) ≈ 0.7734. 2. P(2 < X < 5) = P((2 − 3)/4 < (X − 3)/4 < (5 − 3)/4) = P(−1/4 < Z < 1/2) = Φ(1/2) − Φ(−1/4) = Φ(1/2) − (1 − Φ(1/4)) ≈ 0.6915 − (1 − 0.5987) ≈ 0.2902. 3. P(|X − 3| > 6) = P(X < −3) + P(X > 9) = P(Z < (−3 − 3)/4) + P(Z > (9 − 3)/4) = Φ(−3/2) + 1 − Φ(3/2) = 2(1 − Φ(3/2)) ≈ 2(1 − 0.9332) ≈ 0.1336.
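All three answers above can be reproduced by standardizing and evaluating Φ. This is an added sketch (not in the original slides), using the erf-based Φ:

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return (1 + math.erf(z / math.sqrt(2))) / 2

mu, sigma = 3, 4  # X ~ N(3, 16)

p1 = 1 - Phi((0 - mu) / sigma)                             # P(X > 0)
p2 = Phi((5 - mu) / sigma) - Phi((2 - mu) / sigma)         # P(2 < X < 5)
p3 = Phi((-3 - mu) / sigma) + (1 - Phi((9 - mu) / sigma))  # P(|X - 3| > 6)

print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.7734 0.2902 0.1336
```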

Noisy Wires. We send a voltage of 2 V or −2 V on a wire (to denote 1 and 0, respectively). Let X = voltage sent and R = voltage received, where R = X + Y with noise Y ~ N(0, 1). Decode R: if R ≥ 0.5, the received bit is 1; else, the received bit is 0. 1. What is P(error after decoding | original bit = 1)? 2. What is P(error after decoding | original bit = 0)? Solution: Recall Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z). 1. P(received bit = 0 | original bit = 1) = P(R < 0.5 | X = 2) = P(2 + Y < 0.5) = P(Y < −1.5) = Φ(−1.5) = 1 − Φ(1.5) ≈ 0.0668. 2. P(received bit = 1 | original bit = 0) = P(R ≥ 0.5 | X = −2) = P(−2 + Y ≥ 0.5) = P(Y ≥ 2.5) = 1 − Φ(2.5) ≈ 0.0062.
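The two error probabilities follow directly from standardizing the noise term. This added sketch (not part of the original slides) computes both:

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return (1 + math.erf(z / math.sqrt(2))) / 2

threshold = 0.5  # decode as 1 iff received voltage R >= 0.5

# P(error | sent 1): R = 2 + Y decoded as 0 when R < 0.5, i.e., Y < -1.5
p_err_1 = Phi((threshold - 2) / 1)

# P(error | sent 0): R = -2 + Y decoded as 1 when R >= 0.5, i.e., Y >= 2.5
p_err_0 = 1 - Phi((threshold - (-2)) / 1)

print(round(p_err_1, 4), round(p_err_0, 4))  # 0.0668 0.0062
```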

Normal approximates Binomial. Let X ~ Bin(n, p), so E[X] = np and Var(X) = np(1 − p). The Poisson approximation is good when n is large (> 20) and p is small (< 0.05). For large n, we can instead match the mean and variance: Y ~ N(np, np(1 − p)). The Normal approximation is good when Var(X) = np(1 − p) ≥ 10 (n large, p medium). The proof is the De Moivre-Laplace limit theorem.

Normal approximates Binomial. But wait: what is P(X = 55) when X is approximated as a normal? (For a continuous RV, the probability of any single value is 0.)

Continuity correction. X ~ Bin(n, p), Y ~ N(np, np(1 − p)): P(X = k) ≈ P(k − 1/2 < Y < k + 1/2), and P(X ≥ k) ≈ P(Y > k − 1/2). For example: P(X = 55) ≈ P(54.5 < Y < 55.5), and P(X ≥ 55) ≈ P(Y > 54.5).

Continuity Correction. X ~ Bin(n, p), Y ~ N(np, np(1 − p)). Translating a discrete (e.g., Binomial) probability question into a continuous (Normal) probability question: P(X = 6) ≈ P(5.5 ≤ Y ≤ 6.5); P(X ≥ 6) ≈ P(Y > 5.5); P(X > 6) ≈ P(Y > 6.5); P(X < 6) ≈ P(Y < 5.5); P(X ≤ 6) ≈ P(Y < 6.5).

Faulty Endorsements. 100 people are placed on a special diet. Let X = # people whose cholesterol levels decrease. A doctor will endorse the diet if ≥ 65 people have their cholesterol decrease. P(doctor endorses | diet has no effect)? Note: if the diet has no effect, then cholesterol levels are equally likely to increase or decrease. Solution: Define X ~ Bin(100, 0.5). Want to find: P(X ≥ 65). Can we use the normal approximation? Yes: np = 50 and np(1 − p) = 25 ≥ 10. Define Y ~ N(50, 25) (standard deviation = 5). With the continuity correction, P(X ≥ 65) ≈ P(Y ≥ 64.5) = P((Y − 50)/5 ≥ (64.5 − 50)/5) = P(Z ≥ 2.9) = 1 − Φ(2.9) ≈ 0.0019. Using the Binomial directly: P(X ≥ 65) ≈ 0.0018.
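We can compare the continuity-corrected normal approximation against the exact binomial tail. This sketch is an addition (not from the original slides) and uses only the standard library:

```python
import math

def Phi(z):
    # Standard normal CDF via the error function
    return (1 + math.erf(z / math.sqrt(2))) / 2

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))  # 50, 5

# Normal approximation with continuity correction: P(X >= 65) ~ P(Y >= 64.5)
approx = 1 - Phi((64.5 - mu) / sigma)

# Exact binomial tail: sum of C(n, k) p^k (1-p)^(n-k) for k = 65..100
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(65, n + 1))

print(round(approx, 4), round(exact, 4))  # 0.0019 0.0018
```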

Summary of the normal RV. Let X ~ N(µ, σ²) and Z ~ N(0, 1). Symmetry of the CDF of the unit normal: Φ(−z) = P(Z ≤ −z) = P(Z ≥ z) = 1 − Φ(z). Linearity: if Y = aX + b, then Y ~ N(aµ + b, a²σ²). Calculating probabilities on X: P(X ≤ x) = F_X(x) = Φ((x − µ)/σ). A Binomial RV X ~ Bin(n, p) can be approximated with a normal RV Y ~ N(np, np(1 − p)) if Var(X) = np(1 − p) ≥ 10. But you must use a continuity correction when approximating a discrete RV with a continuous RV.