Probability reminders

Similar documents
Quick Tour of Basic Probability Theory and Linear Algebra

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

2. Suppose (X, Y ) is a pair of random variables uniformly distributed over the triangle with vertices (0, 0), (2, 0), (2, 1).

CSE 312 Final Review: Section AA

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Basics on Probability. Jingrui He 09/11/2007

MATH 151, FINAL EXAM Winter Quarter, 21 March, 2014

Probability and Distributions

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Lecture 2: Repetition of probability theory and statistics

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

Chapter 2: Random Variables

Review of Probability. CS1538: Introduction to Simulations

Algorithms for Uncertainty Quantification

Northwestern University Department of Electrical Engineering and Computer Science

SUMMARY OF PROBABILITY CONCEPTS SO FAR (SUPPLEMENT FOR MA416)

1 Random Variable: Topics

CME 106: Review Probability theory

This does not cover everything on the final. Look at the posted practice problems for other topics.

Chapter 1 Statistical Reasoning Why statistics? Section 1.1 Basics of Probability Theory

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University

Lecture 1: August 28

Math Bootcamp 2012 Miscellaneous

Week 12-13: Discrete Probability

1 Exercises for lecture 1

Chapter 3: Random Variables 1

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Recitation 2: Probability

Lecture Notes 1 Probability and Random Variables. Conditional Probability and Independence. Functions of a Random Variable

Review of Probability Theory

CDA6530: Performance Models of Computers and Networks. Chapter 2: Review of Practical Random Variables

. Find E(V ) and var(v ).

Probability Distributions Columns (a) through (d)

Set Theory Digression

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

Chapter 3: Random Variables 1

STAT Chapter 5 Continuous Distributions

Random variables. DS GA 1002 Probability and Statistics for Data Science.

ECON Fundamentals of Probability

Twelfth Problem Assignment

Kousha Etessami. U. of Edinburgh, UK. Kousha Etessami (U. of Edinburgh, UK) Discrete Mathematics (Chapter 7) 1 / 13

Lecture 1. ABC of Probability

Midterm Exam 1 Solution

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Chap 2.1 : Random Variables

(Practice Version) Midterm Exam 2

MAS113 Introduction to Probability and Statistics. Proofs of theorems

Spring 2012 Math 541B Exam 1

2. Variance and Covariance: We will now derive some classic properties of variance and covariance. Assume real-valued random variables X and Y.

CS145: Probability & Computing

Given a experiment with outcomes in sample space: Ω Probability measure applied to subsets of Ω: P[A] 0 P[A B] = P[A] + P[B] P[AB] = P(AB)

Random Variables Example:

Expectation. DS GA 1002 Probability and Statistics for Data Science. Carlos Fernandez-Granda

Lecture 1: Review on Probability and Statistics

Tom Salisbury

Lecture 11. Probability Theory: an Overveiw

1 Presessional Probability

Expectation. DS GA 1002 Statistical and Mathematical Models. Carlos Fernandez-Granda

Section 9.1. Expected Values of Sums

Deep Learning for Computer Vision

Fundamental Tools - Probability Theory II

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

1 Random variables and distributions

Why study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables

Probability: Handout

EE4601 Communication Systems

Introduction to Machine Learning

Chapter 5. Chapter 5 sections

3. Review of Probability and Statistics

Lecture Notes 2 Random Variables. Discrete Random Variables: Probability mass function (pmf)

Probability and Statistics

Expectation, inequalities and laws of large numbers

Some Probability and Statistics

Random Variables and Their Distributions

Probability and Statistics

Measure-theoretic probability

Review of Probabilities and Basic Statistics

18.175: Lecture 3 Integration

Lecture 2: Review of Probability

Analysis of Engineering and Scientific Data. Semester

Lecture Notes 5 Convergence and Limit Theorems. Convergence with Probability 1. Convergence in Mean Square. Convergence in Probability, WLLN

Guidelines for Solving Probability Problems

Overview. CSE 21 Day 5. Image/Coimage. Monotonic Lists. Functions Probabilistic analysis

Statistics 100A Homework 5 Solutions

Chp 4. Expectation and Variance

Continuous Distributions

LECTURE 1. Introduction to Econometrics

1: PROBABILITY REVIEW

L2: Review of probability and statistics

Stochastic Models in Computer Science A Tutorial

Lecture 2: Review of Basic Probability Theory

Introduction to Probability and Statistics (Continued)

1 Probability theory. 2 Random variables and probability theory.

Probability Theory and Random Variables

EE514A Information Theory I Fall 2013

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

Chapter 5 continued. Chapter 5 sections

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Transcription:

CS246 Winter 204 Mining Massive Data Sets Probability reminders Sammy El Ghazzal selghazz@stanfordedu Disclaimer These notes may contain typos, mistakes or confusing points Please contact the author so that we can improve them for next year Definition: a few reminders Definition Sample space, Event space, Probability measure This definition contains the basic definitions of probability theory: Sample space usually denoted as Ω: the set of all possible outcomes Event space usually denoted as F: a family of subsets of Ω possibly all subsets of Ω Probability Measure Function P : a function that goes from F to R properties: It must have the following P Ω 2 A F, 0 P A 3 P A B P A + P B P A B 4 For a set of disjoint events A,, A p : P A i i p Proposition Union bound Let A and B be two events As we have seen, it holds that: p P A i P A B P A + P B P A B, and in particular the following formula is referred to as the Union Bound: and more generally if E,, E n are events: P A B P A + P B, P E i i n P E i Definition Random variable A random variable X is a function from the sample space to R

2 Probability reminders Definition Cumulative distribution function Let X be a random variable Its cumulative distribution function cdf F is defined as: F is monotonically increasing and verifies: F x P X x lim F x 0 and lim x F x x + Definition Probability density function Let X be a continuous random variable X, the probability density function p X of X is defined when it exists as the function such that: df x p X xdx, where F is the cumulative distribution function of X p X is non-negative and verifies: R p X xdx Definition Expectation Let X be a continuous resp discrete random variable The expectation of X is defined as: E X p X xxdx resp P X x x Proposition Linearity of expectation The expectation is linear, that is, if X and Y are random variables, then: R E X + Y E X + E Y and a R, E ax ae X Definition Variance Let X be a random variable The variance of X is defined as: varx E X E X 2 E X 2 E X 2 The standard deviation of X also often denoted as σ X is then defined as: x stdx varx Definition Covariance Let X and Y be two random variables The covariance of X and Y is defined as: covx, Y E X E XY E Y E XY E X E Y In particular, note that: varx covx, X The correlation coefficient between X and Y is then defined as: corrx, Y covx, Y σ X σ Y The correlation can therefore be thought of as the normalized covariance

Probability reminders 3 Proposition Variance of sum of random variables Let X and Y be random variables varx + Y varx + 2covX, Y + vary and a R, varax a 2 varx Definition Independence Let X and Y be random variables We say that X and Y are independent if and only if: U, V, P X U, Y V P X U P Y V Proposition Independence and covariance Let X and Y be two random variables If X and Y are independent, then: The opposite is not true in general covx, Y 0 Note that this result implies that if X and Y are independent random variables, then: Definition Bayes rule Let A and B be two events varx + Y varx + vary P A B P A B P B Proposition Law of total probability Let A be an event and B be a non-negative discrete random variable P A k0 P A B k P B k Definition Indicator variable Let A be an event We define the indicator variable denoted as I A or A as: { if A occurs I A 0 otherwise I A has the following property: E I A P A 2 Common distributions Let us start with the common distributions of discrete random variables: X Bp: X is a Bernoulli random variable with parameter p if and only if: P X p and P X 0 p Example Coin flip E X p and varx p p

4 Probability reminders X Bn, p: X is a Binomial random variable with parameters n and p if and only if: n P X k p k p n k k E X np and varx np p Example Number of heads obtained in n coin flips X Pλ: X is a Poisson random variable of parameter λ if and only if: Example Number of people arriving in a queue P X k e λ λ k E X λ and varx λ The most common continous distribution that we will use is the Gaussian distribution: X N µ, σ 2 if and only if the probability density function of X is: p X x x µ2 e 2σ 2 2πσ 2 E X µ and varx σ 2 3 Common inequalities Proposition Markov inequality Let X be a non-negative random variable Proposition Chebychev inequality Let X be a random variable P X a E X a P X E X a varx a 2 Proposition Hoeffding inequality Let X,, X n be iid random variables such that: i n, X i [0, ] We denote by µ the expectation of the X i s P n X i µ ɛ 2e 2nɛ2 Proposition Jensen inequality Let φ be a convex function and X be a random variable E φx φe X

Probability reminders 5 Proposition The following limit holds: x R, Note that the following inequality also holds: x R, Proposition Stirling formula An equivalent of n! when n goes to infinity is: lim x n e n n x x n n e x n! n 2πne n n n 4 Maximum Likelihood Estimation The method of maximum likelihood estimation MLE is a method that can be used to estimate the parameters of a model The goal is to find the value of the parameters that maximize the probability of the data sample ie the likelihood being observed given the model For instance, if you assume that some dataset can be modeled as Gaussian and you want to estimate the parameters of the Gaussian mean and variance, MLE is a good method to use and it will find the parameters that fit best the data Let us go through an example to explain the method: say we have n data points x,, x n drawn iid from a Gaussian distribution N µ, σ 2 and we want to estimate µ and σ 2 from these samples We compute the likelihood of the observations: Lµ, σ px,, x n ; µ, σ n x 2πσ 2 e i µ 2 2σ 2, and the goal is now to maximize L with respect to the parameters µ and σ As we have a product and exponentials, it is in fact easier to maximize the log-likelood defined as: lµ, σ log Lµ, σ n 2 log2πσ2 x i µ 2 2σ 2 We now look for µ and σ by solving: The first equation gives us: l µ µ, σ 0 l σ µ, σ 0 l µ µ, σ x i µ σ 2,

6 Probability reminders and therefore to make this derivative be 0, we need: The second equation gives: µ n x i l σ µ, σ n n σ + x i µ 2 σ 3, and therefore to make this derivative be 0, we need: σ x i µ n 2, which concludes the estimation of the parameters of the model 5 Exercises Let X be a random variable that takes non-negative integer values Prove that: E X k P X k We compute: P X k k k uk u k u E X P X u u P X u up X u Change order of summation 2 Birthday Paradox Let us consider a room with n 365 people Compute the probability that two people in the room share the same birthday for simplicity, we will consider that a year is 365 days We start by computing the probability that no two people have the same birthday One way to solve the problem is to look at how many possibilities each person taken in a fixed order has: the first person will have 365 possible days, the second 364, This gives: P No two people have the same birthday 365 365 364 365 n 365 365 365! 365 n! 365 n n! 365 n 365 n

Probability reminders 7 Therefore, the probability that two people share the same birthday is: 365 n n! 365 n 3 Show that if X Pλ and Y X k Bk, p then Y Pλp Hint: Use the definitions of the Poisson and Binomial distributions, as well as the law of total probability We compute: which proves the result P Y k u0 uk uk pk e λ pk e λ P Y k X u P X u P Y k X u P X u u p k p u k e λ λ u k u! uk λpk e λ λpk e λ uk λ u p u k u k! λ u k λ k p u k uk q0 λpk e λ e λ p λpk e λp, u k! λ u k p u k u k! λ q p q q! 4 Why do you need to assume that X be a non-negative random variable in Markov inequality find an example with a random variable that is non-negative and the proposition does not hold? Let X be a random variable that takes the values - and with probability 05 Then E X 0 and if we take a 0 then Markov inequality does not hold 05 on the left-hand side and 0 on the right hand-side 5 Let us consider a setting with n birds and k boxes Assume each bird picks one of the boxes uniformly at random and independently from the other birds a Let X be the random variable representing the number of birds in box number 0 What are E X and varx?

8 Probability reminders b Compute the probability p 0 that birds i and j > i i and j are some fixed indices end up in box 0 From this result, compute the probability that birds i and j end up in the same box not necessarily box 0 c Assuming that k n 2, find a simple upper-bound on p 0 a Let us denote by B i the event that bird i goes to box 0 The B i are iid random variables distributed as B k We therefore compute by linearity of expectation: n E X E B i E B i n k For the variance, you need to be a bit more careful as it is not a linear operator But here, the choices are independent and therefore: n varx var B i Independence varb i n k k b Let us call M ij the event that birds i and j > i end up in box 0 and no other bird goes to box 0 We have, by independence: p 0 P M ij k 2 k n 2, Let us call A p the event that birds i and j end up in box p The A p p k are clearly disjoint and therefore: P i and j end up in the same box P A p p k k P A p p }{{} p 0 kp 0 n 2 k k c Using the common inequality: we compute: p p e, p 0 en 2 2 6 Let us consider n points x,, x n drawn iid from a Bernoulli distribution of parameter φ Compute an estimate the parameter φ using MLE Hint: you can write the lihelihood of a single point as px; φ φ x φ x The likelihood for the n points is: Lφ n φ xi φ xi φ n xi φ n n xi

Probability reminders 9 To simplify the notation, let us use the shortcut: S x i, and to simplify the calculations, let us use the log-likelihood: We find the MLE φ by solving: lφ S log φ + n S log φ dl dφ φ 0 S φ n S φ 0 φ S n n x i 7 Let us consider two boxes Box contains 30 white balls and 0 black balls Box 2 contains 20 white and 20 black balls Someone draws one ball at random among the 80 balls The drawn ball is white What is the probability that the ball was drawn from the first box? Let us denote by B resp B 2 the event that the ball was drawn from box resp 2 Let us denote by W the event that the drawn ball was white We are looking for: P B W P B, W P W We compute: and: We conclude: P B, W P W B P B 3 4 2 3 8, P W 50 80 5 8 P B W 3 8 5 8 06