Review of Probabilities and Basic Statistics


Recitation 1: Statistics Intro. Review of Probabilities and Basic Statistics. Alex Smola, Barnabas Poczos. TA: Ina Fiterau, 4th year PhD student, MLD. 10-701 Recitations, 1/25/2013.

Overview: Introduction to Probability Theory; Random Variables and Independent RVs; Properties of Common Distributions; Estimators, Unbiased Estimators, and Risk; Conditional Probabilities and Independence; Bayes Rule and Probabilistic Inference.

Review: the concept of probability. Sample space Ω: the set of all possible outcomes. Event E ⊆ Ω: a subset of the sample space. Probability measure: maps events to the unit interval, answering how likely it is that event E will occur. Kolmogorov axioms: P(E) ≥ 0 for every event E; P(Ω) = 1; and for pairwise disjoint events E_1, E_2, …, P(E_1 ∪ E_2 ∪ …) = Σ_i P(E_i).
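These definitions are easy to make concrete on a small finite sample space. A minimal sketch in plain Python follows; the two fair dice, the uniform measure, and the event names are illustrative choices, not from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(E) = |E| / |Omega| under the uniform measure."""
    return Fraction(len(event), len(omega))

# Illustrative events: "the sum is 7" and two disjoint events "sum is 2" / "sum is 12".
sum_is_7 = {o for o in omega if o[0] + o[1] == 7}
sum_is_2 = {o for o in omega if sum(o) == 2}
sum_is_12 = {o for o in omega if sum(o) == 12}

assert prob(set(omega)) == 1                     # P(Omega) = 1
assert prob(sum_is_7) >= 0                       # non-negativity
# Additivity for disjoint events:
assert prob(sum_is_2 | sum_is_12) == prob(sum_is_2) + prob(sum_is_12)
print(prob(sum_is_7))  # 1/6
```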

Reasoning with events. Venn diagrams: P(A) = Vol(A) / Vol(Ω). Event union and intersection: P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Properties of event union/intersection: commutativity, A ∪ B = B ∪ A and A ∩ B = B ∩ A; associativity, (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C); distributivity, A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).

Reasoning with events. De Morgan's laws: (A ∪ B)^c = A^c ∩ B^c and (A ∩ B)^c = A^c ∪ B^c. Proof of law #1 by double containment: show (A ∪ B)^c ⊆ A^c ∩ B^c and A^c ∩ B^c ⊆ (A ∪ B)^c.
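De Morgan's laws are easy to sanity-check on a finite universe with Python sets. This is only a finite verification, not a proof; the universe and the two events below are arbitrary illustrative choices.

```python
universe = set(range(10))          # a small finite universe (illustrative)
A = {0, 1, 2, 3}
B = {2, 3, 4, 5}
complement = lambda S: universe - S

# (A ∪ B)^c == A^c ∩ B^c   and   (A ∩ B)^c == A^c ∪ B^c
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
```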

Reasoning with events. Disjoint (mutually exclusive) events: P(A ∩ B) = 0, and P(A ∪ B) = P(A) + P(B). Examples: A and A^c; the cells of a partition. This is NOT the same as independence: successive coin flips, for instance, are independent but not mutually exclusive.

Partitions. A partition S_1, …, S_n consists of events that cover the sample space, S_1 ∪ … ∪ S_n = Ω, and are pairwise disjoint, S_i ∩ S_j = ∅ for i ≠ j. Event reconstruction (law of total probability): P(A) = Σ_{i=1}^n P(A ∩ S_i). Boole's inequality: P(∪_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i). Bayes rule: P(S_i | A) = P(A | S_i) P(S_i) / Σ_{j=1}^n P(A | S_j) P(S_j).
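The law of total probability and Bayes rule are one-liners once the partition is written down. A minimal sketch follows; the two-cell partition and all the numbers in it are made up for illustration.

```python
# Law of total probability and Bayes rule on a toy two-cell partition.
P_S = {"S1": 0.3, "S2": 0.7}            # P(S_i), a partition of Omega
P_A_given_S = {"S1": 0.9, "S2": 0.2}    # P(A | S_i)

# P(A) = sum_i P(A | S_i) P(S_i)
P_A = sum(P_A_given_S[s] * P_S[s] for s in P_S)

# Bayes rule: P(S_i | A) = P(A | S_i) P(S_i) / P(A)
posterior = {s: P_A_given_S[s] * P_S[s] / P_A for s in P_S}
print(P_A)        # 0.9*0.3 + 0.2*0.7 = 0.41
print(posterior)  # {'S1': ~0.659, 'S2': ~0.341}
```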


Random Variables. A random variable associates a value with the outcome of a randomized event. Sample space X: the possible values of the r.v. X. Example (from event to random variable): uniformly draw 2 numbers between 1 and 4, with replacement, and let the r.v. X be their sum.

Outcome E:  11 12 13 14 21 22 23 24 31 32 33 34 41 42 43 44
X(E):        2  3  4  5  3  4  5  6  4  5  6  7  5  6  7  8

Induced probability function on X:

x:        2     3     4     5     6     7     8
P(X=x):  1/16  2/16  3/16  4/16  3/16  2/16  1/16
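The induced distribution on X can be computed by enumerating the outcomes, as in the slide's table. A short sketch of exactly that computation:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Induced pmf of X = sum of two uniform draws (with replacement) from {1, 2, 3, 4}.
outcomes = list(product(range(1, 5), repeat=2))
counts = Counter(a + b for a, b in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}
# Fractions are reduced: {2: 1/16, 3: 1/8, 4: 3/16, 5: 1/4, 6: 3/16, 7: 1/8, 8: 1/16}
print(pmf)
```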

Cumulative Distribution Functions. F_X(x) = P(X ≤ x) for all x. The CDF completely determines the probability distribution of an RV. A function F(x) is a CDF iff: lim_{x→−∞} F(x) = 0 and lim_{x→+∞} F(x) = 1; F(x) is a non-decreasing function of x; and F(x) is right-continuous, i.e., lim_{x↓x_0} F(x) = F(x_0) for every x_0.

Identically distributed RVs. Two random variables X_1 and X_2 are identically distributed iff for all sets of values A, P(X_1 ∈ A) = P(X_2 ∈ A). So does that mean the variables are equal? NO. Example: toss a coin 3 times and let X_H and X_T be the number of heads and tails respectively. They have the same distribution, but they are not equal as random variables: X_H = 3 − X_T, so X_H = 1 forces X_T = 2.

Discrete vs. Continuous RVs. Step CDF ⇒ X is discrete, with probability mass function f_X(x) = P(X = x) for each x. Continuous CDF ⇒ X is continuous, with probability density function f_X satisfying F_X(x) = ∫_{−∞}^{x} f_X(t) dt.
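For a continuous RV the CDF is the integral of the density. The sketch below, a numerical check rather than anything from the slides, integrates the standard normal density with a simple Riemann sum and compares against the closed form Φ(x) = (1 + erf(x/√2)) / 2.

```python
import math

def normal_pdf(t, mu=0.0, sigma=1.0):
    z = (t - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf_numeric(x, lo=-10.0, n=100_000):
    # F_X(x) = integral of f_X(t) dt from -infinity to x, approximated by a midpoint sum.
    h = (x - lo) / n
    return sum(normal_pdf(lo + (i + 0.5) * h) for i in range(n)) * h

x = 1.0
exact = 0.5 * (1 + math.erf(x / math.sqrt(2)))
print(normal_cdf_numeric(x), exact)   # both ~0.8413
```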

Interval Probabilities. Obtained by integrating the area under the density: P(x_1 ≤ X ≤ x_2) = ∫_{x_1}^{x_2} f_X(x) dx. This explains why P(X = x) = 0 for continuous distributions: P(X = x) = lim_{ε→0, ε>0} [F_X(x) − F_X(x − ε)] = 0.

Moments and Expectations. The expected value of a function g of a r.v. X ~ P is defined as E[g(X)] = ∫ g(x) p(x) dx. The n-th moment of a probability distribution is μ_n = ∫ x^n p(x) dx, and the mean is μ = μ_1. The n-th central moment is μ̃_n = ∫ (x − μ)^n p(x) dx. The variance is the second central moment, σ² = μ̃_2.
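As a numerical sanity check on these definitions, the sketch below approximates the mean and variance of an Exponential(λ = 2) distribution by integrating x·p(x) and (x − μ)²·p(x); the distribution choice is an assumption for illustration, and the exact values are 1/λ = 0.5 and 1/λ² = 0.25.

```python
import math

lam = 2.0
pdf = lambda x: lam * math.exp(-lam * x)   # Exponential(lambda) density on [0, inf)

def integrate(f, lo=0.0, hi=50.0, n=200_000):
    # Midpoint Riemann sum; the upper limit 50 makes the truncated tail negligible here.
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

mean = integrate(lambda x: x * pdf(x))                  # ~0.5  (= 1/lambda)
var = integrate(lambda x: (x - mean) ** 2 * pdf(x))     # ~0.25 (= 1/lambda^2)
print(mean, var)
```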

Multivariate Distributions. Joint: P((X, Y) ∈ A) = Σ_{(x,y)∈A} f(x, y). Marginal: f_Y(y) = Σ_x f(x, y). For independent RVs: f(x_1, …, x_n) = f_{X1}(x_1) · … · f_{Xn}(x_n). Example: uniformly and independently draw X and Y from the set {1, 2, 3}, and let W = X + Y and V = |X − Y|. Joint and marginal distributions of (W, V):

W \ V    0     1     2   | P_W
2       1/9    0     0   | 1/9
3        0    2/9    0   | 2/9
4       1/9    0    2/9  | 3/9
5        0    2/9    0   | 2/9
6       1/9    0     0   | 1/9
P_V     3/9   4/9   2/9  | 1
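The joint table above can be regenerated by enumerating the nine equally likely outcomes, which also gives the marginal of V by summing over W. A short sketch of that enumeration:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

# Joint distribution of W = X + Y and V = |X - Y| for X, Y drawn uniformly
# and independently from {1, 2, 3}.
joint = Counter((x + y, abs(x - y)) for x, y in product([1, 2, 3], repeat=2))
n = 9
for (w, v), c in sorted(joint.items()):
    print(f"P(W={w}, V={v}) = {Fraction(c, n)}")

# Marginal of V: f_V(v) = sum over w of f(w, v).
marginal_v = Counter()
for (w, v), c in joint.items():
    marginal_v[v] += c
print({v: Fraction(c, n) for v, c in sorted(marginal_v.items())})  # {0: 1/3, 1: 4/9, 2: 2/9}
```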


Bernoulli. X = 1 with probability p and X = 0 with probability 1 − p, where 0 ≤ p ≤ 1. Mean and variance: E[X] = 1·p + 0·(1 − p) = p; Var[X] = (1 − p)²·p + (0 − p)²·(1 − p) = p(1 − p). MLE: the sample mean. Connections to other distributions: if X_1, …, X_n ~ Bern(p), then Y = Σ_{i=1}^n X_i is Binomial(n, p); the Geometric distribution is the number of Bernoulli trials needed to get one success.
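A quick simulation makes these facts tangible; the sketch below (parameter values are illustrative) checks the mean and variance empirically and draws one Binomial sample as a sum of Bernoulli trials.

```python
import random
random.seed(0)

p, n_samples = 0.3, 100_000
xs = [1 if random.random() < p else 0 for _ in range(n_samples)]

mean = sum(xs) / n_samples                            # ~p (also the MLE of p)
var = sum((x - mean) ** 2 for x in xs) / n_samples    # ~p(1-p) = 0.21
print(mean, var)

# Sum of n i.i.d. Bernoulli(p) draws is one Binomial(n, p) draw:
n = 10
y = sum(1 if random.random() < p else 0 for _ in range(n))
print(y)
```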

Binomial. P(X = x; n, p) = (n choose x) p^x (1 − p)^(n−x). Mean and variance: E[X] = Σ_{x=0}^n x (n choose x) p^x (1 − p)^(n−x) = np, and Var[X] = E[X²] − (E[X])² = np(1 − p). NOTE: a sum of independent Binomials with the same p is Binomial, and conditionals on Binomials are Binomial.

Properties of the Normal Distribution. Operations on normally distributed variables: if X_1, X_2 ~ Norm(0, 1) are independent, then X_1 ± X_2 ~ N(0, 2) and X_1/X_2 ~ Cauchy(0, 1). If X_1 ~ Norm(μ_1, σ_1²), X_2 ~ Norm(μ_2, σ_2²) and X_1 ⊥ X_2, then Z = X_1 + X_2 ~ Norm(μ_1 + μ_2, σ_1² + σ_2²). If (X, Y) is jointly normal with means (μ_X, μ_Y), variances σ_X², σ_Y², and covariance ρσ_Xσ_Y, then X + Y is still normally distributed; its mean is the sum of the means and its variance is σ_{X+Y}² = σ_X² + σ_Y² + 2ρσ_Xσ_Y, where ρ is the correlation.
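The correlated-sum formula is easy to check by simulation. In the sketch below (ρ = 0.5 is an arbitrary illustrative choice), Y is built from X and an independent standard normal so that Corr(X, Y) = ρ, and the empirical variance of X + Y should be close to 2 + 2ρ = 3.

```python
import random, statistics
random.seed(1)

rho, n = 0.5, 200_000
sums = []
for _ in range(n):
    x = random.gauss(0, 1)
    z = random.gauss(0, 1)
    y = rho * x + (1 - rho ** 2) ** 0.5 * z   # Y ~ N(0, 1) with Corr(X, Y) = rho
    sums.append(x + y)

print(statistics.variance(sums))   # ~ sigma_X^2 + sigma_Y^2 + 2*rho*sigma_X*sigma_Y = 3.0
```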


Estimating Distribution Parameters. Let X_1, …, X_n be a sample from a distribution parameterized by θ. How can we estimate: The mean of the distribution? Possible estimator: (1/n) Σ_{i=1}^n X_i. The median of the distribution? Possible estimator: median(X_1, …, X_n). The variance of the distribution? Possible estimator: (1/n) Σ_{i=1}^n (X_i − X̄)².
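The slide calls these "possible" estimators, and the 1/n variance estimator connects to the "unbiased estimators" item in the overview: it is biased, with expectation ((n−1)/n)σ². A Monte Carlo sketch of that fact, with an assumed N(0, 1) population for illustration:

```python
import random
random.seed(2)

# Monte Carlo check that the 1/n variance estimator is biased (true variance = 1).
n, trials = 5, 50_000
biased, unbiased = [], []
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased.append(ss / n)          # E[.] = (n-1)/n * sigma^2 = 0.8
    unbiased.append(ss / (n - 1))  # E[.] = sigma^2 = 1.0
print(sum(biased) / trials, sum(unbiased) / trials)
```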

Bias-Variance Tradeoff. When estimating a quantity θ, we evaluate the performance of an estimator θ̂ by computing its risk, the expected value of a loss function: R(θ, θ̂) = E[L(θ, θ̂)], where L could be the mean squared error loss, the 0/1 loss, or the hinge loss (used for SVMs). Bias-variance decomposition: for Y = f(x) + ε with noise variance σ_ε², Err(x) = E[(Y − f̂(x))²] = (E[f̂(x)] − f(x))² + E[(f̂(x) − E[f̂(x)])²] + σ_ε², i.e., Bias² + Variance + irreducible noise.
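For a point estimate the same identity reads MSE = Bias² + Variance, which can be verified by simulation. The sketch below uses a deliberately biased shrinkage estimator θ̂ = 0.8·X̄; the true θ, sample size, and shrinkage factor are all illustrative assumptions.

```python
import random
random.seed(3)

# Monte Carlo check of MSE = bias^2 + variance for a shrinkage estimator 0.8 * sample mean.
theta, n, trials, c = 2.0, 10, 100_000, 0.8
estimates = []
for _ in range(trials):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    estimates.append(c * sum(xs) / n)

mean_est = sum(estimates) / trials
bias_sq = (mean_est - theta) ** 2                                  # ~ (0.8*2 - 2)^2 = 0.16
variance = sum((e - mean_est) ** 2 for e in estimates) / trials    # ~ c^2 * 1/n = 0.064
mse = sum((e - theta) ** 2 for e in estimates) / trials
print(mse, bias_sq + variance)   # the two should closely agree (~0.224)
```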


Review: Conditionals. Conditional variables: P(X | Y) = P(X, Y) / P(Y); note that X | Y is a different r.v. Conditional independence: X ⊥ Y | Z means X and Y are conditionally independent given Z, i.e., P(X, Y | Z) = P(X | Z) P(Y | Z). Properties of conditional independence (can you prove these?): Symmetry: X ⊥ Y | Z ⇒ Y ⊥ X | Z. Decomposition: X ⊥ Y, W | Z ⇒ X ⊥ Y | Z. Weak union: X ⊥ Y, W | Z ⇒ X ⊥ Y | Z, W. Contraction: (X ⊥ W | Z, Y) and (X ⊥ Y | Z) ⇒ X ⊥ Y, W | Z.
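Conditional independence can be checked directly on a small discrete joint table. In the sketch below the joint is built as P(z)·P(x|z)·P(y|z), so X ⊥ Y | Z holds by construction; all the probability values are made up for illustration.

```python
from itertools import product

# Build a joint P(X, Y, Z) that is conditionally independent by construction.
P_Z = {0: 0.4, 1: 0.6}
P_X_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # P_X_given_Z[z][x]
P_Y_given_Z = {0: {0: 0.2, 1: 0.8}, 1: {0: 0.5, 1: 0.5}}   # P_Y_given_Z[z][y]

joint = {(x, y, z): P_Z[z] * P_X_given_Z[z][x] * P_Y_given_Z[z][y]
         for x, y, z in product([0, 1], repeat=3)}

# Check X ⊥ Y | Z: P(x, y | z) == P(x | z) * P(y | z) for all x, y, z.
for x, y, z in product([0, 1], repeat=3):
    p_xy_given_z = joint[(x, y, z)] / P_Z[z]
    assert abs(p_xy_given_z - P_X_given_Z[z][x] * P_Y_given_Z[z][y]) < 1e-12
print("X is conditionally independent of Y given Z on this table.")
```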


Priors and Posteriors. We have so far introduced the likelihood function P(Data | θ), the likelihood of the data given the parameter θ of the distribution: θ_MLE = argmax_θ P(Data | θ). What if not all values of θ are equally likely a priori? Then θ itself is distributed according to the prior P(θ). Apply Bayes rule: P(θ | Data) = P(Data | θ) P(θ) / P(Data), and θ_MAP = argmax_θ P(θ | Data) = argmax_θ P(Data | θ) P(θ).
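A concrete comparison of the two estimates for a Bernoulli parameter, using the standard closed forms; the Beta(2, 2) prior and the "7 heads in 10 flips" data are illustrative assumptions, not values from the slides.

```python
# MLE vs. MAP for a Bernoulli parameter theta, with an assumed Beta(2, 2) prior.
k, n = 7, 10             # 7 heads in 10 flips (illustrative data)
a, b = 2.0, 2.0          # Beta prior hyperparameters

theta_mle = k / n                                  # argmax of P(Data | theta) = 0.7
theta_map = (k + a - 1) / (n + a + b - 2)          # mode of the Beta posterior = 8/12 ≈ 0.667
print(theta_mle, theta_map)  # the prior pulls the MAP estimate toward 0.5
```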

Conjugate Priors. If the posterior distribution P(θ | Data) is in the same family as the prior distribution P(θ), then the prior and the posterior are called conjugate distributions, and P(θ) is called a conjugate prior. Some examples:

Likelihood                                        Conjugate prior
Bernoulli / Binomial                              Beta
Poisson                                           Gamma
(Multivariate) Normal with known (co)variance     Normal
Exponential                                       Gamma
Multinomial                                       Dirichlet

How to compute the parameters of the posterior? I'll send a derivation.
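As a preview of that derivation's punchline for the first row of the table: with a Beta(a, b) prior on a Bernoulli parameter, the posterior after k successes in n trials is Beta(a + k, b + n − k). A minimal sketch, with an assumed uniform prior and illustrative data:

```python
# Conjugate update for a Bernoulli likelihood with a Beta prior.
a, b = 1.0, 1.0                          # Beta(1, 1) = uniform prior on theta
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]    # 7 successes, 3 failures (illustrative)

k, n = sum(data), len(data)
a_post, b_post = a + k, b + n - k        # posterior is Beta(a + k, b + n - k)
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)    # Beta(8, 4), mean = 8/12 ≈ 0.667
```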

Probabilistic Inference. Problem: you're planning a weekend biking trip with your best friend, Min. Alas, your path to outdoor leisure is strewn with many hurdles. If it happens to rain, your chances of biking are cut in half, not counting other factors. Independently of this, Min might be able to bring a tent, the lack of which will only matter if you notice the symptoms of a flu before the trip. Finally, the trip won't happen if your advisor is unhappy with your weekly progress report.

Probabilistic Inference (continued). Variables: O, the outdoor trip happens; A, the advisor is happy; R, it rains that day; T, you have a tent; F, you show flu symptoms.

Probabilistic Inference (continued). The same setup, now drawn as a directed graphical model over the variables A, R, T, F, and O (figure on slide).

Probabilistic Inference. How many parameters determine this model? The factor tying A to O: 1 parameter. The factor tying R to O: 1 parameter. The factor tying F and T to O: 3 parameters. In this problem the values are given; otherwise, we would have had to estimate them.

Probabilistic Inference. The weather forecast is optimistic: the chances of rain are 20%. You've barely slacked off this week, so your advisor is probably happy; let's give it an 80%. Luckily, you don't seem to have the flu. What are the chances that the trip will happen? Think about how you would do this. Hint #1: do the variables F and T influence the result in this case? Hint #2: the combinations of values for A and R form a partition, so use one of the partition formulas we learned.
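The slide leaves the calculation as an exercise, and the actual conditional probabilities used in the recitation are not reproduced in this transcription. The sketch below only illustrates the partition approach from Hint #2: since you show no flu symptoms, F and T drop out, and P(O) is obtained by summing over the four (A, R) combinations. The base biking probability and the exact form of p_trip are assumptions for illustration, not the recitation's numbers.

```python
# Partition over (A, R): P(O) = sum over a, r of P(O | a, r) * P(a) * P(r),
# using the independence of A and R stated in the problem.
P_A = {True: 0.8, False: 0.2}      # advisor happy with probability 0.8
P_R = {True: 0.2, False: 0.8}      # rain with probability 0.2

base = 0.9   # ASSUMED probability of biking when the advisor is happy and it doesn't rain
def p_trip(advisor_happy, rains):
    if not advisor_happy:
        return 0.0                       # no trip if the advisor is unhappy
    return base / 2 if rains else base   # rain halves the chances

P_O = sum(p_trip(a, r) * P_A[a] * P_R[r] for a in P_A for r in P_R)
print(P_O)   # 0.8 * (0.2*0.45 + 0.8*0.9) = 0.648 with these assumed numbers
```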

