Basics of Probability

Basics of Probability
Lecture 1
Doug Downey, Northwestern EECS 474

Events
Event space Ω. E.g., for a die, Ω = {1, 2, 3, 4, 5, 6}
Set of measurable events S ⊆ 2^Ω. E.g., α = event that we roll an even number = {2, 4, 6}, α ∈ S
S must:
- Contain the empty event ∅ and the trivial event Ω
- Be closed under union & complement: α, β ∈ S ⇒ α ∪ β ∈ S and Ω − α ∈ S

Probability Distributions
A probability distribution P over (Ω, S) is a mapping from S to real values such that:
1. P(α) ≥ 0 for all α ∈ S
2. P(Ω) = 1
3. If α, β ∈ S and α ∩ β = ∅, then P(α ∪ β) = P(α) + P(β)
Sidenote: the 1st and 3rd axioms ensure P is a measure
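These axioms are easy to check numerically. Below is a minimal Python sketch (the event-probability function is illustrative, not from the slides) that verifies all three axioms for the fair-die example:

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}

    def P(event):
        # Fair die: probability of an event is |event| / |omega|
        return Fraction(len(event), len(omega))

    assert P({2, 4, 6}) >= 0          # Axiom 1: non-negativity
    assert P(omega) == 1              # Axiom 2: P(Omega) = 1
    a, b = {1, 2}, {5, 6}             # disjoint events
    assert P(a | b) == P(a) + P(b)    # Axiom 3: additivity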

Probability Distributions
Can visualize probability as a fraction of area

Probability: Interpretations & Motivation
Interpretations: Frequentist vs. Bayesian
Why use probability for subjective beliefs? Beliefs that violate the axioms can lead to bad decisions regardless of the outcome [de Finetti, 1931]
Example: P(A) = 0.6, P(not A) = 0.8?
Example: P(A) > P(B) and P(B) > P(A)?

Random Variables
A random variable is a function from Ω to a value
- A partition of the event space
- A short-hand for referring to attributes of events
Examples:
- Ω = {1, 2, 3, 4, 5, 6}; DieRollEven: Ω → {true, false} = Val(DieRollEven)
- Ω = {all possible hmwk/exam grade combinations}; FinalGrade: Ω → {a, b, c}
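As a concrete sketch (names are illustrative), a random variable is literally a function on outcomes, and its induced distribution comes from summing the probabilities of the outcomes it groups together:

    from fractions import Fraction

    omega = {1, 2, 3, 4, 5, 6}

    def die_roll_even(outcome):
        # A random variable: a function from omega to a value
        return outcome % 2 == 0

    # Induced distribution: P(DieRollEven = v) sums over outcomes mapped to v
    dist = {}
    for w in omega:
        v = die_roll_even(w)
        dist[v] = dist.get(v, Fraction(0)) + Fraction(1, len(omega))

    print(dist)  # {False: Fraction(1, 2), True: Fraction(1, 2)}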

Joint Distributions

Grade  Interest  Course load  P(G, I, C)
a      high      full-time    0.10
a      high      part-time    0.08
a      low       full-time    0.03
a      low       part-time    0.04
b      high      full-time    0.07
b      high      part-time    0.02
b      low       full-time    0.12
b      low       part-time    0.16
c      high      full-time    0.01
c      high      part-time    0.02
c      low       full-time    0.20
c      low       part-time    0.15

Conditioning!
Start from the full joint distribution P(G, I, C) in the table above.

Conditioning!
Suppose we observe Course load = full-time. Keep only the consistent rows, then renormalize by their total mass 0.53:

Grade  Interest  Course load  P(G, I, C)
a      high      full-time    0.10 / 0.53
a      low       full-time    0.03 / 0.53
b      high      full-time    0.07 / 0.53
b      low       full-time    0.12 / 0.53
c      high      full-time    0.01 / 0.53
c      low       full-time    0.20 / 0.53

Conditioning!

Grade  Interest  Course load  P(G, I | C = full-time)
a      high      full-time    0.19
a      low       full-time    0.06
b      high      full-time    0.13
b      low       full-time    0.23
c      high      full-time    0.02
c      low       full-time    0.38
                              (sums to 1.0)
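The same conditioning computation as a small Python sketch (the joint table is the one from the slides; the function name is illustrative):

    joint = {  # P(Grade, Interest, CourseLoad)
        ('a', 'high', 'full-time'): 0.10, ('a', 'high', 'part-time'): 0.08,
        ('a', 'low',  'full-time'): 0.03, ('a', 'low',  'part-time'): 0.04,
        ('b', 'high', 'full-time'): 0.07, ('b', 'high', 'part-time'): 0.02,
        ('b', 'low',  'full-time'): 0.12, ('b', 'low',  'part-time'): 0.16,
        ('c', 'high', 'full-time'): 0.01, ('c', 'high', 'part-time'): 0.02,
        ('c', 'low',  'full-time'): 0.20, ('c', 'low',  'part-time'): 0.15,
    }

    def condition_on_full_time(joint):
        # Keep rows consistent with C = full-time, then renormalize
        rows = {k: v for k, v in joint.items() if k[2] == 'full-time'}
        z = sum(rows.values())  # 0.53
        return {k: v / z for k, v in rows.items()}

    for k, v in condition_on_full_time(joint).items():
        print(k, round(v, 2))  # matches the 0.19, 0.06, ... table above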

Conditional Probability
P(Grade = a | Interest = high) = 0.6
= the probability of getting an A given only Interest = high, and nothing else. If we know Motivation = high or OtherInterests = many, the probability of an A changes even given high Interest.
Formal definition: P(α | β) = P(α, β) / P(β), when P(β) > 0

Conditional Probability
Also: P(A | B, C) = P(A, B, C) / P(B, C)
More generally: P(A | B) = P(A, B) / P(B) (boldface indicates vectors of variables)
P(Grade = a | Grade = a, Interest = high)?

Marginalization
Start again from the full joint distribution P(G, I, C) shown earlier.

Marginalization
To marginalize out Course load, ignore its value (shown as *):

Grade  Interest  Course load  P(G, I, C)
a      high      *            0.10
a      high      *            0.08
a      low       *            0.03
a      low       *            0.04
b      high      *            0.07
b      high      *            0.02
b      low       *            0.12
b      low       *            0.16
c      high      *            0.01
c      high      *            0.02
c      low       *            0.20
c      low       *            0.15

Marginalization

Grade  Interest  Course load  P(G, I)
a      high      *            0.18
a      low       *            0.07
b      high      *            0.09
b      low       *            0.28
c      high      *            0.03
c      low       *            0.35

Marginalization

Grade  Interest  P(G, I)
a      high      0.18
a      low       0.07
b      high      0.09
b      low       0.28
c      high      0.03
c      low       0.35
                 (sums to 1.0)

Marginalization
P(X) = Σ_{y ∈ Val(Y)} P(X, Y = y)
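A marginalization sketch in Python, reusing the `joint` dict defined in the conditioning sketch above (helper name illustrative):

    def marginalize_out_course_load(joint):
        # P(G, I) = sum over c of P(G, I, C = c)
        marg = {}
        for (g, i, c), p in joint.items():
            marg[(g, i)] = marg.get((g, i), 0.0) + p
        return marg

    # marginalize_out_course_load(joint) gives, up to float rounding:
    #   {('a','high'): 0.18, ('a','low'): 0.07, ('b','high'): 0.09,
    #    ('b','low'): 0.28, ('c','high'): 0.03, ('c','low'): 0.35}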

Continuous Random Variables
For a continuous r.v. X, specify a density p(x) such that:
P(r ≤ X ≤ s) = ∫_r^s p(x) dx
E.g., the uniform density: p(x) = 1/(b − a) for a ≤ x ≤ b, 0 otherwise

Uniform Continuous Density
p(x) = 1/(b − a) if a ≤ x ≤ b, 0 otherwise

Gaussian Density
p(x) = (1 / √(2πσ²)) exp(−(x − μ)² / (2σ²))
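As a quick numerical sanity check (a sketch; the grid-based integration is just illustrative), a density integrates to 1 and P(r ≤ X ≤ s) is the area under p between r and s:

    import math

    def gaussian_pdf(x, mu=0.0, sigma=1.0):
        # Standard Gaussian density formula from the slide above
        return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

    def integrate(p, lo, hi, n=100_000):
        # Midpoint Riemann-sum approximation of the integral of p over [lo, hi]
        h = (hi - lo) / n
        return sum(p(lo + (i + 0.5) * h) for i in range(n)) * h

    print(integrate(gaussian_pdf, -10, 10))  # ~1.0 (total probability mass)
    print(integrate(gaussian_pdf, -1, 1))    # ~0.683 = P(-1 <= X <= 1)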

Joint Distribution

         Interest
Grade   low    high
a       0.07   0.18
b       0.28   0.09
c       0.35   0.03

Joint distribution specified with 2*3 − 1 = 5 values

Conditional Probability

         Interest
Grade   low    high
a       0.07   0.18
b       0.28   0.09
c       0.35   0.03

P(Grade = a | Interest = high)?
P(Grade = a, Interest = high) = 0.18
P(Interest = high) = 0.18 + 0.09 + 0.03 = 0.30
=> P(Grade = a | Interest = high) = 0.18 / 0.30 = 0.6

Conditional Probability

         Interest
Grade   low    high
a       0.07   0.18
b       0.28   0.09
c       0.35   0.03

P(Interest | Grade = a)?

Interest   low    high
           0.28   0.72

Conditional Probability
P(Interest | Grade)? Joint values with the conditional P(I | G) alongside:

          Interest: low       Interest: high
Grade   P(G, I)  P(I | G)   P(G, I)  P(I | G)
a       0.07     0.28       0.18     0.72
b       0.28     0.76       0.09     0.24
c       0.35     0.92       0.03     0.08

Actually three separate distributions, one for each Grade value (has three independent parameters total)
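A sketch of the row-wise conditioning (table values from the slides; variable names illustrative):

    joint_gi = {('a', 'low'): 0.07, ('a', 'high'): 0.18,
                ('b', 'low'): 0.28, ('b', 'high'): 0.09,
                ('c', 'low'): 0.35, ('c', 'high'): 0.03}

    # P(Interest | Grade): one distribution per grade value
    for g in ('a', 'b', 'c'):
        row_total = sum(p for (gg, i), p in joint_gi.items() if gg == g)
        cond = {i: joint_gi[(g, i)] / row_total for i in ('low', 'high')}
        print(g, {i: round(p, 2) for i, p in cond.items()})
    # a {'low': 0.28, 'high': 0.72}
    # b {'low': 0.76, 'high': 0.24}
    # c {'low': 0.92, 'high': 0.08}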

Chain Rule
E.g., P(Grade = b, Int. = high) = P(Grade = b | Int. = high) P(Int. = high)
Can be used for distributions: P(A, B) = P(A | B) P(B)

Handy Rules for Cond. Probability (1 of 2)
P(A | B = b) is a single distribution, like P(A)
P(A | B) is not a single distribution, but a set of |Val(B)| distributions

Handy Rules for Cond. Probability (2 of 2)
Any statement true for arbitrary distributions is also true if you condition on a new r.v.
P(A, B) = P(A | B) P(B)? (chain rule) Then also P(A, B | C) = P(A | B, C) P(B | C)
Likewise, any statement true for arbitrary distributions is also true if you replace an r.v. with two/more new r.v.s
P(A | B) = P(A, B) / P(B)? (def. of cond. prob.) Then also P(A | C, D) = P(A, C, D) / P(C, D)

Independence
P(Rain | Cloudy) ≠ P(Rain)
But: P(FairDie = 6 | PreviousRoll = 6) = P(FairDie = 6)
We say A and B are independent iff P(A | B) = P(A)
Logically equivalent to P(A, B) = P(A) P(B)
Denoted A ⊥ B
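A sketch checking (non-)independence numerically on the Grade/Interest table above (names illustrative): Grade and Interest are dependent, since P(G, I) ≠ P(G) P(I):

    joint_gi = {('a', 'low'): 0.07, ('a', 'high'): 0.18,
                ('b', 'low'): 0.28, ('b', 'high'): 0.09,
                ('c', 'low'): 0.35, ('c', 'high'): 0.03}

    # Marginals P(G) and P(I)
    p_g = {g: sum(p for (gg, i), p in joint_gi.items() if gg == g) for g in 'abc'}
    p_i = {i: sum(p for (g, ii), p in joint_gi.items() if ii == i)
           for i in ('low', 'high')}

    independent = all(abs(joint_gi[(g, i)] - p_g[g] * p_i[i]) < 1e-9
                      for (g, i) in joint_gi)
    print(independent)  # False: P(a, high) = 0.18 but P(a)*P(high) = 0.25*0.30 = 0.075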

Conditional Independence (1 of 2)
A and B are conditionally independent given C iff P(A | B, C) = P(A | C)
Equivalent to P(A, B | C) = P(A | C) P(B | C)
Denoted (A ⊥ B | C)

Conditional Independence (2 of 2)
Example: university admissions
Val(GetIntoX) = {yes, no, wait}, Val(Application) = {good, bad}
P(GetIntoNU | GetIntoUIUC, GetIntoStanford, Application): 3*3*2*2 = 36 parameters (3*3*2 = 18 parent configurations, each needing 3 − 1 = 2 free values)
Assuming P(GetIntoNU | GetIntoUIUC, GetIntoStanford, Application) = P(GetIntoNU | Application): 2*2 = 4 parameters

Properties of Conditional Independence
Decomposition: (X ⊥ Y, W | Z) => (X ⊥ Y | Z)
Weak Union: (X ⊥ Y, W | Z) => (X ⊥ Y | Z, W)
Contraction: (X ⊥ W | Z, Y) & (X ⊥ Y | Z) => (X ⊥ Y, W | Z)

Expectation
Discrete: E_P[X] = Σ_x x P(x)
Continuous: E_P[X] = ∫ x p(x) dx
E.g., E[FairDie] = 3.5

Expectation is Linear
E_P[X + Y] = Σ_{x,y} (x + y) P(x, y)
= Σ_{x,y} x P(x, y) + Σ_{x,y} y P(x, y)
= Σ_x x Σ_y P(x, y) + Σ_y y Σ_x P(x, y)
= Σ_x x P(x) + Σ_y y P(y)
= E_P[X] + E_P[Y]
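A quick sketch checking both E[FairDie] = 3.5 and linearity for two independent die rolls (setup illustrative):

    from fractions import Fraction

    die = {k: Fraction(1, 6) for k in range(1, 7)}

    def expectation(dist):
        # E[X] = sum over x of x * P(x)
        return sum(x * p for x, p in dist.items())

    print(expectation(die))  # Fraction(7, 2) = 3.5

    # Linearity: E[X + Y] = E[X] + E[Y], using the joint of two independent rolls
    e_sum = sum((x + y) * px * py for x, px in die.items() for y, py in die.items())
    assert e_sum == 2 * expectation(die)  # 7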

Fun with Expectation
BALLMER: For years, I used this one quite a bit. I'd ask people to pick a number between one and a hundred. You get it on the first guess, I give you five bucks. Takes you two guesses, I give you four. Three, two, one, zero. Then you pay me a buck, you pay me two. Do you want to play or not?
GATES: And you're telling them if they're high or low?
BALLMER: I tell you high, low on your guess. Do you want to play or not?
GATES: And getting the right answer isn't the key thing, if the person can think about it in the right way.
BALLMER: You want to see that people can think in a disciplined, rational way. Although I will admit that someone once wrote down that this has an expected value of negative 21 cents as soon as I finished talking. [Looks over at Gates, who's started to jot down numbers on a piece of paper.] Look he's working on the problem!
GATES: Just trying to get 21 cents, that's all.
http://www.newsweek.com/1997/06/22/how-we-did-it.html
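As an expectation exercise, here is a sketch of one model of the game (my assumptions, not from the slides): the number is chosen uniformly at random and the guesser does straight binary search, earning $(6 − k) for a win on guess k. Under these assumptions the game is worth about +$0.20 to the guesser; the quoted −21 cents presumably rests on different assumptions, e.g., a chooser who deliberately picks hard-to-find numbers.

    # Depth distribution of binary search over 1..100:
    # 1 number found in 1 guess, 2 in 2, 4 in 3, ..., capped at 100 numbers total.
    counts = {}
    remaining = 100
    for k in range(1, 8):
        found = min(2 ** (k - 1), remaining)
        counts[k] = found
        remaining -= found

    # Payoff is 6 - k dollars for a win on guess k ($5, $4, ..., $0, -$1, ...)
    expected = sum(n * (6 - k) for k, n in counts.items()) / 100
    print(expected)  # 0.2 -> +20 cents per game for the guesser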

What have we learned?
Probability: a calculus for dealing with uncertainty, built from a small set of axioms (ignore at your peril)
Joint distribution P(A, B, C, ...): specifies the probability of all combinations of r.v.s
Conditional probability P(A | B): specifies the probability of A = a given B = b
Conditional independence: can radically reduce the number of model parameters
Expectation
Next time: Bayes Rule, Statistical Estimation