
Probability Review. Lecturer: Ji Liu. Thanks to Jerry Zhu for sharing his slides. slide 1

Inference with Bayes' rule: Example
In a bag there are two envelopes:
- one has a red ball (worth $100) and a black ball
- one has two black balls (black balls are worth nothing)
You have two chances to randomly pick a ball out of an envelope. You randomly grabbed an envelope and randomly took out one ball: it's black. At this point you're given the option to switch envelopes. To switch or not to switch?
slide 2

Outline
- Probability, random variables
- Axioms of probability
- Conditional probability
- Probabilistic inference: Bayes' rule
- Independence
- Conditional independence
slide 3

Randomness
Is our world random? Uncertainty: ignorance (practical and theoretical).
- Will my coin flip end in heads?
- Will bird flu strike tomorrow?
Probability is the language of uncertainty, a central pillar of modern-day artificial intelligence.
slide 4

Sample space
A space of events to which we assign probabilities. Events can be binary, multi-valued, or continuous; events are mutually exclusive.
Examples:
- Coin flip: {head, tail}
- Die roll: {1, 2, 3, 4, 5, 6}
- English words: a dictionary
- Temperature tomorrow: R+ (kelvin)
slide 5

Random variable
A variable, X, whose domain is the sample space and whose value is somewhat uncertain.
Examples:
- X = coin flip outcome
- X = first word in tomorrow's headline news
- X = tomorrow's temperature
Kind of like X = rand().
[Photo: Andrey Kolmogorov (1903-1987)]
slide 6

Probability for discrete events
The probability P(X=x) is the fraction of times X takes value x. Often we write it as P(x). (There are other definitions of probability, and philosophical debates, but we'll not go there.)
Examples:
- P(head) = P(tail) = 0.5: fair coin
- P(head) = 0.51, P(tail) = 0.49: slightly biased coin
- P(head) = 1, P(tail) = 0: Jerry's coin
- P(first word = "the" when flipping to a random page in R&N) = ?
slide 7

Probability table
Weather
  sunny   200/365
  cloudy  100/365
  rainy    65/365
P(Weather = sunny) = P(sunny) = 200/365
P(Weather) = {200/365, 100/365, 65/365}
For now we'll be satisfied with obtaining the probabilities by counting frequencies from data.
slide 8
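
A minimal sketch of estimating these probabilities by counting, with the day counts above replayed as a toy data set:

```python
from collections import Counter

# Hypothetical weather log, one label per day, matching the
# 365-day counts in the table above.
observations = ["sunny"] * 200 + ["cloudy"] * 100 + ["rainy"] * 65

counts = Counter(observations)
total = sum(counts.values())

# Estimate P(Weather = w) as the relative frequency of w.
prob = {w: c / total for w, c in counts.items()}
print(prob["sunny"])  # 200/365 ≈ 0.548
```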

Probability for discrete events
Probability for more complex events A (an event can be considered as a subset of the sample space):
- P(A = "head or tail") = P(outcome of X is in A) = ? (fair coin)
- P(A = "even number") = ? (fair 6-sided die)
- P(A = "two dice rolls sum to 2") = ?
slide 9

Probability for discrete events
Probability for more complex events A:
- P(A = "head or tail") = 0.5 + 0.5 = 1 (fair coin)
- P(A = "even number") = 1/6 + 1/6 + 1/6 = 0.5 (fair 6-sided die)
- P(A = "two dice rolls sum to 2") = 1/6 * 1/6 = 1/36
slide 10
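
These answers can be checked by enumerating equally likely outcomes (a quick sketch, not from the slides):

```python
from fractions import Fraction
from itertools import product

die = range(1, 7)

# P(even number) on a fair 6-sided die: count favorable faces.
p_even = Fraction(sum(1 for d in die if d % 2 == 0), 6)
print(p_even)  # 1/2

# P(two dice rolls sum to 2): only (1, 1) out of 36 outcomes.
p_sum2 = Fraction(sum(1 for a, b in product(die, die) if a + b == 2), 36)
print(p_sum2)  # 1/36
```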

The axioms of probability
P(A) ∈ [0, 1]
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
slide 11

The axioms of probability
P(A) ∈ [0, 1]  (the fraction of A can't be smaller than 0)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
[figure: event A inside the sample space]
slide 12

The axioms of probability
P(A) ∈ [0, 1]  (the fraction of A can't be bigger than 1)
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
[figure: event A inside the sample space]
slide 13

The axioms of probability
P(A) ∈ [0, 1]
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
[figure: overlapping events A and B in the sample space]
slide 14

Some theorems derived from the axioms
P(¬A) = 1 - P(A)  (picture?)
If X can take k different values x_1, ..., x_k: P(X=x_1) + ... + P(X=x_k) = 1
P(B) = P(B ∧ A) + P(B ∧ ¬A), if A is a binary event
slide 15
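
A quick numeric check of the complement and total-probability identities on a made-up joint distribution (the numbers below are assumptions for illustration, not from the slides):

```python
from fractions import Fraction

# An assumed joint distribution over two binary events A and B:
# keys are (A happened, B happened); the four entries sum to 1.
p = {(True, True): Fraction(1, 8), (True, False): Fraction(3, 8),
     (False, True): Fraction(1, 4), (False, False): Fraction(1, 4)}

# Complement rule: P(¬A) = 1 - P(A)
p_A = p[(True, True)] + p[(True, False)]
p_notA = p[(False, True)] + p[(False, False)]
assert p_notA == 1 - p_A

# Total probability over a binary event: P(B) = P(B ∧ A) + P(B ∧ ¬A)
p_B = p[(True, True)] + p[(False, True)]
print(p_B)  # 3/8
```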

Joint probability
The joint probability P(X=x, Y=y) is shorthand for P(X=x ∧ Y=y), the probability that both X=x and Y=y happen. More generally, P(A, B) = P(X hits A and Y hits B).
- P(X=x), e.g. P(1st word on a random page = "San") = 0.001 (possibly: San Francisco, San Diego, ...)
- P(Y=y), e.g. P(2nd word = "Francisco") = 0.0008 (possibly: San Francisco, Don Francisco, Pablo Francisco, ...)
- P(X=x, Y=y), e.g. P(1st = "San", 2nd = "Francisco") = 0.0007
slide 16

Joint probability table
              sunny     cloudy    rainy
temp = hot    150/365   40/365    5/365
temp = cold   50/365    60/365    60/365
P(temp = hot, weather = rainy) = P(hot, rainy) = 5/365
The full joint probability table between N variables, each taking k values, has k^N entries (that's a lot!)
slide 17

Marginal probability
Sum over the other variables:
              sunny     cloudy    rainy
temp = hot    150/365   40/365    5/365
temp = cold   50/365    60/365    60/365
Σ             200/365   100/365   65/365
P(Weather) = {200/365, 100/365, 65/365}
The name comes from the old days, when the sums were written in the margin of a page.
slide 18

Marginal probability
Sum over the other variables:
              sunny     cloudy    rainy     Σ
temp = hot    150/365   40/365    5/365     195/365
temp = cold   50/365    60/365    60/365    170/365
P(temp) = {195/365, 170/365}
This is nothing but P(Y=y) = Σ_i P(Y=y, X=x_i), if X can take k values x_1, ..., x_k.
P(weather is sunny or rainy) = P(weather is sunny) + P(weather is rainy) = (150/365 + 50/365) + (5/365 + 60/365) = 265/365
slide 19
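
The same marginalization as a short sketch, with the table stored as an array:

```python
import numpy as np

# Joint probability table from slide 17: rows = temp (hot, cold),
# columns = weather (sunny, cloudy, rainy).
joint = np.array([[150, 40, 5],
                  [50, 60, 60]]) / 365.0

# Marginalize out temp: sum each column to get P(Weather).
p_weather = joint.sum(axis=0)   # [200/365, 100/365, 65/365]

# Marginalize out weather: sum each row to get P(temp).
p_temp = joint.sum(axis=1)      # [195/365, 170/365]

print(p_weather, p_temp)
```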

Conditional probability
The conditional probability P(A | B) is the fraction of times A happens within the region where B happens.
- P(A), e.g. P(1st word on a random page = "San") = 0.001
- P(B), e.g. P(2nd word = "Francisco") = 0.0008
- P(A | B), e.g. P(1st = "San" | 2nd = "Francisco") = 0.875 (possibly: San, Don, Pablo, ...)
Although "San" is rare and "Francisco" is rare, given "Francisco", "San" is quite likely!
slide 20

Conditional probability
P(San | Francisco)
  = #(1st = San and 2nd = Francisco) / #(2nd = Francisco)
  = P(San, Francisco) / P(Francisco)
  = 0.0007 / 0.0008 = 0.875
with P(S) = 0.001, P(F) = 0.0008, P(S, F) = 0.0007
slide 21

Conditional probability (Bayes' rule)
In general, the conditional probability is
P(A | B) = P(A, B) / P(B), where P(B) = Σ_{all x} P(X = x, B)
We can have everything conditioned on some other event C, to get a conditional version of conditional probability:
P(A | B, C) = P(A, B | C) / P(B | C)
"|" has low precedence: P(A | B, C) should read P(A | (B, C))
slide 22
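
As a sketch, the definition applied to the joint table from slide 17 to get P(temp | weather = rainy):

```python
import numpy as np

# Joint table from slide 17: rows = temp (hot, cold),
# columns = weather (sunny, cloudy, rainy).
joint = np.array([[150, 40, 5],
                  [50, 60, 60]]) / 365.0
temps = ["hot", "cold"]

# P(temp | rainy) = P(temp, rainy) / P(rainy), with P(rainy)
# obtained by marginalizing out temp (summing the rainy column).
p_rainy = joint[:, 2].sum()
p_temp_given_rainy = joint[:, 2] / p_rainy
print(dict(zip(temps, p_temp_given_rainy)))  # {'hot': 1/13, 'cold': 12/13}
```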

The chain rule
From the definition of conditional probability we have the chain rule:
P(A, B) = P(B) * P(A | B)
It works the other way around, too:
P(A, B) = P(A) * P(B | A)
It works with more than 2 events, too:
P(A_1, A_2, ..., A_n) = P(A_1) * P(A_2 | A_1) * P(A_3 | A_1, A_2) * ... * P(A_n | A_1, A_2, ..., A_{n-1})
slide 23
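
A numeric check of the three-event chain rule on a randomly generated joint table (a sketch; the table itself is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Arbitrary joint table over three binary events A1, A2, A3,
# normalized so its 8 entries sum to 1.
p = rng.random((2, 2, 2))
p /= p.sum()

# Chain rule: P(a1, a2, a3) = P(a1) * P(a2 | a1) * P(a3 | a1, a2)
a1, a2, a3 = 1, 0, 1
p_a1 = p[a1].sum()
p_a2_given_a1 = p[a1, a2].sum() / p_a1
p_a3_given_a1a2 = p[a1, a2, a3] / p[a1, a2].sum()
assert np.isclose(p[a1, a2, a3], p_a1 * p_a2_given_a1 * p_a3_given_a1a2)
```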

Reasoning
How do we use probabilities in AI? You wake up with a headache (D'oh!). Do you have the flu?
H = headache, F = flu
Logical inference: if (H) then F. (But the world is often not this clear-cut.)
Statistical inference: compute the probability of a query given (conditioned on) evidence, i.e. P(F | H).
[Example from Andrew Moore]
slide 24

Inference with Bayes' rule: Example 1
Inference: compute the probability of a query given evidence (H = headache, F = flu).
You know that:
P(H) = 0.1  (one in ten people has a headache)
P(F) = 0.01  (one in 100 people has the flu)
P(H | F) = 0.9  (90% of people who have the flu have a headache)
How likely is it that you have the flu? 0.9? 0.01?
[Example from Andrew Moore]
slide 25

Bayes' rule
Inference with Bayes' rule ("Essay Towards Solving a Problem in the Doctrine of Chances", 1764):
P(F | H) = P(F, H) / P(H) = P(H | F) P(F) / P(H)
P(H) = 0.1  (one in ten people has a headache)
P(F) = 0.01  (one in 100 people has the flu)
P(H | F) = 0.9  (90% of people who have the flu have a headache)
P(F | H) = 0.9 * 0.01 / 0.1 = 0.09
So there's a 9% chance you have the flu: much less than 90%, but higher than P(F) = 1%, since you have a headache.
slide 26
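
The same computation as a two-line sketch:

```python
# Bayes' rule for the flu example above.
p_h = 0.1          # P(headache)
p_f = 0.01         # P(flu)
p_h_given_f = 0.9  # P(headache | flu)

print(p_h_given_f * p_f / p_h)  # P(flu | headache) = 0.09
```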

Inference with Bayes' rule
P(A | B) = P(B | A) P(A) / P(B)   (Bayes' rule)
Why do we make things this complicated? Often P(B | A), P(A), P(B) are easier to get.
Some names:
- Prior P(A): probability before any evidence
- Likelihood P(B | A): assuming A, how likely is the evidence
- Posterior P(A | B): conditional probability after knowing the evidence
Inference: deriving unknown probabilities from known ones.
In general, if we have the full joint probability table, we can simply do P(A | B) = P(A, B) / P(B) (more on this later).
slide 27

Inference with Bayes' rule: Example
In a bag there are two envelopes:
- one has a red ball (worth $100) and a black ball
- one has two black balls (black balls are worth nothing)
You have two chances to randomly pick a ball out of an envelope. You randomly grabbed an envelope and randomly took out one ball: it's black. At this point you're given the option to switch envelopes. To switch or not to switch?
slide 28

Inference with Bayes' rule: Example
E: envelope, 1 = (R, B), 2 = (B, B)
B: the event of drawing a black ball
P(E | B) = P(B | E) P(E) / P(B)   (from Bayes' rule)
We first compare P(E=1 | B) vs. P(E=2 | B):
P(B | E=1) = 0.5, P(B | E=2) = 1
P(E=1) = P(E=2) = 0.5
P(B) = 3/4 (in fact it doesn't matter for the comparison)
P(E=1 | B) = 1/3, P(E=2 | B) = 2/3
After seeing a black ball, the posterior probability that this envelope is 1 (and thus worth $100) is smaller than the probability that it is 2.
You should switch, since the other envelope holds the red ball with higher probability?
slide 29

Think about it more
If you switch, the other envelope still has both of its balls, so even when it is envelope 1 you only have a 1/2 chance of drawing the red ball. Thus, if you switch, you win the red ball with probability 1/2 * 2/3 = 1/3.
If you do not switch and draw again from the same envelope, the remaining ball is red exactly when your envelope is 1, so you also have a 1/3 chance of getting the red ball.
Thus, it does not matter whether you switch or not.
slide 30
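
A Monte Carlo sanity check of this conclusion (a sketch, assuming the rules as stated: draw one ball, observe black, then stay or switch):

```python
import random

def play(switch):
    """One round: returns None if the first ball isn't black (we
    condition on that observation), else True/False for winning."""
    envelopes = [["red", "black"], ["black", "black"]]
    random.shuffle(envelopes)
    mine, other = envelopes
    random.shuffle(mine)
    first = mine.pop()
    if first != "black":
        return None
    final = random.choice(other) if switch else mine[0]
    return final == "red"

random.seed(0)
for switch in (False, True):
    results = [r for r in (play(switch) for _ in range(200_000))
               if r is not None]
    print(switch, sum(results) / len(results))  # both ≈ 1/3
```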

Independence
Two events A, B are independent if (the following are equivalent):
P(A, B) = P(A) * P(B)
P(A | B) = P(A)
P(B | A) = P(B)
For a 4-sided die, let A = "outcome is small" and B = "outcome is even". Are A and B independent? How about a 6-sided die?
slide 31
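
Checking by enumeration (assuming "small" means the lower half of the faces: {1, 2} for 4 sides, {1, 2, 3} for 6; the slide leaves this open):

```python
from fractions import Fraction

def independent(sides):
    """Test P(A ∩ B) == P(A) * P(B) for a fair die by counting faces."""
    faces = range(1, sides + 1)
    small = {f for f in faces if f <= sides // 2}
    even = {f for f in faces if f % 2 == 0}
    p_a = Fraction(len(small), sides)
    p_b = Fraction(len(even), sides)
    p_ab = Fraction(len(small & even), sides)
    return p_ab == p_a * p_b

print(independent(4))  # True:  1/4 == 1/2 * 1/2
print(independent(6))  # False: 1/6 != 1/2 * 1/2
```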

Independence
Independence is domain knowledge.
If A, B are independent, the joint probability table between A and B is simple: it has k^2 cells, but only 2k-2 parameters. This is good news (more on this later).
Example: P(burglary) = 0.001, P(earthquake) = 0.002. Let's say they are independent. The full joint probability table = ?
slide 32
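
Filling in that joint table under the independence assumption:

```python
# Under independence the joint table is the product of the marginals.
p_b = 0.001  # P(burglary)
p_e = 0.002  # P(earthquake)

joint = {
    ("burglary", "earthquake"): p_b * p_e,                    # 0.000002
    ("burglary", "no earthquake"): p_b * (1 - p_e),           # 0.000998
    ("no burglary", "earthquake"): (1 - p_b) * p_e,           # 0.001998
    ("no burglary", "no earthquake"): (1 - p_b) * (1 - p_e),  # 0.997002
}
assert abs(sum(joint.values()) - 1.0) < 1e-12
```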

Independence misused
A famous statistician would never travel by airplane, because he had studied air travel and estimated that the probability of there being a bomb on any given flight was one in a million, and he was not prepared to accept these odds.
One day, a colleague met him at a conference far from home. "How did you get here, by train?"
"No, I flew."
"What about the possibility of a bomb?"
"Well, I began thinking that if the odds of one bomb are 1 in a million, then the odds of two bombs are (1/1,000,000) x (1/1,000,000). This is a very, very small probability, which I can accept. So now I bring my own bomb along!"
(An innocent old math joke.)
slide 33

Conditional independence
In general, A and B are conditionally independent given C if:
P(A | B, C) = P(A | C), or
P(B | A, C) = P(B | C), or
P(A, B | C) = P(A | C) * P(B | C)
slide 34

Conditional independence
Random variables can be dependent, but conditionally independent.
A: person 1 gets heads; B: person 2 gets heads.
- Persons 1 and 2 flip two different coins; person 1 flips first.
- Persons 1 and 2 flip the same (unfair) coin; person 1 flips first.
- Persons 1 and 2 flip the same (unfair) coin; person 1 flips first; C is the event that the coin's heads probability is 0.9.
slide 35

Conditional independence
Random variables can be dependent, but conditionally independent.
A: person 1 gets heads; B: person 2 gets heads.
- Two different coins; person 1 flips first: P(B | A) = P(B).
- The same (unfair) coin with unknown bias; person 1 flips first: P(B | A) ≠ P(B), since person 1's outcome tells us something about the coin's bias.
- The same (unfair) coin; C is the event that the coin's heads probability is 0.9: P(B | A) ≠ P(B), but P(B | A, C) = P(B | C).
slide 36
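
A small simulation of the same-coin scenario makes this concrete (a sketch; the 0.9/0.1 prior over the bias is an assumption for illustration):

```python
import random

# The shared coin's heads probability is either 0.9 or 0.1, each with
# probability 1/2. C is the event "heads probability is 0.9".
random.seed(0)
trials = 200_000
count_a = count_b = count_ab = 0
count_c = count_ac = count_bc = count_abc = 0

for _ in range(trials):
    bias = random.choice([0.9, 0.1])
    a = random.random() < bias   # person 1's flip
    b = random.random() < bias   # person 2's flip, same coin
    count_a += a; count_b += b; count_ab += a and b
    if bias == 0.9:              # the event C
        count_c += 1; count_ac += a; count_bc += b; count_abc += a and b

print(count_b / trials)       # P(B)        ≈ 0.5
print(count_ab / count_a)     # P(B | A)    ≈ 0.82: dependent!
print(count_bc / count_c)     # P(B | C)    ≈ 0.9
print(count_abc / count_ac)   # P(B | A, C) ≈ 0.9: independent given C
```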

What you should know
- Joint probability
- Marginal probability
- Conditional probability
- Bayes' rule
- Independence
- Conditional independence
slide 37