COS 424: Interacting with Data
Lecture #2, February 7, 2008
Lecturer: Dave Blei
Scribe: Ellen Kim

1 Probability Review

1.1 Random Variables

A random variable can be used to model any probabilistic outcome.

Examples

- A coin toss is a random variable: the coin lands heads up ("heads") with probability 1/2 and lands tails up ("tails") with probability 1/2.
- The number of visitors to a store on a given day is not exactly a random variable, but it can be treated as one because the number of visitors is unknown.
- The temperature on 2/7/2013 is also a random variable. Again, it is not determined randomly, but it is unknown, so it can be modeled as a random variable.
- The temperature on 3/4/1905 can be a random variable in the context of looking up the average temperature on that day, out of the temperatures of all other days.

1.1.1 Types of Random Variables

Random variables take on values in a sample space, and these values depend on the type of random variable: discrete or continuous. Discrete random variables take on values from a finite or countably infinite sample space. Continuous random variables take on values from an uncountable sample space, such as an interval of the real line.

Examples revisited

- A coin toss is a discrete random variable because the sample space is {H, T}, a finite set of values.
- The number of visitors to a store on a given day is also a discrete random variable because the sample space is {0, 1, 2, ...}, a countably infinite set of values.
- The actual temperature on a given day or at a given time is a continuous random variable because its sample space is the set of real numbers $\mathbb{R}$.

1.1.2 Notation for Random Variables

A random variable is denoted by a capital letter (e.g., X), and a realization of a random variable by a lowercase letter (e.g., x). For probability notation using random variables, in this course at least, just use lowercase p for discrete random variables, e.g., p(X = x).
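To make the notation concrete, here is a minimal Python sketch (not part of the original notes): a discrete random variable is represented as a table from outcomes to probabilities, and realizations are drawn from it. The names `coin` and `realize` are illustrative.

```python
import random

# A discrete random variable X as a map from outcomes to probabilities.
coin = {"H": 0.5, "T": 0.5}  # sample space {H, T}

def realize(rv, n=1):
    """Draw n realizations x of the discrete random variable rv."""
    outcomes = list(rv)
    weights = [rv[o] for o in outcomes]
    return random.choices(outcomes, weights=weights, k=n)

print(realize(coin, 10))  # e.g. ['H', 'T', 'T', 'H', ...]
```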

Figure 1: Diagram of the probability of an event a.

1.1.3 Discrete Distributions

For discrete distributions, the probabilities of all possible outcomes sum to 1:

$$\sum_x p(X = x) = 1.$$

Example revisited

An unfair coin toss: p(X = h) = 0.7, p(X = t) = 0.3.

1.2 Using Diagrams

It is convenient and easy to use visual diagrams to make working with probabilities more intuitive. In one representation (Figure 1), all atoms live in a box, and regions drawn in the box represent events. An event is then a subset of atoms; events are drawn so that the proportion of the area of the event to the whole box approximates the probability of that event. The probability of an event is calculated by summing the probabilities of its atoms:

$$p(X = x) = \sum_{a \in x} p(a).$$

(More examples of diagrams will be used in the rest of this lecture.)

1.3 Distributions of Random Variables

While discrete and continuous random variables are interesting and useful by themselves, we typically consider collections of random variables, such as the following.

1.3.1 Joint Distribution

A joint distribution is a distribution over the configurations of all the random variables in an ensemble.

Example revisited again

Tossing four fair coins has 16 different possible outcomes (hhhh, hhht, ..., tttt). Each of these outcomes has a probability of occurring; taken together, these probabilities form the joint distribution:

p(hhhh) = 0.0625
p(hhht) = 0.0625
...
p(tttt) = 0.0625.
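As a sanity check on both properties, here is a small Python sketch (illustrative, not from the notes) that builds the joint distribution over four fair coin tosses and verifies that it sums to 1.

```python
from itertools import product

p_coin = {"h": 0.5, "t": 0.5}  # one fair coin

joint = {}
for outcome in product("ht", repeat=4):   # hhhh, hhht, ..., tttt
    p = 1.0
    for side in outcome:
        p *= p_coin[side]                 # independent fair flips
    joint["".join(outcome)] = p

print(len(joint), joint["hhhh"])          # 16 outcomes, each 0.0625
print(sum(joint.values()))                # 1.0: sum_x p(X = x) = 1
```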

Figure 2: Diagram of two events x and y.

1.3.2 Conditional Distribution

The conditional distribution is the distribution of a random variable given some evidence. For example, the following notation denotes the probability that X will be x given that Y is y:

$$p(X = x \mid Y = y).$$

Example (a new one)

David really likes the band Steely Dan [and Bread], but David's wife really does not. The probability that David listens to Steely Dan is

p(David listens to Steely Dan) = 0.5,

but the probability that David listens to Steely Dan given that his wife Toni is home is

p(David listens to Steely Dan | Toni is home) = 0.1,

and the probability that David listens to Steely Dan given that Toni is not at home is

p(David listens to Steely Dan | Toni is not home) = 0.7.

Note that 0.1 plus 0.7 does not equal 1, and this is fine, as long as there is a complete distribution over X for each possible outcome of Y. That is, it is fine that

$$\sum_y p(X = x \mid Y = y) \neq 1$$

as long as

$$\sum_x p(X = x \mid Y = y) = 1,$$

because the latter is necessarily true.

Diagramming multiple events

If we know that a condition y is true, the area of y becomes the box: we no longer consider all atoms, just the atoms in event y.
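The same example as a small Python sketch: a conditional probability table where each row (each fixed y) is a complete distribution over X. The 0.9 and 0.3 entries are not stated in the notes; they are the complements implied by normalization.

```python
p_x_given_y = {
    "Toni home":     {"listens": 0.1, "does not listen": 0.9},
    "Toni not home": {"listens": 0.7, "does not listen": 0.3},
}

# For each fixed y, the distribution over X sums to 1 ...
for y, dist in p_x_given_y.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

# ... but summing a single x across different y's need not give 1:
print(0.1 + 0.7)  # 0.8, and that's fine
```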

Conditional Probability

The conditional probability of an event x given that an event y has occurred is given by

$$p(X = x \mid Y = y) = \frac{p(X = x, Y = y)}{p(Y = y)},$$

which holds when p(Y = y) is greater than 0.

1.4 Manipulating Probabilities

1.4.1 Chain Rule

The chain rule is useful for many things. For instance, in diagnosing a patient in medicine, if Y is a disease and X is a symptom, then knowing how often the disease occurs and how often the symptom occurs given the disease, one can find the joint distribution. The chain rule is

$$p(X, Y) = \frac{p(X, Y)}{p(Y)}\, p(Y) = p(X \mid Y)\, p(Y),$$

so that in general,

$$p(X_1, X_2, \ldots, X_N) = p(X_1) \prod_{n=2}^{N} p(X_n \mid X_1, \ldots, X_{n-1}).$$

1.4.2 Marginalization

Given several random variables, we are often interested in only a subset of them, so we use marginalization:

$$p(X) = \sum_y \sum_z p(X, Y = y, Z = z).$$

We can verify that this is mathematically correct:

$$\sum_y \sum_z p(X, Y = y, Z = z) = \sum_y \sum_z p(X)\, p(Y = y, Z = z \mid X) = p(X) \sum_y \sum_z p(Y = y, Z = z \mid X) = p(X),$$

since the conditional distribution of (Y, Z) given X sums to 1.
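Here is a short Python sketch (with made-up numbers, not from the lecture) that checks the definition of conditional probability, the chain rule, and marginalization on a tiny joint table.

```python
p_xy = {("x0", "y0"): 0.1, ("x0", "y1"): 0.3,
        ("x1", "y0"): 0.2, ("x1", "y1"): 0.4}

def p_y(y):
    # marginal: p(Y = y) = sum_x p(X = x, Y = y)
    return sum(p for (xi, yi), p in p_xy.items() if yi == y)

def p_x_given_y(x, y):
    # definition of conditional probability (requires p(Y = y) > 0)
    return p_xy[(x, y)] / p_y(y)

# Chain rule: p(x, y) = p(x | y) p(y) for every cell
for (x, y), p in p_xy.items():
    assert abs(p_x_given_y(x, y) * p_y(y) - p) < 1e-9

# Marginalization: summing out Y recovers p(X = x0)
print(sum(p for (xi, yi), p in p_xy.items() if xi == "x0"))  # 0.4
```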

1.4.3 Bayes Rule

$$p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{\sum_y p(X \mid Y = y)\, p(Y = y)} = \frac{p(X, Y)}{p(X)}.$$

Bayes rule follows from the chain rule (in the numerator), from marginalizing out Y (in the denominator), and from the definition of conditional probability.

Example revisited

Going back to the disease example: now, knowing the symptom and the probability of the symptom when the disease is present, we can find the probability that the patient has the disease given the presence of the symptom. An important application that will not be discussed here is Bayesian statistics.
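A numeric sketch of the disease/symptom example: the lecture gives no numbers, so the prior and likelihoods below are hypothetical.

```python
p_disease = 0.01                 # prior p(Y = disease)
p_sym_given_dis = 0.90           # p(X = symptom | Y = disease)
p_sym_given_healthy = 0.05       # p(X = symptom | Y = healthy)

# Denominator of Bayes rule: marginalize out Y.
p_symptom = (p_sym_given_dis * p_disease
             + p_sym_given_healthy * (1 - p_disease))

# Bayes rule: p(Y | X) = p(X | Y) p(Y) / p(X)
p_dis_given_sym = p_sym_given_dis * p_disease / p_symptom
print(round(p_dis_given_sym, 3))  # 0.154
```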

1.4.4 Independence

Random variables are independent if knowing one doesn't tell us anything about the other:

$$p(X \mid Y = y) = p(X) \quad \text{for all } y.$$

This means that if X and Y are independent, the joint probability factorizes:

$$p(X, Y) = p(X \mid Y)\, p(Y) = p(X)\, p(Y).$$

One implication of this is that, for two variables with 5 values each, we can store two 5-vectors rather than a 5 x 5 grid of 25 joint values. Another is that we can collect data for the two factors independently and still find joint probabilities.

Examples

The following are examples of independent variables:

- whether it rains and who the president will be
- the outcomes of 2 tossed dice

The following are examples of variables that are not independent:

- whether it rains and whether we go to the beach
- the height and sex of a person
- the results of drawing without replacement from a deck of cards
- whether the alarm clock goes off and getting to class on time

1.4.5 Conditional Independence by Example

Suppose there are two coins, one fair and the other unfair, so that $p(C_1 = H) = 0.5$ and $p(C_2 = H) = 0.7$. Now choose one coin Z at random, so $Z \in \{1, 2\}$, and then flip $C_Z$ twice to get two outcomes X, Y.

Question: Are X and Y independent?

Answer: They would be if we knew whether Z is coin 1 or coin 2, but we don't know which coin Z is. So the answer is no: if the first flip comes up heads, then it is more likely that Z = 2, so heads becomes more likely on the second flip than it was before; if the first flip is tails, then it is more likely that Z = 1, so tails becomes more likely on the second flip. X and Y are conditionally independent given Z. Again, this implies a factorization:

$$p(Y \mid X, Z = z) = p(Y \mid Z = z)$$
$$p(Y, X \mid Z = z) = p(Y \mid Z = z)\, p(X \mid Z = z).$$

1.5 Monty Hall Problem (for thought)

Monty Hall shows you 3 doors with 2 goats and 1 prize behind them. You, the contestant, pick one. Then he opens one of the other doors, revealing a goat; you can stay with the door you chose initially or pick the other remaining door (the door you neither picked nor he revealed to hold a goat). What should you do? (Answer in the next lecture...)

1.6 Announcement

There will be a help session on using R on Monday at 5:30 PM in the small auditorium in the Computer Science building. Also, please sit in the front of the hall during Tuesday and Thursday lectures.
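Finally, for the Monty Hall problem posed in Section 1.5, here is a simulation sketch (not part of the original notes) you can run to check your intuition before the answer is given next lecture.

```python
import random

def play(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)   # door hiding the prize
        pick = random.randrange(3)    # contestant's initial pick
        # Monty opens a door that is neither the pick nor the prize.
        # (When he has a choice of goat doors, which one he opens
        # does not affect the win rates below.)
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            # switch to the remaining unopened door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == prize)
    return wins / trials

print("stay:  ", play(switch=False))
print("switch:", play(switch=True))
```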