
L645, Dept. of Linguistics, Indiana University, Fall 2015

To start out the course, we need to know something about statistics and probability. This is only an introduction; for a fuller understanding, you would need to take a statistics course.

Probability theory = a theory to determine how likely it is that some outcome will occur.

We state things in terms of an experiment (or trial), e.g., flipping three coins.
- outcome: one particular possible result, e.g., coin 1 = heads, coin 2 = tails, coin 3 = tails (HTT)
- event: one particular possible set of results, i.e., a more abstract idea, e.g., two tails and one head ({HTT, THT, TTH})
The set of basic outcomes makes up the sample space (Ω).
- Discrete sample space: finitely or countably infinitely many outcomes (1, 2, 3, ...), e.g., heads or tails
- Continuous sample space: uncountably infinitely many outcomes (1.1293..., 8.765..., ...), e.g., height
We will use F to refer to the set of events, or event space.

Sample space: die rolling
If we have a 6-sided die:
- Sample space Ω = {One, Two, Three, Four, Five, Six}
- Event space F = {{One}, {One, Two}, {One, Three, Five}, ...}
With 6 outcomes, there are 2^6 = 64 distinct events.
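
As a quick illustration (a minimal Python sketch, not part of the original slides), we can build the event space as the power set of the sample space and confirm it has 2^6 = 64 members:

    from itertools import chain, combinations

    omega = ["One", "Two", "Three", "Four", "Five", "Six"]

    # The event space F is the power set of the sample space:
    # every subset of outcomes, including {} and omega itself.
    events = list(chain.from_iterable(
        combinations(omega, r) for r in range(len(omega) + 1)))

    print(len(events))  # 64, i.e., 2**6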

Principles of counting
- Multiplication Principle: if there are two independent events, P and Q, and P can happen in p different ways and Q in q different ways, then P and Q can happen in p × q ways.
- Addition Principle: if there are two independent events, P and Q, and P can happen in p different ways and Q in q different ways, then P or Q can happen in p + q ways.

Principles of counting: examples
Example 1: If there are 3 roads leading from Bloomington to Indianapolis and 5 roads from Indianapolis to Chicago, how many ways are there to get from Bloomington to Chicago?
Answer: 3 × 5 = 15
Example 2: If there are 2 roads going south from Bloomington and 6 roads going north, how many roads are there going south or north?
Answer: 2 + 6 = 8

Exercises
- How many different 7-place license plates are possible if the first two places are for letters and the other 5 for numbers?
- John, Jim, Jack, and Jay have formed a band consisting of 4 instruments. If each of the boys can play all 4 instruments, how many different arrangements are possible? What if John and Jim can play all 4 instruments, but Jay and Jack can each play only piano and drums? (A sketch for checking your answers follows.)
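
The slides leave these as exercises; here is a brute-force Python sketch for checking answers. The counts follow from the multiplication principle. The two unnamed instruments (guitar, bass here) are placeholders, since the slide names only piano and drums:

    from itertools import permutations

    # License plates: 2 letter slots, then 5 digit slots
    # (multiplication principle: 26 * 26 * 10**5).
    print(26**2 * 10**5)  # 67600000

    # Band: count assignments of boys to instruments that
    # respect each boy's abilities.
    instruments = ["piano", "drums", "guitar", "bass"]
    can_play = {"John": set(instruments), "Jim": set(instruments),
                "Jay": {"piano", "drums"}, "Jack": {"piano", "drums"}}
    boys = ["John", "Jim", "Jay", "Jack"]

    # All four play everything: 4! arrangements.
    print(len(list(permutations(instruments))))  # 24

    # Restricted case: only count permutations every boy can play.
    ok = sum(1 for p in permutations(instruments)
             if all(inst in can_play[b] for b, inst in zip(boys, p)))
    print(ok)  # 4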

Probability functions
Probability: if A is an event, P(A) is its probability, with 0 ≤ P(A) ≤ 1.
A probability function (distribution) distributes a mass of 1 over the sample space Ω.
Probability function: any function P : F → [0, 1], where:
- P(Ω) = 1
- Countable additivity: for disjoint sets A_j ∈ F (i.e., A_j ∩ A_k = ∅ for j ≠ k), P(⋃_{j=1}^∞ A_j) = Σ_{j=1}^∞ P(A_j)
i.e., the probability of any one of several disjoint events happening = the sum of the probabilities of the individual events, e.g., P(roll = 1 ∪ roll = 2) = P(roll = 1) + P(roll = 2)

Example: Toss a fair coin three times. What is the chance of exactly 2 heads coming up?
Sample space Ω = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
Event of interest A = {HHT, HTH, THH}
Since the coin is fair, we have a uniform distribution, i.e., each outcome is equally likely (1/8):
P(A) = |A| / |Ω| = 3/8
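
Under a uniform distribution, P(A) reduces to counting, so the example can be checked by enumeration (a sketch, not from the slides):

    from itertools import product

    omega = list(product("HT", repeat=3))            # all 8 outcomes
    A = [w for w in omega if w.count("H") == 2]      # exactly two heads
    print(len(A) / len(omega))                       # 0.375 == 3/8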

Additivity for non-disjoint sets
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
The probability of the union of A and B requires adding up their individual probabilities, then subtracting out their intersection, so as not to double-count that portion.
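
For a concrete check, a sketch with a hypothetical die example (not on the slide): let A = rolls divisible by two and B = rolls divisible by three, so P(A ∪ B) = 3/6 + 2/6 − 1/6:

    omega = {1, 2, 3, 4, 5, 6}
    A = {x for x in omega if x % 2 == 0}   # {2, 4, 6}
    B = {x for x in omega if x % 3 == 0}   # {3, 6}

    p = lambda E: len(E) / len(omega)      # uniform probability
    # Note: in Python, | is set union and & is set intersection.
    print(p(A | B), p(A) + p(B) - p(A & B))  # both 0.666...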

Conditional probability
The conditional probability of an event A occurring given that event B has already occurred is notated as P(A|B).
Prior probability of A: P(A)
Posterior probability of A (after additional knowledge B): P(A|B)
(1) P(A|B) = P(A ∩ B) / P(B) = P(A, B) / P(B)
P(A, B) (or P(AB)) is the joint probability. In some sense, B has become the sample space.

Example: A coin is flipped twice. If we assume that all four points in the sample space Ω = {(H, H), (H, T), (T, H), (T, T)} are equally likely, what is the conditional probability that both flips result in heads given that the first flip does?
A = {(H, H)}, B = {(H, H), (H, T)}
P(A) = 1/4, P(B) = 2/4, P(A, B) = 1/4
P(A|B) = P(A, B) / P(B) = (1/4) / (2/4) = 1/2
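
The same computation in code (a sketch, assuming the uniform two-flip space from the example):

    from fractions import Fraction
    from itertools import product

    omega = list(product("HT", repeat=2))
    A = [w for w in omega if w == ("H", "H")]   # both flips heads
    B = [w for w in omega if w[0] == "H"]       # first flip heads
    AB = [w for w in A if w in B]               # intersection

    p = lambda E: Fraction(len(E), len(omega))
    print(p(AB) / p(B))  # 1/2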

The chain rule
The multiplication rule restates P(A|B) = P(A ∩ B) / P(B):
(2) P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A)
The chain rule (used in Markov models):
(3) P(A_1 ∩ ... ∩ A_n) = P(A_1) P(A_2|A_1) P(A_3|A_1 ∩ A_2) ... P(A_n | ⋂_{i=1}^{n−1} A_i)
i.e., to obtain the probability of n events occurring:
- select the first event
- select the second event, given the first
- ...
- select the n-th event, given all the previous ones
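
Since the slide ties the chain rule to Markov models, here is a toy sketch of how it decomposes the probability of a word sequence. The probabilities are invented for illustration, and a Markov assumption truncates each condition to just the previous word (the full chain rule would condition on all previous words):

    # P(w1 w2 w3) = P(w1) * P(w2|w1) * P(w3|w2)  (bigram Markov model)
    p_first = {"the": 0.5, "a": 0.5}                  # invented numbers
    p_next = {("the", "dog"): 0.4, ("dog", "barks"): 0.3}

    seq = ["the", "dog", "barks"]
    p = p_first[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= p_next[(prev, cur)]
    print(p)  # 0.5 * 0.4 * 0.3 = 0.06 (up to float rounding)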

Independence
Two events are independent if knowing one does not affect the probability of the other.
Events A and B are independent if P(A) = P(A|B), i.e., P(A ∩ B) = P(A)P(B)
i.e., the probability of seeing A and B together = the product of the probabilities of seeing each one individually, as one does not affect the other.

Independence: example
Fair die: event A = divisible by two; event B = divisible by three
P(AB) = P({Six}) = 1/6
P(A) · P(B) = 1/2 · 1/3 = 1/6
so A and B are independent.
Event C = divisible by four:
P(C) = P({Four}) = 1/6
P(AC) = P({Four}) = 1/6
P(A) · P(C) = 1/2 · 1/6 = 1/12 ≠ 1/6
so A and C are not independent.
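
A quick check of both cases in code (a sketch, using the uniform die as above):

    from fractions import Fraction

    omega = range(1, 7)
    p = lambda E: Fraction(len([x for x in omega if E(x)]), 6)

    A = lambda x: x % 2 == 0   # divisible by two
    B = lambda x: x % 3 == 0   # divisible by three
    C = lambda x: x % 4 == 0   # divisible by four

    AB = lambda x: A(x) and B(x)
    AC = lambda x: A(x) and C(x)
    print(p(AB) == p(A) * p(B))  # True:  A and B independent
    print(p(AC) == p(A) * p(C))  # False: A and C not independent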

Bayes' theorem
Bayes' theorem allows one to calculate P(B|A) in terms of P(A|B):
(4) P(B|A) = P(A ∩ B) / P(A) = P(A|B)P(B) / P(A)

Bayes' theorem (2)
Getting the most likely event takes into account the normalizing constant P(A):
(5) P(B|A) = P(A ∩ B) / P(A) = P(A|B)P(B) / P(A)
If P(A) is the same for every event of interest, and we want to find the value of B which maximizes the function:
(6) arg max_B P(A|B)P(B) / P(A) = arg max_B P(A|B)P(B)
... so, in these cases, we can ignore the denominator.
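
To see why the denominator can be dropped, a sketch comparing the two argmaxes (the likelihoods and priors are invented for illustration):

    # Hypothetical values: P(A|B) for candidate events B, and priors P(B).
    likelihood = {"B1": 0.2, "B2": 0.7, "B3": 0.1}   # P(A|B)
    prior = {"B1": 0.5, "B2": 0.2, "B3": 0.3}        # P(B)
    p_a = sum(likelihood[b] * prior[b] for b in prior)  # P(A)

    with_denom = max(prior, key=lambda b: likelihood[b] * prior[b] / p_a)
    without = max(prior, key=lambda b: likelihood[b] * prior[b])
    print(with_denom == without)  # True: P(A) does not change the argmax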

Partitioning: complement sets
Let E and F be events. E = EF ∪ EF^c, where EF and EF^c are mutually exclusive.
P(E) = P(EF) + P(EF^c)
     = P(E|F)P(F) + P(E|F^c)P(F^c)
     = P(E|F)P(F) + P(E|F^c)(1 − P(F))

Example: An insurance company groups people into two classes: those who are accident-prone and those who are not. Their statistics show that an accident-prone person will have an accident within a fixed 1-year period with probability 0.4, whereas the probability decreases to 0.2 for a non-accident-prone person. Let us assume that 30 percent of the population are accident-prone. What is the probability that a new policyholder will have an accident within a year of purchasing a policy?

Example (2)
Y = policyholder will have an accident within one year
A = policyholder is accident-prone
We look for P(Y):
P(Y) = P(Y|A)P(A) + P(Y|A^c)P(A^c) = 0.4 · 0.3 + 0.2 · 0.7 = 0.26
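
The same partition computation as a code sketch:

    p_acc_given_prone = 0.4   # P(Y|A)
    p_acc_given_not = 0.2     # P(Y|A^c)
    p_prone = 0.3             # P(A)

    # Total probability: P(Y) = P(Y|A)P(A) + P(Y|A^c)P(A^c)
    p_y = p_acc_given_prone * p_prone + p_acc_given_not * (1 - p_prone)
    print(p_y)  # 0.26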

Partitioning
Let's say we have different, disjoint sets B_i, and these sets partition A (i.e., A ⊆ ⋃_i B_i). Then the following is true:
(7) P(A) = Σ_i P(A ∩ B_i) = Σ_i P(A|B_i)P(B_i)
This gives us a more general form of Bayes' theorem:
(8) P(B_j|A) = P(A|B_j)P(B_j) / P(A) = P(A|B_j)P(B_j) / Σ_i P(A|B_i)P(B_i)

Example of Bayes' theorem
Assume the following:
- Bowl B_1 (P(B_1) = 1/3) has 2 red and 4 white chips
- Bowl B_2 (P(B_2) = 1/6) has 1 red and 2 white chips
- Bowl B_3 (P(B_3) = 1/2) has 5 red and 4 white chips
Given that we have pulled a red chip, what is the probability that it came from bowl B_1? In other words, what is P(B_1|R)?
P(B_1) = 1/3
P(R|B_1) = 2 / (2 + 4) = 1/3
P(R) = P(B_1 ∩ R) + P(B_2 ∩ R) + P(B_3 ∩ R)
     = P(R|B_1)P(B_1) + P(R|B_2)P(B_2) + P(R|B_3)P(B_3) = 4/9
So, we have:
P(B_1|R) = P(R|B_1)P(B_1) / P(R) = (1/3)(1/3) / (4/9) = 1/4
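
And the bowls computation as a sketch with exact fractions:

    from fractions import Fraction as F

    prior = {"B1": F(1, 3), "B2": F(1, 6), "B3": F(1, 2)}
    # P(R | bowl) = red chips / total chips in that bowl
    p_red = {"B1": F(2, 6), "B2": F(1, 3), "B3": F(5, 9)}

    p_r = sum(p_red[b] * prior[b] for b in prior)   # total probability
    posterior = p_red["B1"] * prior["B1"] / p_r     # Bayes' theorem
    print(p_r, posterior)  # 4/9 1/4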

Random variables: mapping outcomes to numerical values
If we roll 2 dice:

Outcomes                     Numerical value   Probability
{(1,1)}                      2                 1/36
{(1,2),(2,1)}                3                 1/18
{(1,3),(2,2),(3,1)}          4                 1/12
{(1,4),(2,3),(3,2),(4,1)}    5                 1/9
...
{(4,6),(5,5),(6,4)}          10                1/12
{(5,6),(6,5)}                11                1/18
{(6,6)}                      12                1/36
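
This table can be regenerated by enumerating the 36 equally likely outcomes and grouping them by their sum (a sketch):

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    # Map each of the 36 outcomes to its sum and count multiplicities.
    sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    pmf = {v: Fraction(n, 36) for v, n in sorted(sums.items())}
    for value, prob in pmf.items():
        print(value, prob)  # 2 1/36, 3 1/18, ..., 12 1/36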

Random variables: motivation
A random variable maps the set of outcomes Ω into the set of real numbers. Formally, a random variable is a function X : Ω → ℝⁿ.
Motivation for random variables:
- They abstract away from outcomes by putting outcomes into equivalence classes.
- Mapping to numerical values makes calculations easier.
- They facilitate numerical manipulations, such as the definition of mean and standard deviation.

Example: Suppose that our experiment consists of tossing 3 fair coins. If we let Y denote the number of heads appearing, then Y is a random variable taking on one of the values 0, 1, 2, 3, with respective probabilities:
P{Y = 0} = P({(T, T, T)}) = 1/8
P{Y = 1} = P({(T, T, H), (T, H, T), (H, T, T)}) = 3/8
P{Y = 2} = P({(H, H, T), (H, T, H), (T, H, H)}) = 3/8
P{Y = 3} = P({(H, H, H)}) = 1/8

Example (2)
Since Y must take on one of the values 0 through 3:
1 = P(⋃_{i=0}^{3} {Y = i}) = Σ_{i=0}^{3} P{Y = i}

Expectation of a random variable
If X is a discrete random variable with probability mass function p(x), the expectation or expected value of X, denoted by E[X], is defined by
E[X] = Σ_{x : p(x) > 0} x · p(x)
In other words: E[X] is the weighted average of the possible values of X, each value being weighted by the probability that X takes on that particular value.

Example: Find E[X] where X is the outcome when we roll a fair die.
Solution: Since p(1) = ... = p(6) = 1/6,
E[X] = 1·(1/6) + 2·(1/6) + 3·(1/6) + 4·(1/6) + 5·(1/6) + 6·(1/6) = 7/2
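
The same weighted average in code (a sketch):

    from fractions import Fraction

    pmf = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die
    e_x = sum(x * p for x, p in pmf.items())
    print(e_x)  # 7/2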

Example (2)
Assume we have the following weighted die:
p(X = 1) = p(X = 2) = p(X = 5) = 1/6
p(X = 3) = p(X = 4) = 1/12
p(X = 6) = 1/3
What is the expectation here? (A sketch for checking your answer follows.)
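
The slide leaves the answer as an exercise; the same computation as for the fair die applies, so a sketch for checking your work:

    from fractions import Fraction as F

    pmf = {1: F(1, 6), 2: F(1, 6), 5: F(1, 6),
           3: F(1, 12), 4: F(1, 12), 6: F(1, 3)}
    assert sum(pmf.values()) == 1  # a valid distribution
    print(sum(x * p for x, p in pmf.items()))  # 47/12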

Example (3)
A school class of 120 students is driven in three buses to a symphonic performance. There are 36 students in one of the buses, 40 in another, and 44 in the third bus. When the buses arrive, one of the 120 students is randomly chosen. Let X be a random variable denoting the number of students on the bus of that randomly chosen student.
Task: Find E[X]

Variance
Expectation alone ignores the question: do the values of a random variable tend to be consistent over many trials, or do they tend to vary a lot?
If X is a discrete random variable with mean µ, then the variance of X, denoted by Var(X), is defined by
Var(X) = E[(X − µ)²] = E[X²] − (E[X])²
The standard deviation of X, denoted by σ(X), is defined as σ(X) = √Var(X)

Example: Calculate Var(X) if X represents the outcome when a fair die is rolled.
Solution:
E[X²] = 1²·(1/6) + 2²·(1/6) + 3²·(1/6) + 4²·(1/6) + 5²·(1/6) + 6²·(1/6) = (1/6)(91)
Hence Var(X) = 91/6 − (7/2)² = 35/12
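
In code (a sketch, reusing the fair-die pmf):

    from fractions import Fraction

    pmf = {x: Fraction(1, 6) for x in range(1, 7)}
    e = lambda f: sum(f(x) * p for x, p in pmf.items())

    mean = e(lambda x: x)               # E[X] = 7/2
    var = e(lambda x: x**2) - mean**2   # E[X^2] - (E[X])^2
    print(var)  # 35/12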

Example for expectation and variance
When we roll two dice, what are the expectation and the variance for the sum of the numbers on the two dice? Writing X = Y_1 + Y_2 for the two individual rolls:
(9) E(X) = E(Y_1 + Y_2) = E(Y_1) + E(Y_2) = 3.5 + 3.5 = 7
(10) Var(X) = E((X − E(X))²) = Σ_x p(x)(x − E(X))² = Σ_x p(x)(x − 7)² = 35/6 = 5 5/6
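
A check of both numbers against the dice-sum pmf built earlier (a sketch):

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    sums = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    pmf = {v: Fraction(n, 36) for v, n in sums.items()}

    mean = sum(x * p for x, p in pmf.items())              # E(X)
    var = sum(p * (x - mean)**2 for x, p in pmf.items())   # Var(X)
    print(mean, var)  # 7 35/6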

Joint distributions
With (discrete) random variables, we can define:
Joint pmf: the probability of both x and y happening:
(11) p(x, y) = P(X = x, Y = y)
Marginal pmfs: the probability of x happening is the sum of the occurrences of x with all the different y's:
(12) p_X(x) = Σ_y p(x, y)
(13) p_Y(y) = Σ_x p(x, y)
If X and Y are independent, then p(x, y) = p_X(x) p_Y(y), so, e.g., the probability of rolling two sixes is:
p(X = 6, Y = 6) = p(X = 6) p(Y = 6) = (1/6)(1/6) = 1/36
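
A closing sketch for two independent fair dice: build the joint pmf, then recover a marginal by summing, and confirm the two-sixes case:

    from fractions import Fraction
    from itertools import product

    # Joint pmf of two independent fair dice: p(x, y) = 1/36 everywhere.
    joint = {(x, y): Fraction(1, 36)
             for x, y in product(range(1, 7), repeat=2)}

    # Marginal: p_X(x) = sum over y of p(x, y)
    p_x = {x: sum(joint[(x, y)] for y in range(1, 7)) for x in range(1, 7)}
    print(p_x[6])                          # 1/6
    print(joint[(6, 6)], p_x[6] * p_x[6])  # 1/36 == 1/6 * 1/6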