Probability & statistics for linguists Class 2: more probability. D. Lassiter (h/t: R. Levy)


conditional probability: P(A | B) = P(A ∩ B) / P(B). When in doubt about meaning: draw pictures. Keep the B-consistent values proportionally the same; normalize after removing not-B as a possibility. FYI: Bayesians often think of conditional probability judgments as conceptually basic, and use them to define conjunctive probability: P(A ∩ B) = P(A | B) P(B).
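
A minimal R sketch of the "normalize after removing not-B" idea, using a made-up four-outcome sample space (the outcome labels and probabilities are mine, purely for illustration):

```r
# Hypothetical probabilities for the four combinations of A and B
p <- c(a_and_b = 0.2, a_and_notb = 0.3, nota_and_b = 0.1, nota_and_notb = 0.4)

# Conditioning on B: drop the not-B outcomes, then renormalize
in_b <- grepl("_b$", names(p))        # outcomes consistent with B
p_given_b <- p[in_b] / sum(p[in_b])   # keep B-consistent values, proportionally the same

p_given_b["a_and_b"]                  # P(A | B) = 0.2 / (0.2 + 0.1), about 0.67
```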

probabilistic dynamics: A core Bayesian assumption: For any propositions A and B, your degree of belief P(B), after observing that A is true, should be equal to your conditional degree of belief P(B | A) before you made this observation. Dynamics of belief are determined by the initial model, data received, and conditioning. [Don't adjust anything without a reason!]

CP example: Hypothetical probabilities from historical English:

                     Pronoun   Not Pronoun
Object Preverbal      0.224       0.655
Object Postverbal     0.014       0.107

How do we calculate P(Pronoun | Postverbal)?

P(Pronoun | Postverbal) = P(Postverbal ∩ Pronoun) / P(Postverbal) = 0.014 / (0.014 + 0.107) ≈ 0.116
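
The same calculation in R, as a quick sketch using the table values above:

```r
# Joint probabilities from the hypothetical historical-English table
joint <- matrix(c(0.224, 0.655,
                  0.014, 0.107),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("Preverbal", "Postverbal"),
                                c("Pronoun", "NotPronoun")))

# P(Pronoun | Postverbal) = P(Postverbal & Pronoun) / P(Postverbal)
joint["Postverbal", "Pronoun"] / sum(joint["Postverbal", ])   # about 0.116
```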

Bayes' rule: P(A | B) = P(B | A) P(A) / P(B). Exercise: prove from the CP definition. Why is this mathematical triviality so exciting?

More on Bayes' rule, with conventional names for terms: P(A | B) = P(B | A) P(A) / P(B), where P(B | A) is the likelihood, P(A) is the prior, and P(B) is the normalizing constant. With extra background variables: P(A | B, I) = P(B | A, I) P(A | I) / P(B | I).
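
A small R sketch of Bayes' rule over a discrete set of hypotheses; the helper name `posterior` is my own, and the normalizing constant is just the sum of likelihood times prior:

```r
# Bayes' rule for a discrete hypothesis space:
# posterior is proportional to likelihood * prior; dividing by the sum normalizes
posterior <- function(prior, likelihood) {
  unnorm <- likelihood * prior
  unnorm / sum(unnorm)
}

# Toy usage with two hypotheses and made-up numbers
posterior(prior = c(H1 = 0.5, H2 = 0.5), likelihood = c(H1 = 0.9, H2 = 0.3))  # 0.75, 0.25
```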

Bayes' rule example: consider hypothetical Old English: P(Object Animate) = 0.4, P(Object Postverbal | Object Animate) = 0.7, P(Object Postverbal | Object Inanimate) = 0.8. Imagine you're an incremental sentence processor. You encounter a transitive verb but haven't encountered the object yet. How likely is it that the object is animate? I.e., compute P(Anim | PostV).
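
A sketch of the computation the slide asks for, done in R with the numbers given above (assuming Animate and Inanimate exhaust the possibilities):

```r
# Given quantities from the hypothetical Old English example
p_anim         <- 0.4    # P(Object Animate)
p_postv_anim   <- 0.7    # P(Object Postverbal | Object Animate)
p_postv_inanim <- 0.8    # P(Object Postverbal | Object Inanimate)

# P(Postverbal) by the law of total probability
p_postv <- p_postv_anim * p_anim + p_postv_inanim * (1 - p_anim)

# Bayes' rule: P(Animate | Postverbal)
p_postv_anim * p_anim / p_postv   # 0.28 / 0.76, about 0.368
```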

random variables: a discrete RV is a function from Ω to a finite or countably infinite set of reals. (Next week: continuous RVs.) P(X = x) is a probability mass function. Ex: a Bernoulli trial with outcomes success and failure (yes/no, 0/1, ...):

P(X = x) = π        if x = 1
           1 − π    if x = 0
           0        otherwise
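
A quick R sketch of a Bernoulli trial: a Bernoulli(π) variable is just a binomial with size = 1, and the parameter value here is made up:

```r
p <- 0.3                        # hypothetical value of the parameter π
dbinom(1, size = 1, prob = p)   # P(X = 1) = π
dbinom(0, size = 1, prob = p)   # P(X = 0) = 1 - π
rbinom(10, size = 1, prob = p)  # ten simulated Bernoulli trials
```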

multinomial trials: Generalizing the Bernoulli trial to multiple outcomes c_1, ..., c_r, we get a multinomial trial with r − 1 parameters π_1, ..., π_{r−1}.

P(X = x) = π_1                       if x = c_1
           π_2                       if x = c_2
           ...
           π_{r−1}                   if x = c_{r−1}
           1 − Σ_{i=1}^{r−1} π_i     if x = c_r
           0                         otherwise

[Why only r − 1 parameters, instead of r?]

example of multinomial trials: You decide to pull Alice in Wonderland off your bookshelf, open to a random page, put your finger down randomly on that page, and record the letter that your finger is resting on. In Alice in Wonderland, 12.6% of the letters are e, 9.9% are t, 8.2% are a, and so forth. We could write the parameters of our model as π_e = 0.126, π_t = 0.099, π_a = 0.082, and so forth.
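
A sketch in R of simulating this multinomial trial, using only the three letter probabilities quoted above plus a single catch-all "other" category (the remaining per-letter parameters are elided in the slide, so they are lumped together here):

```r
# Partial parameter vector from the Alice in Wonderland example
outcomes <- c("e", "t", "a", "other")
probs    <- c(0.126, 0.099, 0.082, 1 - (0.126 + 0.099 + 0.082))

# One multinomial trial = one random letter; here we draw 20 of them
sample(outcomes, size = 20, replace = TRUE, prob = probs)

# Note the r - 1 parameters: the last probability is fixed by the others
```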

another perspective: a (random) variable is a partition on W, i.e. what semanticists think of as a question denotation.

rain? = ⟦is it raining?⟧ = {{w : rain(w)}, {w : ¬rain(w)}}

Dan-hunger = ⟦How hungry is Dan?⟧ = {{w : ¬hungry(w)(d)}, {w : sorta-hungry(w)(d)}, {w : very-hungry(w)(d)}}

joint probability: We'll often use capital letters for RVs, lowercase for specific answers. P(X = x): prob. that the answer to X is x. Joint probability: a distribution over all possible combinations of a set of variables. P(X = x ∧ Y = y), usually written P(X = x, Y = y).

2-Variable model: [table with rows not hungry / sorta hungry / very hungry and columns rain / no rain] A joint distribution determines a number for each cell.

marginal probability: P(X = x) = Σ_y P(X = x ∧ Y = y). P(it's raining) is the sum of: P(it's raining and Dan's not hungry), P(it's raining and Dan's kinda hungry), P(it's raining and Dan's very hungry). Obvious given that RVs are just partitions.
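
A sketch in R for the rain/hunger example, with a made-up joint table (the cell values are invented; they just need to sum to 1):

```r
# Hypothetical joint distribution P(hunger, rain): rows are hunger, columns are rain
joint <- matrix(c(0.10, 0.25,
                  0.30, 0.15,
                  0.05, 0.15),
                nrow = 3, byrow = TRUE,
                dimnames = list(hunger = c("not", "sorta", "very"),
                                rain   = c("rain", "no rain")))
sum(joint)       # 1: a proper joint distribution

# Marginals: sum out the other variable
colSums(joint)   # P(rain) = 0.45, P(no rain) = 0.55
rowSums(joint)   # P(not) = 0.35, P(sorta) = 0.45, P(very) = 0.20
```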

independence: X ⊥ Y iff ∀x ∀y: P(X = x) = P(X = x | Y = y). X and Y are independent RVs iff changing P(X) does not affect P(Y); equivalently, if changing P(Y) does not affect P(X). Pearl '00: independence is a default, cognitively more basic than probability estimation. Ex.: traffic in LA vs. the price of beans in China. Greatly simplifies probabilistic inference.
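
One way to see independence in R: an independent joint distribution is the outer product of its marginals, so every conditional distribution equals the corresponding marginal (a sketch with made-up marginals):

```r
# Marginals for the two variables (hypothetical values)
p_rain   <- c(rain = 0.7, no_rain = 0.3)
p_hunger <- c(not = 0.2, sorta = 0.6, very = 0.2)

# Under independence, P(hunger = h, rain = r) = P(h) * P(r)
joint_indep <- outer(p_hunger, p_rain)

# Check: each column, renormalized, equals p_hunger, i.e. P(hunger | rain) = P(hunger)
sweep(joint_indep, 2, colSums(joint_indep), "/")
```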

2-Variable model: [figure: the same table, with cell areas drawn to scale] Here, let probability be proportional to area. Rain and Dan-hunger are independent. Probably it's raining; probably Dan is sorta hungry.

2-RV structured model: [figure: the same table, with unequal cell areas] Rain and Dan-hunger are not independent: rain reduces appetite. If rain, Dan's probably not hungry; if no rain, Dan's probably sorta hungry.

conditional independence: If A is conditionally independent of B, given C, we write A ⊥ B | C. This holds iff, for all values of A and B, P(A | B, C) = P(A | C). Important: A ⊥ B does not imply A ⊥ B | C. This is exemplified by explaining away.

Bayes nets: [diagram: rain → wet grass ← sprinkler] Wet grass is dependent on rain and sprinkler; rain and sprinkler are independent, but conditionally dependent given wet grass! Upon observing wet grass = 1, update P(V) := P(V | wet grass = 1): high probability that at least one enabler is true. (Pearl, 1988)
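
A small R sketch of explaining away in this net, using hypothetical numbers of my own (none of these values come from the slides):

```r
p_rain      <- 0.3
p_sprinkler <- 0.5
p_wet_given <- function(rain, sprinkler) ifelse(rain == 1 | sprinkler == 1, 0.95, 0.05)

# Enumerate the joint distribution P(rain, sprinkler, wet) from the net's factors
grid <- expand.grid(rain = 0:1, sprinkler = 0:1, wet = 0:1)
grid$p <- with(grid,
  ifelse(rain == 1, p_rain, 1 - p_rain) *
  ifelse(sprinkler == 1, p_sprinkler, 1 - p_sprinkler) *
  ifelse(wet == 1, p_wet_given(rain, sprinkler), 1 - p_wet_given(rain, sprinkler)))

# Observing wet = 1 raises P(rain); additionally learning the sprinkler was on
# "explains away" the wet grass and pulls P(rain) back down toward its prior
wet1 <- grid[grid$wet == 1, ]
sum(wet1$p[wet1$rain == 1]) / sum(wet1$p)               # P(rain | wet), about 0.45
sum(wet1$p[wet1$rain == 1 & wet1$sprinkler == 1]) /
  sum(wet1$p[wet1$sprinkler == 1])                      # P(rain | wet, sprinkler) = 0.30
```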

cumulative probability: Flip a π-weighted coin till you get heads, then stop. Let Z denote the question/RV "how many flips before stopping?". Possible values are the z's: 0, 1, 2, 3, ... P(0) = ? P(1) = ? P(2) = ? P(3) = ? ...

cumulative probability: Flip a π-weighted coin till you get heads, then stop. Let Z denote the question/RV "how many flips before stopping?". Possible values are the z's: 0, 1, 2, 3, ... P(0) = 0, P(1) = π, P(2) = (1 − π) π, P(3) = (1 − π)^2 π, ...

cumulative probability: Flip a π-weighted coin till you get heads, then stop. Let Z denote the question/RV "how many flips before stopping?". Possible values are the z's: 0, 1, 2, 3, ... In general, P(z) = (1 − π)^(z−1) π. Why?

cumulative probability: Flip a π-weighted coin till you get heads, then stop. Let Z denote the question/RV "how many flips before stopping?". Possible values are the z's: 0, 1, 2, 3, ... P(z) = (1 − π)^(z−1) [prob. of z − 1 tails] × π [prob. of a head]. Complicated-looking models are usually built up from simple logical reasoning like this.
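
In R this is the geometric distribution; a sketch (R's built-in dgeom counts the number of failures before the first success, so P(Z = z) here corresponds to dgeom(z - 1, p)):

```r
p <- 0.1                   # the coin's heads probability (π in the slides)
z <- 1:10
(1 - p)^(z - 1) * p        # P(z) computed directly from the formula
dgeom(z - 1, prob = p)     # the same values from R's built-in geometric pmf
```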

cumulative probability: Z is "how many flips before stopping?". F(z) is the probability that Z ≤ z. Values of an RV are mutually exclusive, so just add: F(0) = 0, F(1) = P(1), F(2) = P(1) + P(2), F(3) = P(1) + P(2) + P(3), ...

cumulative probability: Z is "how many flips before stopping?". F(z) is the probability that Z ≤ z. F(0) = 0, F(1) = P(1), F(2) = F(1) + P(2), F(3) = F(2) + P(3), F(4) = F(3) + P(4), ...

cumulative probability: Z is "how many flips before stopping?". F(z) is the probability that Z ≤ z. In general, F(z) = Σ_{i ≤ z} (1 − π)^(i−1) π.
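
The cumulative version in R, as a sketch (pgeom(z - 1, p) gives the same values as summing the pmf):

```r
p <- 0.1
z <- 1:10
cumsum((1 - p)^(z - 1) * p)   # F(z): sum of P(i) for i <= z
pgeom(z - 1, prob = p)        # the same values from R's built-in geometric CDF
```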

cumulative probability: suppose π = .1. Then P(z) and F(z) are: [two plots for z from 0 to 80: P(z), labeled "Probability", and F(z), labeled "Cumulative probability"] Now let's start learning how to do these calculations and plot them in R.
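
A sketch of how plots like these could be produced in base R:

```r
p <- 0.1
z <- 1:80
Pz <- (1 - p)^(z - 1) * p
Fz <- cumsum(Pz)

par(mfrow = c(1, 2))   # two panels side by side
plot(z, Pz, type = "h", xlab = "z", ylab = "P(z)", main = "Probability")
plot(z, Fz, type = "s", xlab = "z", ylab = "F(z)", main = "Cumulative probability")
```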