UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 11: Probability Review. Dr. Yanjun Qi. University of Virginia. Department of Computer Science


UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 11: Probability Review. 10/17/16. Dr. Yanjun Qi. University of Virginia, Department of Computer Science.

Announcements: Schedule. Midterm Nov. 26 Wed / 3:30pm - 4:45pm / open notes. HW4 includes sample midterm questions. Grading of HW1 is available on Collab. Solution of HW1 is available on Collab. Grading of HW2 will be available next week. Solution of HW2 will be available next week.

Where are we? → Five major sections of this course: Regression (supervised); Classification (supervised); Unsupervised models; Learning theory; Graphical models.

Where are we? → Three major sections for classification. We can divide the large variety of classification approaches into roughly three major types: 1. Discriminative: directly estimate a decision rule/boundary, e.g., support vector machine, decision tree. 2. Generative: build a generative statistical model, e.g., naïve Bayes classifier, Bayesian networks. 3. Instance-based classifiers: use observations directly (no models), e.g., k-nearest neighbors.

Today: Probability Review. The big picture; Events and event spaces; Random variables; Joint probability, marginalization, conditioning, chain rule, Bayes' rule, law of total probability, etc.; Structural properties: independence, conditional independence.

The Big Picture. Probability: model (i.e., data generating process) → observed data. Estimation / learning / inference / data mining: observed data → model. But how to specify a model?

Probability as frequency. Consider the following questions: 1. What is the probability that when I flip a coin it is heads? 2. Why? We can count → ~1/2. 3. What is the probability of the Blue Ridge Mountains having an erupting volcano in the near future? → cannot count. Message: the frequentist view is very useful, but it seems that we also use domain knowledge to come up with probabilities. Adapted from Prof. Nando de Freitas's review slides.

Probability as a measure of uncertainty. Imagine we are throwing darts at a wall of size 1x1, and that all darts are guaranteed to fall within this 1x1 wall. What is the probability that a dart will hit the shaded area?

Probability as a measure of uncertainty. Probability is a measure of the certainty of an event taking place, i.e., in the example, we were measuring the chances of hitting the shaded area: prob = (# red boxes) / (# boxes).
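The dart picture above lends itself to a quick simulation: throw uniform random darts at the 1x1 wall and count the fraction that land in a shaded region. This is a minimal sketch; the specific shaded region (the square [0, 0.5] x [0, 0.5], true probability 0.25) and the function names are illustrative assumptions, not from the slides.

```python
import random

# Monte Carlo estimate of the probability that a uniformly thrown dart
# on the 1x1 wall lands in a given region. The region below (the square
# [0, 0.5] x [0, 0.5], area 0.25) is a made-up example.
def estimate_hit_probability(n_darts, in_region, seed=0):
    rng = random.Random(seed)
    hits = sum(in_region(rng.random(), rng.random()) for _ in range(n_darts))
    return hits / n_darts

shaded = lambda x, y: x < 0.5 and y < 0.5
p_hat = estimate_hit_probability(100_000, shaded)  # close to the true area 0.25
```

With many darts the empirical frequency converges to the region's area, which is exactly the "probability as frequency" view from the previous slide.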

Today: Probability Review. The big picture; Sample space, events and event spaces; Random variables; Joint probability, marginalization, conditioning, chain rule, Bayes' rule, law of total probability, etc.; Structural properties: independence, conditional independence.

Probability. Ω_die = {1, 2, 3, 4, 5, 6}. ω: an elementary event, e.g., "throw a 2". The elements of Ω are called elementary events.

Probability. Probability allows us to measure many events. The events are subsets of the sample space Ω. For example, for a die we may consider the following events: GREATER = {5, 6}, EVEN = {2, 4, 6}. Assign probabilities to these events: e.g., P(EVEN) = 1/2.

Sample space and events. Ω: sample space, the result of an experiment / the set of all outcomes. If you toss a coin twice, Ω = {HH, HT, TH, TT}. Event: a subset of Ω, e.g., "first toss is head" = {HH, HT}. S: event space, a set of events; it contains the empty event and Ω.

Axioms for probability. Defined over (Ω, S) such that: 0 ≤ P(a) ≤ 1 for all a in S; P(Ω) = 1; if A, B are disjoint, then P(A ∪ B) = P(A) + P(B).

Axioms for probability. If events B_1, ..., B_n partition Ω, then P(Ω) = Σ_i P(B_i) = 1. [Figure: Ω partitioned into B_1, ..., B_7.]

OR operation for probability. We can deduce other rules from the above axioms. Ex: P(A ∪ B), the probability of the union of A and B, for non-disjoint events: P(A ∪ B) = P(A) + P(B) - P(A ∩ B).
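The inclusion-exclusion rule can be checked by direct counting on the die events from the earlier slide (GREATER = {5, 6}, EVEN = {2, 4, 6}); the helper function name is my own.

```python
from fractions import Fraction

# For a fair die, P(E) = |E| / |Omega|. Compare inclusion-exclusion
# against counting the union directly.
def prob(event, omega):
    return Fraction(len(event & omega), len(omega))

omega = {1, 2, 3, 4, 5, 6}
greater, even = {5, 6}, {2, 4, 6}

# P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
p_union = prob(greater, omega) + prob(even, omega) - prob(greater & even, omega)
# GREATER ∪ EVEN = {2, 4, 5, 6}, so the answer is 4/6 = 2/3.
```

Using `Fraction` keeps the arithmetic exact, so the identity holds with equality rather than up to floating-point error.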

[Figure: Venn diagram of P(A ∩ B), the intersection of A and B.]

Conditional probability. [Figure: Venn diagram of events A and B, illustrating P(A | B).]

Conditional probability / chain rule. More ways to write out the chain rule: P(A, B) = P(B | A) P(A); P(A, B) = P(A | B) P(B).

Rule of total probability → marginalization. If events B_1, ..., B_n partition Ω, then P(A) = Σ_i P(A | B_i) P(B_i). Why? [Figure: A overlapping a partition B_1, ..., B_7 of Ω.]
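The "why" can be answered by counting on the die example: pick any partition of Ω and any event A, and the weighted sum of conditionals recovers P(A). The partition B_1 = {1, 2}, B_2 = {3, 4}, B_3 = {5, 6} and A = EVEN are illustrative choices, not from the slides.

```python
from fractions import Fraction

# Law of total probability on a fair die.
omega = {1, 2, 3, 4, 5, 6}
partition = [{1, 2}, {3, 4}, {5, 6}]  # disjoint, covers omega
A = {2, 4, 6}                          # the EVEN event

def p(event):
    return Fraction(len(event), len(omega))

# P(A) = sum_i P(A | B_i) P(B_i), with P(A | B_i) = P(A ∩ B_i) / P(B_i).
total = sum((p(A & B) / p(B)) * p(B) for B in partition)
```

Each term (P(A ∩ B_i) / P(B_i)) * P(B_i) collapses to P(A ∩ B_i), and because the B_i are disjoint and cover Ω these pieces add up to exactly P(A).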

Today: Probability Review. The big picture; Events and event spaces; Random variables; Joint probability, marginalization, conditioning, chain rule, Bayes' rule, law of total probability, etc.; Structural properties: independence, conditional independence.

From events to random variables. A concise way of specifying attributes of outcomes. Modeling students (Grade and Intelligence): Ω = all possible students (sample space). What are the events (subsets of the sample space)? Grade_A = all students with grade A; Grade_B = all students with grade B; HardWorking_Yes = students who work hard. Very cumbersome. We need functions that map from Ω to an attribute space T. P(H = YES) = P({student ∈ Ω : H(student) = YES}).

Random variables (RVs). H: hardworking ∈ {Yes, No}; G: grade ∈ {A+, A, B}. P(H = Yes) = P({all students who are working hard on the course}). RVs are functions that map from Ω to an attribute space T.

Notation digression. P(A) is shorthand for P(A = true); P(~A) is shorthand for P(A = false). The same notation applies to other binary RVs: P(Gender = M), P(Gender = F). The same notation applies to multivalued RVs: P(Major = history), P(Age = 19), P(Q = c). Note: uppercase letters/names for variables, lowercase letters/names for values.

Discrete random variables. Random variables (RVs) which may take on only a countable number of distinct values. X is an RV with arity k if it can take on exactly one value out of {x_1, ..., x_k}.

Probability of a discrete RV. Probability mass function (pmf): P(X = x_i). Easy facts about the pmf: Σ_i P(X = x_i) = 1; P(X = x_i ∩ X = x_j) = 0 if i ≠ j; P(X = x_i ∪ X = x_j) = P(X = x_i) + P(X = x_j) if i ≠ j; P(X = x_1 ∪ X = x_2 ∪ ... ∪ X = x_k) = 1.

e.g., coin flips. You flip a coin: head with probability 0.5. You flip 100 coins: how many heads would you expect?

e.g., coin flips cont. You flip a coin: head with probability p; a binary random variable; a Bernoulli trial with success probability p. You flip k coins: how many heads would you expect? The number of heads X is a discrete random variable, following a binomial distribution with parameters k and p.
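The coin-flip question above is easy to answer empirically: repeat the k-flip experiment many times and average the head counts; the sample mean should sit near k*p. This is a small sketch with illustrative function names and trial counts.

```python
import random

# Simulate the slide's experiment: flip k coins (head with prob. p),
# count heads, and repeat n_trials times.
def simulate_heads(k, p, n_trials, seed=0):
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(k)) for _ in range(n_trials)]

heads = simulate_heads(100, 0.5, 2_000)
mean_heads = sum(heads) / len(heads)  # close to k * p = 100 * 0.5 = 50
```

Each run is a draw from Bin(100, 0.5), so the empirical mean approximates the binomial mean k*p, which is the "expected number of heads" the slide asks about.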

Discrete random variables. RVs which may take on only a countable number of distinct values, e.g., the total number of heads X you get if you flip 100 coins. X is an RV with arity k if it can take on exactly one value out of {x_1, ..., x_k}; e.g., the possible values that X can take on are 0, 1, 2, ..., 100.

e.g., two common distributions. Uniform: X ~ U{1, ..., N}; X takes values 1, 2, ..., N; P(X = i) = 1/N; e.g., picking balls of different colors from a box. Binomial: X ~ Bin(k, p); X takes values 0, 1, ..., k; P(X = i) = C(k, i) p^i (1 - p)^(k - i); e.g., flipping a coin k times.
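The two pmfs above translate directly into code; a quick sanity check is that each sums to 1 over its support. The function names are my own, and `math.comb` computes the C(k, i) binomial coefficient.

```python
from math import comb

# Uniform over {1, ..., N}: P(X = i) = 1/N.
def uniform_pmf(N, i):
    return 1 / N if 1 <= i <= N else 0.0

# Binomial(k, p) over {0, ..., k}: P(X = i) = C(k, i) p^i (1 - p)^(k - i).
def binom_pmf(k, p, i):
    return comb(k, i) * p**i * (1 - p)**(k - i)
```

For example, binom_pmf(2, 0.5, 1) = 0.5: with two fair flips, exactly one head occurs in 2 of the 4 equally likely outcomes.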

Today: Probability Review. The big picture; Events and event spaces; Random variables; Joint probability, marginalization, conditioning, chain rule, Bayes' rule, law of total probability, etc.; Structural properties: independence, conditional independence.

e.g., coin flips by two persons. Your friend and you both flip coins: head with probability 0.5. You flip 50 times; your friend flips 100 times. How many heads will both of you get?

Joint distribution. Given two discrete RVs X and Y, their joint distribution is the distribution of X and Y together, e.g., P(you get 21 heads AND your friend gets 70 heads). The joint sums to one: Σ_{i=0}^{50} Σ_{j=0}^{100} P(X = x_i, Y = y_j) = 1, i.e., Σ_i Σ_j P(you get i heads AND your friend gets j heads) = 1.

Conditional probability. P(X = x | Y = y) is the probability of X = x given the occurrence of Y = y, e.g., that you get 0 heads given that your friend gets 61 heads. P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y).
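The definition above is just a ratio of two numbers read off the joint table. Here is a minimal sketch on a made-up joint distribution over two binary variables (the rain/wet values and probabilities are illustrative assumptions, chosen to foreshadow the Bayes' rule example later in the deck).

```python
# A made-up joint table P(X, Y), stored as {(x, y): prob}; entries sum to 1.
joint = {
    ("rain", "wet"): 0.4, ("rain", "dry"): 0.1,
    ("sun",  "wet"): 0.1, ("sun",  "dry"): 0.4,
}

def marginal_y(y):
    # P(Y = y) = sum_x P(X = x, Y = y)
    return sum(p for (x_, y_), p in joint.items() if y_ == y)

def cond_x_given_y(x, y):
    # P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
    return joint[(x, y)] / marginal_y(y)
```

So P(X = rain | Y = wet) = 0.4 / 0.5 = 0.8 with these numbers: conditioning renormalizes the "wet" column of the joint so it sums to 1.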

Law of total probability. Given two discrete RVs X and Y, which take values in {x_1, ..., x_m} and {y_1, ..., y_n}, we have: P(X = x_i) = Σ_j P(X = x_i, Y = y_j) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j).
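The random-variable form of the law can be exercised on a small made-up pair of tables: a marginal P(Y) and a conditional P(X | Y). All names and numbers below are illustrative assumptions.

```python
# Made-up marginal P(Y) and conditional P(X | Y); each column of the
# conditional (fixed y) sums to 1.
p_y = {"y1": 0.3, "y2": 0.7}
p_x_given_y = {("x1", "y1"): 0.5, ("x2", "y1"): 0.5,
               ("x1", "y2"): 0.2, ("x2", "y2"): 0.8}

def p_x(x):
    # P(X = x) = sum_j P(X = x | Y = y_j) P(Y = y_j)
    return sum(p_x_given_y[(x, y)] * p_y[y] for y in p_y)
```

Here p_x("x1") = 0.5 * 0.3 + 0.2 * 0.7 = 0.29, and the two marginal values sum to 1, as any valid pmf must.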

Marginalization. Marginal probability via the law of total probability: P(X = x_i) = Σ_j P(X = x_i, Y = y_j) = Σ_j P(X = x_i | Y = y_j) P(Y = y_j), i.e., a marginal probability is a sum of conditional probabilities weighted by marginal probabilities. [Figure: A overlapping a partition B_1, ..., B_7 of Ω.]

Simplify notation: conditional probability. For events: P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y). But we will always write it this way: P(x | y) = p(x, y) / p(y). Shorthand: P(X = x is true) → P(X = x) → P(x).

Simplify notation: marginalization. We know p(X, Y); what is P(X = x)? We can use the law of total probability. Why? p(x) = Σ_y P(x, y) = Σ_y P(y) P(x | y). [Figure: A overlapping a partition B_1, ..., B_7 of Ω.]

Marginalization cont. Another example: p(x) = Σ_y Σ_z P(x, y, z) = Σ_y Σ_z P(x | y, z) P(y, z).
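Marginalizing over two variables is the same sum, just nested. A minimal sketch on a made-up joint over three binary variables (the uniform table is an illustrative assumption chosen so the marginals are easy to verify by hand):

```python
from itertools import product

# p(x) = sum_y sum_z P(x, y, z), on a uniform made-up joint over three
# binary variables, stored as {(x, y, z): prob}.
vals = (0, 1)
joint = {(x, y, z): 1 / 8 for x, y, z in product(vals, repeat=3)}

def marginal_x(x):
    return sum(joint[(x, y, z)] for y in vals for z in vals)
```

With the uniform joint, each x value collects four entries of 1/8, so marginal_x(0) = marginal_x(1) = 0.5.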

Bayes' rule. We know that P(rain) = 0.5. If we also know that the grass is wet, how does this affect our belief about whether it rained or not? P(rain | wet) = P(rain) P(wet | rain) / P(wet). In general: P(x | y) = P(x) P(y | x) / P(y).

Bayes' rule. X and Y are discrete RVs: P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y), and P(X = x_i | Y = y_j) = P(Y = y_j | X = x_i) P(X = x_i) / Σ_k P(Y = y_j | X = x_k) P(X = x_k).

P(A = a_1 | B) = P(B | A = a_1) P(A = a_1) / Σ_i P(B | A = a_i) P(A = a_i)
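This normalized form of Bayes' rule is a few lines of code. The slides only state P(rain) = 0.5, so the likelihoods below (P(wet | rain) = 0.9, P(wet | no rain) = 0.2) are made-up numbers for illustration.

```python
# Bayes' rule with the normalizing sum in the denominator, on the
# rain/wet example. Prior from the slides; likelihoods are assumed.
prior = {"rain": 0.5, "no_rain": 0.5}
likelihood_wet = {"rain": 0.9, "no_rain": 0.2}

def posterior(a):
    # P(A = a | wet) = P(wet | a) P(a) / sum_i P(wet | a_i) P(a_i)
    evidence = sum(likelihood_wet[ai] * prior[ai] for ai in prior)
    return likelihood_wet[a] * prior[a] / evidence
```

With these numbers, seeing wet grass raises the belief in rain from 0.5 to 0.45 / 0.55 ≈ 0.82, and the posterior over the two hypotheses still sums to 1 thanks to the normalizing denominator.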

Bayes' rule cont. You can condition on more variables: P(x | y, z) = P(y | x, z) P(x | z) / P(y | z).

Conditional probability example. [Worked example shown on slide figures.]

→ Matrix notation.

Today: Probability Review. The big picture; Events and event spaces; Random variables; Joint probability, marginalization, conditioning, chain rule, Bayes' rule, law of total probability, etc.; Structural properties: independence, conditional independence.

Independent RVs. Intuition: X and Y are independent means that X = x neither makes it more nor less probable that Y = y. Definition: X and Y are independent iff P(X = x, Y = y) = P(X = x) P(Y = y).

More on independence. P(X = x, Y = y) = P(X = x) P(Y = y) implies P(Y = y | X = x) = P(Y = y). E.g., no matter how many heads you get, your friend will not be affected, and vice versa.

More on independence. X is independent of Y means that knowing Y does not change our belief about X: P(X | Y = y) = P(X); P(X = x, Y = y) = P(X = x) P(Y = y). The above should hold for all x_i, y_j. It is symmetric and written as X ⊥ Y.
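The "for all x_i, y_j" in the definition suggests a direct check: loop over every cell of the joint table and test the factorization. A minimal sketch with made-up tables (the first models two fair-coin flips that do not influence each other; the second forces both coins to match, so it is clearly dependent):

```python
from itertools import product

# X and Y are independent iff P(x, y) = P(x) P(y) for every pair (x, y).
def is_independent(joint, xs, ys, tol=1e-9):
    px = {x: sum(joint[(x, y)] for y in ys) for x in xs}
    py = {y: sum(joint[(x, y)] for x in xs) for y in ys}
    return all(abs(joint[(x, y)] - px[x] * py[y]) <= tol
               for x, y in product(xs, ys))

# Two fair coins flipped separately: every cell is 0.25 = 0.5 * 0.5.
indep = {(x, y): 0.25 for x, y in product("HT", repeat=2)}
# Two coins forced to agree: knowing one determines the other.
dep = {("H", "H"): 0.5, ("H", "T"): 0.0, ("T", "H"): 0.0, ("T", "T"): 0.5}
```

The first table factorizes cell by cell; the second does not (its marginals are still fair coins, but 0.5 ≠ 0.5 * 0.5 in the (H, H) cell), matching the coin-flip intuition from the slide.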

Conditionally independent RVs. Intuition: X and Y are conditionally independent given Z means that once Z is known, the value of X does not add any additional information about Y. Definition: X and Y are conditionally independent given Z iff P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z). If this holds for all x_i, y_j, z_k, we write X ⊥ Y | Z.

Conditionally independent RVs. Note that X ⊥ Y | Z does not imply X ⊥ Y, and X ⊥ Y does not imply X ⊥ Y | Z.

More on conditional independence. P(X = x, Y = y | Z = z) = P(X = x | Z = z) P(Y = y | Z = z) implies P(X = x | Y = y, Z = z) = P(X = x | Z = z) and P(Y = y | X = x, Z = z) = P(Y = y | Z = z).
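The conditional-independence definition can also be verified cell by cell on a joint table P(X, Y, Z). The table below is a made-up construction for illustration: Z is a fair coin, and given Z = z each of X and Y independently equals z with probability 0.9 (a standard common-cause setup, so X ⊥ Y | Z holds by design).

```python
from itertools import product

# Check P(x, y | z) = P(x | z) P(y | z) for all x, y, z.
def cond_indep(joint, xs, ys, zs, tol=1e-9):
    for z in zs:
        pz = sum(joint[(x, y, z)] for x in xs for y in ys)
        for x, y in product(xs, ys):
            pxy_z = joint[(x, y, z)] / pz
            px_z = sum(joint[(x, y2, z)] for y2 in ys) / pz
            py_z = sum(joint[(x2, y, z)] for x2 in xs) / pz
            if abs(pxy_z - px_z * py_z) > tol:
                return False
    return True

def p_equal(v, z):
    # P(value = v | Z = z): copies z with probability 0.9.
    return 0.9 if v == z else 0.1

joint = {(x, y, z): 0.5 * p_equal(x, z) * p_equal(y, z)
         for x, y, z in product((0, 1), repeat=3)}
```

In this table X and Y are strongly correlated marginally (both tend to copy Z), yet the check passes: conditioning on the common cause Z removes the dependence, which is exactly the intuition on the previous slide.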

Today recap: Probability Review. The big picture: data <-> probabilistic model. Sample space, events and event spaces. Random variables. Joint probability, marginal probability, conditional probability, chain rule, Bayes' rule, law of total probability, etc. Independence, conditional independence.

References: Prof. Andrew Moore's review tutorial; Prof. Nando de Freitas's review slides; Prof. Carlos Guestrin's recitation slides.