Probability Theory Review
1 Probability Theory Review. Brendan O'Connor. Recitation, Sept 11 & 12.
2 Mathematical tools for machine learning: probability theory, linear algebra, calculus. Wikipedia is a great reference.
3 Probability theory in AI. Fundamental problem: reasoning about uncertainty. "Uncertainty in Artificial Intelligence" has a history of non-probabilistic approaches: fuzzy logic, nonmonotonic reasoning, Dempster-Shafer, three-valued logics... Pearl 1988 (among others): probability theory is the way to go. (Judea Pearl: 2011 Turing Award winner.) Probabilistic approaches completely dominate modern machine learning.
4 Why probability theory? Dutch book argument (Bruno de Finetti): if you violate these rules, you will lose money. Kolmogorov axioms (Andrey Kolmogorov). Cox axioms: probabilities of boolean variables, as an extension of logic.
5-7 Cox axioms. Probability theory as an extension of boolean logic. Let B(X|Z) be a degree-of-belief function: belief in proposition X, conditional on background knowledge Z. Assume:
1. Degrees of belief are well-ordered.
2. B(~X) = f[ B(X) ]
3. B(X and Y) = g( B(X), B(Y|X) )
Then B(.) can be mapped to P(.) where:
1. P(true) = 1
2. P(false) = 0
3. P(X and Y) = P(X) P(Y|X)
4. P(~X) = 1 - P(X)
Equivalently on the log scale (G = log P):
1. G(true) = 0
2. G(false) = -infinity
3. G(X and Y) = G(X) + G(Y|X)
4. G(~X) = log(1 - exp[G(X)])
(Note: there are controversies -- see references.)
8 Many views of probability. The meaning of a probability: epistemic (knowledge/belief) vs. long-run frequencies. Related: Bayesian vs. frequentist statistics.
9 Discrete random variables. Two views of an RV: (1) the outcome of a random process, or (2) it has a true value that you don't know.
Binary (boolean) variable: Values(A) = {0, 1}, or {-1, +1}, or any two symbols (the slide's example symbols were images, lost in this transcription).
Categorical variable: Values(A) = {0, 1, 2}, or {55, 32, 10}, or any three symbols.
Probability (mass) function: P(A=a) -> [0, 1], e.g. P(A=1) = 0.7, P(A=0) = 0.3.
Many notations: P_A(a), P(A), P(a).
10 Crucial definitions/laws. They hold for all RVs. Make sure you know them!
Conditional probability: P(A|B) = P(AB) / P(B)
Chain rule: P(AB) = P(A|B) P(B)
Law of total probability: P(A) = sum_b P(A, B=b) = sum_b P(A|B=b) P(B=b)
Disjunction (union): P(A or B) = P(A) + P(B) - P(AB)
Negation (complement): P(~A) = 1 - P(A)
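These laws can be sanity-checked mechanically. A minimal Python sketch, using a made-up joint table over two binary variables (the numbers are not from the slides):

```python
# Joint distribution over two binary variables A, B (invented for illustration).
P = {(0, 0): 0.10, (0, 1): 0.30,
     (1, 0): 0.20, (1, 1): 0.40}

def p_A(a):                      # marginal via law of total probability
    return sum(P[(a, b)] for b in (0, 1))

def p_B(b):
    return sum(P[(a, b)] for a in (0, 1))

def p_A_given_B(a, b):           # definition of conditional probability
    return P[(a, b)] / p_B(b)

# Chain rule: P(AB) = P(A|B) P(B)
assert abs(P[(1, 1)] - p_A_given_B(1, 1) * p_B(1)) < 1e-12

# Law of total probability (conditional form): P(A) = sum_b P(A|B=b) P(B=b)
assert abs(p_A(1) - sum(p_A_given_B(1, b) * p_B(b) for b in (0, 1))) < 1e-12

# Disjunction: P(A or B) = P(A) + P(B) - P(AB)  (reading "A" as A=1, "B" as B=1)
p_or = sum(p for (a, b), p in P.items() if a == 1 or b == 1)
assert abs(p_or - (p_A(1) + p_B(1) - P[(1, 1)])) < 1e-12

# Negation: P(~A) = 1 - P(A)
assert abs(p_A(0) - (1 - p_A(1))) < 1e-12
print("all laws check out")
```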
11 Three variables A, B, C, with a joint probability table over their value combinations (the table's entries were images and are lost in this transcription, apart from one visible value, 0.10).
Example: Titanic passengers, cross-tabulated by Class (1st / lower), Gender (male / female), and Age (adult / child) against Survival (no / yes); the counts are likewise lost.
Any close conditional independences? P(Survive | adult, lower, female) ≈ P(Survive | child, lower, female), i.e. Survive ⊥ Age | {lower, female}.
12-15 Data: 1035 (A,B) pairs. (The two values of A and of B were shown as image symbols, lost in this transcription; write them a1, a2 and b1, b2 below.)
Empirical marginals: P(A=a1) = #{A=a1} / 1035 and P(A=a2) = #{A=a2} / 1035 (the counts themselves are lost).
Sum rule: sum over a in Values(A) of P(A=a) = 1, so P(A=a1) + P(A=a2) = 1.
Negation (complement): P(A != a1) = 1 - P(A=a1).
16 Data: 1035 (A,B) pairs. Disjunctions:
P(A=a1 or B=b1) = #{A=a1 or B=b1} / N.
Equivalently, by P(A or B) = P(A) + P(B) - P(AB):
P(A=a1 or B=b1) = P(A=a1) + P(B=b1) - P(A=a1, B=b1). (The numeric values are lost in this transcription.)
17-22 Data: 1035 (A,B) pairs. Joint and conditional probabilities:
Joint: P(A=a1, B=b1) = #{A=a1, B=b1} / N.
Conditional: P(A=a1 | B=b1) = #{A=a1, B=b1} / #{B=b1} = 15 / #{B=b1} (the remaining counts are lost in this transcription).
Every probability has a universe, or background knowledge: P(A=a1, B=b1, in the table); P(A=a1 | B=b1, and in the table).
Interpretation: the fraction of worlds in which B is true that also have A true.
Definition of conditional probability: P(A|B) = P(AB) / P(B), so P(A=a1 | B=b1) = P(A=a1, B=b1) / P(B=b1).
Chain rule: P(AB) = P(A|B) P(B).
23-24 Data: 1035 (A,B) pairs. Law of total probability: P(A) = sum_b P(A, B=b).
Form 1: break A into B subgroups and add the joint probs across subgroups: P(A=a1) = P(A=a1, B=b1) + P(A=a1, B=b2).
Form 2: break the universe into B subgroups and average the conditional probs across subgroups, P(A) = sum_b P(A|B=b) P(B=b): P(A=a1) = P(A=a1|B=b1) P(B=b1) + P(A=a1|B=b2) P(B=b2).
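The two forms of the law of total probability can be checked on counts. A sketch using hypothetical counts (only the total 1035 and the joint count 15 come from the slides; the rest are invented to fill the table):

```python
# Hypothetical count table over two made-up values of A and B.
N = 1035
counts = {("a1", "b1"): 15,  ("a1", "b2"): 300,   # 15 is from the slides; rest invented
          ("a2", "b1"): 120, ("a2", "b2"): 600}
assert sum(counts.values()) == N

def P_B(b):
    return sum(v for (a, bb), v in counts.items() if bb == b) / N

def P_A_given_B(a, b):
    return counts[(a, b)] / sum(v for (aa, bb), v in counts.items() if bb == b)

# Form 1: add joint probs across the B subgroups.
p_a1_joint = counts[("a1", "b1")] / N + counts[("a1", "b2")] / N

# Form 2: average conditional probs, weighted by P(B=b).
p_a1_cond = sum(P_A_given_B("a1", b) * P_B(b) for b in ("b1", "b2"))

assert abs(p_a1_joint - p_a1_cond) < 1e-12
print(round(p_a1_joint, 4))   # → 0.3043
```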
25 Crucial definitions/laws (recap). They hold for all RVs.
Conditional probability: P(A|B) = P(AB) / P(B). Chain rule: P(AB) = P(A|B) P(B).
Law of total probability: P(A) = sum_b P(A, B=b) = sum_b P(A|B=b) P(B=b).
Disjunction (union): P(A or B) = P(A) + P(B) - P(AB). Negation (complement): P(~A) = 1 - P(A).
26-28 Background-conditional variants: every law still holds when everything is further conditioned on background knowledge Z.
P(~A) = 1 - P(A) becomes P(~A|Z) = 1 - P(A|Z).
P(AB) = P(A|B) P(B) becomes P(AB|Z) = P(A|BZ) P(B|Z). Etc.
29-30 Independence. A and B are independent when P(A|B) = P(A).
Alternate formulation: P(AB) = P(A|B) P(B) = P(A) P(B).
31-32 Independence and the chain rule. In general: P(ABC) = P(A|BC) P(B|C) P(C).
But when A, B, C are independent: P(ABC) = P(A) P(B) P(C).
If you have independence, you can estimate the joint probability of thousands of RVs.
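To see why independence helps at scale, here is a small sketch (the variable count and probabilities are invented): with n independent binary variables, the joint is specified by just n marginal numbers, yet it still defines a valid distribution over all 2^n outcomes.

```python
import itertools
import random

random.seed(0)
n = 12
p = [random.random() for _ in range(n)]      # p[i] = P(X_i = 1), one number per RV

def joint(x):
    """P(X_1..X_n = x) under full independence: a product of marginals."""
    out = 1.0
    for xi, pi in zip(x, p):
        out *= pi if xi == 1 else 1 - pi
    return out

# The 2^12 = 4096 joint probabilities sum to 1, but we never stored that table.
total = sum(joint(x) for x in itertools.product((0, 1), repeat=n))
assert abs(total - 1.0) < 1e-9
```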
33 Notation. All equivalent shorthands for the joint: P(AB), P(A ∧ B), P(A, B), spelled out as P(A=a, B=b); similarly P(A|B) abbreviates P(A=a | B=b).
P(A) = sum_b P(A, B=b) abbreviates: P(A=a) = sum_b P(A=a, B=b), for each value a.
34-35 Discrete vs. continuous RVs.
Discrete: probability mass function. P(X=x) ∈ [0, 1], and sum over x in Values(X) of P(X=x) = 1.
Continuous: probability density function p(x) ∈ [0, ∞). There is no probability at one point! P(X ∈ [a, b]) = integral from a to b of p(x) dx, and the integral of p(x) over the whole range is 1.
Quiz: can p(x) > 1? Yes -- a density is bounded below by 0 but not above. (The slide shows the Beta(x; 4, 2) density as an example.)
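The quiz can be answered numerically. A sketch of the slide's Beta(4, 2) density using only the standard library (the grid-integration check is my addition):

```python
from math import gamma

def beta_pdf(x, a=4, b=2):
    """Beta(x; a, b) density on [0, 1]; normalizing constant is 20 for (4, 2)."""
    const = gamma(a + b) / (gamma(a) * gamma(b))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

# Density at the mode (a-1)/(a+b-2) = 0.75 exceeds 1:
print(beta_pdf(0.75))   # → 2.109375

# Yet it still integrates to 1 (crude Riemann sum):
dx = 1e-4
area = sum(beta_pdf(i * dx) * dx for i in range(1, 10000))
assert abs(area - 1.0) < 1e-3
```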
36 Bayes Rule. Want P(H|E) but only have P(E|H) -- why? e.g. P(E|H) has been measured, or H causes E. Example: H = have the disease? E = outcome of a medical test.
Since P(H|E) P(E) = P(E|H) P(H):
P(H|E) = P(E|H) P(H) / P(E)
Posterior = Likelihood × Prior / Normalizer. (Rev. Thomas Bayes; Pierre-Simon Laplace.)
37 Bayes Rule and its pesky denominator.
P(H|E) = P(E|H) P(H) / P(E) = P(E|H) P(H) / sum_h' P(E|h') P(h') = (1/Z) P(E|H) P(H).
Z: whatever lets the posterior, summed across h, sum to 1. Also written P(H|E) ∝ P(E|H) P(H): the unnormalized posterior, which by itself does not sum to 1!
Z is the Zustandssumme, "sum over states", from 19th-century statistical physics. (It is called a partition function for certain types of models, e.g. Markov random fields.) Computing Z can be very difficult in high dimensions -- tons of work in statistical machine learning is essentially devoted to computing Z or variants of it.
38-42 Bayes Rule: discrete example, over hypotheses a, b, c.
Prior P(H=h): [0.2, 0.2, 0.6] -- sums to 1.
Multiply by likelihood P(E|H=h): [0.2, 0.05, 0.05] -- need not sum to 1.
Unnormalized posterior P(E|H=h) P(H=h): [0.04, 0.01, 0.03] -- does not sum to 1.
Normalize: posterior (1/Z) P(E|H=h) P(H=h): [0.500, 0.125, 0.375] -- sums to 1.
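The table above is easy to reproduce. A sketch of the multiply-then-normalize recipe using the slide's numbers:

```python
# Discrete Bayes rule: posterior = normalized (likelihood × prior).
hyps       = ["a", "b", "c"]
prior      = [0.2, 0.2, 0.6]
likelihood = [0.2, 0.05, 0.05]          # P(E | H=h); need not sum to 1

unnorm = [l * p for l, p in zip(likelihood, prior)]
Z = sum(unnorm)                          # the "pesky denominator" P(E)
posterior = [u / Z for u in unnorm]

print([round(u, 3) for u in unnorm])      # → [0.04, 0.01, 0.03]
print([round(p, 3) for p in posterior])   # → [0.5, 0.125, 0.375]
```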
43-44 Bayes Rule: discrete, uniform prior. A uniform distribution is an uninformative prior.
Prior P(H=h): [0.33, 0.33, 0.33] -- sums to 1.
Likelihood P(E|H=h): [0.2, 0.05, 0.05] -- need not sum to 1.
Unnormalized posterior: [0.066, 0.016, 0.016] -- does not sum to 1.
Normalized posterior: [0.66, 0.16, 0.16] -- sums to 1. A uniform prior implies the posterior is just the renormalized likelihood.
45 Bayes Rule: continuous, with a prior. Prior: Beta(2, 2) density over θ. Likelihood L(θ) for data of 0 heads, 5 tails. Multiply (likelihood × prior = unnormalized posterior, does not integrate to 1), then normalize (posterior = likelihood × prior / Z, integrates to 1). (The slide's plots are lost in this transcription.)

46 Bayes Rule: continuous, uniform prior. A uniform distribution is an uninformative prior: the posterior is just the renormalized likelihood. Same data (0 heads, 5 tails) and the same multiply-then-normalize steps. (Plots lost in transcription.)
47 Bayes Rule: more data swamps the prior. Shown for 1 head & 1 tail, 3 heads & 1 tail, and 8 heads & 1 tail:
Prior: p_Beta(θ; 2, 2) = (1/Z) θ (1-θ)
Likelihood: θ^#H (1-θ)^#T
Posterior: (1/Z) θ^(#H+1) (1-θ)^(#T+1)
As the counts grow, the likelihood dominates and the posterior concentrates near the empirical frequency.
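The swamping effect can be read off the conjugate update. A sketch assuming the standard Beta-Bernoulli result (a Beta(2, 2) prior plus #H heads and #T tails gives posterior Beta(2 + #H, 2 + #T)); the final (80, 10) data point is my own extrapolation beyond the slide's examples:

```python
def posterior_mean(n_heads, n_tails, a=2, b=2):
    """Mean of the Beta(a + #H, b + #T) posterior for a Bernoulli parameter."""
    return (a + n_heads) / (a + b + n_heads + n_tails)

# As data accumulate, the posterior mean moves from the prior mean (0.5)
# toward the empirical frequency of heads.
for h, t in [(1, 1), (3, 1), (8, 1), (80, 10)]:
    print(h, t, round(posterior_mean(h, t), 3))
```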
48-49 Bayes Rule: odds-ratio form.
Probability form: P(H|E) = P(E|H) P(H) / P(E); posterior prob, likelihood, and prior prob each lie in [0, 1].
Odds form: P(h1|E) / P(h2|E) = [P(E|h1) / P(E|h2)] × [P(h1) / P(h2)] -- posterior odds = likelihood ratio × prior odds, each in [0, inf).
Binary probability rescalings (with the value for a fair coin flip):
- prob: p, range [0, 1]; coin flip: 1/2
- log-prob: log(p), range (-inf, 0]; coin flip: log(1/2)
- odds: p / (1-p), range [0, +inf); coin flip: 1
- log-odds (logit): log[p / (1-p)], range (-inf, +inf); coin flip: 0
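The rescalings convert back and forth mechanically. A sketch with helper names and example numbers of my own choosing (not from the slides):

```python
from math import exp, isclose, log

def odds(p):        return p / (1 - p)
def logit(p):       return log(p / (1 - p))
def inv_logit(z):   return 1 / (1 + exp(-z))   # a.k.a. the logistic sigmoid

p = 0.5                        # a fair coin flip
assert odds(p) == 1.0          # odds 1:1
assert logit(p) == 0.0         # log-odds 0
assert isclose(inv_logit(logit(0.9)), 0.9)   # round trip

# Bayes in odds form: posterior odds = likelihood ratio × prior odds.
prior_odds = odds(0.2)                  # prior prob 0.2 → odds 0.25 (illustrative)
lr = 0.2 / 0.05                         # likelihood ratio of 4 (illustrative)
post_odds = lr * prior_odds             # → 1.0, i.e. posterior prob 0.5
assert isclose(post_odds / (1 + post_odds), 0.5)
```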
50 i.i.d. likelihood. Bernoulli random process: each X_i comes out H with probability θ, else comes out T.
Likelihood of θ -- the probability of the data under parameter θ:
L(θ) = P(X_1, X_2, X_3, ..., X_n | θ)
     = prod_i P(X_i | θ)                     (Independent...)
     = prod_i {θ if X_i = H else 1-θ}        (...and Identically Distributed!)
     = prod_i θ^1{X_i=H} (1-θ)^1{X_i=T}
     = θ^#{X_i=H} (1-θ)^#{X_i=T}
51 Maximum likelihood. If we want to believe a single point value of θ, what should we believe? How about the maximum-likelihood value of θ: the one that makes the data most probable.
L(θ) = θ^#{X_i=H} (1-θ)^#{X_i=T}
argmax_θ L(θ) = argmax_θ log L(θ) -- most important trick: take the log! Same argmax, since log is monotonic.
log L(θ) = #{X_i=H} log θ + #{X_i=T} log(1-θ)
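The log trick can be checked numerically. A sketch with invented counts (7 heads, 3 tails), comparing a grid maximizer of the log-likelihood against the closed-form MLE #H / (#H + #T):

```python
from math import log

heads, tails = 7, 3   # invented data

def log_lik(theta):
    """log L(θ) = #H log θ + #T log(1-θ), for the Bernoulli likelihood above."""
    return heads * log(theta) + tails * log(1 - theta)

grid = [i / 1000 for i in range(1, 1000)]    # avoid θ = 0 and θ = 1
theta_hat = max(grid, key=log_lik)
print(theta_hat)   # → 0.7, matching heads / (heads + tails)
```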
52 Maximum a posteriori (MAP).
p(θ|D) = P(D|θ) p(θ) / P(D), so p(θ|D) ∝ P(D|θ) p(θ), and
argmax_θ p(θ|D) = argmax_θ P(D|θ) p(θ).
Another trick: the unnormalized posterior is good enough for MAP.
54 Entropy.
H(X) = - sum_x p(x) log p(x)
H(X) = E[-log p(X)]
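A minimal sketch of the entropy formula in bits (base-2 log; the example distributions are mine):

```python
from math import log2

def entropy(ps):
    """H(X) = -sum_x p(x) log2 p(x), skipping zero-probability outcomes."""
    return -sum(p * log2(p) for p in ps if p > 0)

print(entropy([0.5, 0.5]))    # → 1.0  (fair coin: one bit)
print(entropy([0.25] * 4))    # → 2.0  (four equal outcomes: two bits)
```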
More informationConsider an experiment that may have different outcomes. We are interested to know what is the probability of a particular set of outcomes.
CMSC 310 Artificial Intelligence Probabilistic Reasoning and Bayesian Belief Networks Probabilities, Random Variables, Probability Distribution, Conditional Probability, Joint Distributions, Bayes Theorem
More informationRandom Variables Chris Piech CS109, Stanford University. Piech, CS106A, Stanford University
Random Variables Chris Piech CS109, Stanford University Piech, CS106A, Stanford University Piech, CS106A, Stanford University Let s Play a Game Game set-up We have a fair coin (come up heads with p = 0.5)
More informationProbability theory basics
Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:
More informationIntroduction to Machine Learning
Uncertainty Introduction to Machine Learning CS4375 --- Fall 2018 a Bayesian Learning Reading: Sections 13.1-13.6, 20.1-20.2, R&N Sections 6.1-6.3, 6.7, 6.9, Mitchell Most real-world problems deal with
More informationWith Question/Answer Animations. Chapter 7
With Question/Answer Animations Chapter 7 Chapter Summary Introduction to Discrete Probability Probability Theory Bayes Theorem Section 7.1 Section Summary Finite Probability Probabilities of Complements
More informationUncertainty and knowledge. Uncertainty and knowledge. Reasoning with uncertainty. Notes
Approximate reasoning Uncertainty and knowledge Introduction All knowledge representation formalism and problem solving mechanisms that we have seen until now are based on the following assumptions: All
More informationUNCERTAINTY. In which we see what an agent should do when not all is crystal-clear.
UNCERTAINTY In which we see what an agent should do when not all is crystal-clear. Outline Uncertainty Probabilistic Theory Axioms of Probability Probabilistic Reasoning Independency Bayes Rule Summary
More informationExercise 1: Basics of probability calculus
: Basics of probability calculus Stig-Arne Grönroos Department of Signal Processing and Acoustics Aalto University, School of Electrical Engineering stig-arne.gronroos@aalto.fi [21.01.2016] Ex 1.1: Conditional
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationLecture 10: Introduction to reasoning under uncertainty. Uncertainty
Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,
More informationChapter 2 Classical Probability Theories
Chapter 2 Classical Probability Theories In principle, those who are not interested in mathematical foundations of probability theory might jump directly to Part II. One should just know that, besides
More informationProbabilistic Reasoning. Doug Downey, Northwestern EECS 348 Spring 2013
Probabilistic Reasoning Doug Downey, Northwestern EECS 348 Spring 2013 Limitations of logic-based agents Qualification Problem Action s preconditions can be complex Action(Grab, t) => Holding(t).unless
More informationBayesian inference. Fredrik Ronquist and Peter Beerli. October 3, 2007
Bayesian inference Fredrik Ronquist and Peter Beerli October 3, 2007 1 Introduction The last few decades has seen a growing interest in Bayesian inference, an alternative approach to statistical inference.
More informationProbability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014
Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions
More informationProbability Review and Naïve Bayes
Probability Review and Naïve Bayes Instructor: Alan Ritter Some slides adapted from Dan Jurfasky and Brendan O connor What is Probability? The probability the coin will land heads is 0.5 Q: what does this
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationBayesian Networks BY: MOHAMAD ALSABBAGH
Bayesian Networks BY: MOHAMAD ALSABBAGH Outlines Introduction Bayes Rule Bayesian Networks (BN) Representation Size of a Bayesian Network Inference via BN BN Learning Dynamic BN Introduction Conditional
More informationSociology 6Z03 Topic 10: Probability (Part I)
Sociology 6Z03 Topic 10: Probability (Part I) John Fox McMaster University Fall 2014 John Fox (McMaster University) Soc 6Z03: Probability I Fall 2014 1 / 29 Outline: Probability (Part I) Introduction Probability
More informationBasics of Bayesian analysis. Jorge Lillo-Box MCMC Coffee Season 1, Episode 5
Basics of Bayesian analysis Jorge Lillo-Box MCMC Coffee Season 1, Episode 5 Disclaimer This presentation is based heavily on Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python
More informationLecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan
Lecture 9: Naive Bayes, SVM, Kernels Instructor: Outline 1 Probability basics 2 Probabilistic Interpretation of Classification 3 Bayesian Classifiers, Naive Bayes 4 Support Vector Machines Probability
More informationThe enumeration of all possible outcomes of an experiment is called the sample space, denoted S. E.g.: S={head, tail}
Random Experiment In random experiments, the result is unpredictable, unknown prior to its conduct, and can be one of several choices. Examples: The Experiment of tossing a coin (head, tail) The Experiment
More informationLecture 23 Maximum Likelihood Estimation and Bayesian Inference
Lecture 23 Maximum Likelihood Estimation and Bayesian Inference Thais Paiva STA 111 - Summer 2013 Term II August 7, 2013 1 / 31 Thais Paiva STA 111 - Summer 2013 Term II Lecture 23, 08/07/2013 Lecture
More informationProbability and Estimation. Alan Moses
Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.
More informationTime Series and Dynamic Models
Time Series and Dynamic Models Section 1 Intro to Bayesian Inference Carlos M. Carvalho The University of Texas at Austin 1 Outline 1 1. Foundations of Bayesian Statistics 2. Bayesian Estimation 3. The
More information