Introduction to Probability and Statistics (Continued)


1 Introduction to Probability and Statistics (Continued)
Prof. Nicholas Zabaras
Center for Informatics and Computational Science
University of Notre Dame, Notre Dame, Indiana, USA
URL: https://cics.nd.edu/
August 3, 2018

2 Contents
Sum and product rules of probability, Bayes' rule, conditional probability
Independence, conditional independence, conditional independence theorem
Random variables, CDF, mean and variance
Uniform distribution, Gaussian distribution
The binomial and Bernoulli distributions
The multinomial and multinoulli distributions

3 References
Following closely:
Chris Bishop's PRML book, Chapter 2
Kevin Murphy's Machine Learning: A Probabilistic Perspective, Chapter 2
Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press.
Bertsekas, D. and J. Tsitsiklis (2008). Introduction to Probability, 2nd Edition. Athena Scientific.
Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.

4 Joint Probability
Probability theory provides a consistent framework for the quantification and manipulation of uncertainty. It forms one of the central foundations for pattern recognition and machine learning.
The probability that the event X will take the value x_i and the event Y will take the value y_j is written P(X = x_i, Y = y_j) and is called the joint probability of X = x_i and Y = y_j:
$$P(X = x_i, Y = y_j) = \frac{n_{ij}}{N}$$
Here $n_{ij}$ is the number of times (in $N$ trials) that the event $X = x_i, Y = y_j$ occurs. Similarly, $c_i$ is the number of times that $X = x_i$ occurs.

5 The Sum and Product Rules
Even complex calculations in probability are simply derived from the sum and product rules of probability.
Sum rule:
$$P(X = x_i) = \sum_{j=1}^{L} P(X = x_i, Y = y_j)$$
Product rule:
$$P(X = x_i, Y = y_j) = \frac{n_{ij}}{N} = \frac{n_{ij}}{c_i}\,\frac{c_i}{N} = P(Y = y_j \mid X = x_i)\, P(X = x_i)$$
The product rule leads to the chain rule:
$$P(X_{1:D}) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_D \mid X_{1:D-1})$$

6 Conditional Probability and Bayes' Rule
$$P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{P(X = x)\, P(Y = y \mid X = x)}{\sum_{x'} P(X = x')\, P(Y = y \mid X = x')}$$
Bayes' theorem plays a central role in pattern recognition and machine learning.
The normalizing factor P(Y) is given as:
$$P(Y = y) = \sum_{x'} P(X = x', Y = y) = \sum_{x'} P(X = x')\, P(Y = y \mid X = x')$$
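A minimal numerical sketch of Bayes' rule for a discrete variable; the prior and likelihood values below are made-up numbers for illustration only.

```python
import numpy as np

# Hypothetical prior P(X = x') over three states and likelihood P(Y = y | X = x')
# for one observed value y.
prior = np.array([0.5, 0.3, 0.2])        # P(X)
likelihood = np.array([0.1, 0.7, 0.4])   # P(Y = y | X)

# Bayes' rule: posterior proportional to prior times likelihood,
# normalized by P(Y = y) = sum_x' P(X = x') P(Y = y | X = x').
evidence = np.sum(prior * likelihood)
posterior = prior * likelihood / evidence

print(posterior, posterior.sum())        # posterior sums to 1
```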

7 Example: Generative Classifier
$$P(y = c \mid \mathbf{x}, \boldsymbol{\theta}) = \frac{P(y = c \mid \boldsymbol{\theta})\, P(\mathbf{x} \mid y = c, \boldsymbol{\theta})}{\sum_{c'} P(y = c' \mid \boldsymbol{\theta})\, P(\mathbf{x} \mid y = c', \boldsymbol{\theta})}$$
Here we classify feature vectors x into classes using the above posterior. It is a generative classifier, as it specifies how to generate the data x using the class-conditional probabilities $P(\mathbf{x} \mid y = c, \boldsymbol{\theta})$ and the class priors $P(y = c \mid \boldsymbol{\theta})$. Here θ are (unknown) parameters defining these distributions.
In a discriminative setting, the posterior $P(y = c \mid \mathbf{x})$ is fitted directly.

8 Independence, Conditional Probability
Two events A and B are independent (written as $A \perp B$) if
$$P(A \cap B) = P(A)\, P(B)$$
Conditional probability: the probability that A happens provided that B happens,
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Note that $P(A \mid B)$ is in general different from $P(A \cap B)$.
Using the above equations, we see that for independent events $P(A \mid B) = P(A)$.

9 Independence, Conditional Independence
X and Y are unconditionally independent or marginally independent, denoted $X \perp Y$, if we can represent the joint as the product of the two marginals:
$$X \perp Y \iff P(X, Y) = P(X)\, P(Y)$$
X and Y are conditionally independent (CI) given Z iff the conditional joint can be written as a product of conditional marginals:
$$X \perp Y \mid Z \iff P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$$

10 Pairwise vs. Mutual Independence
Pairwise independence does not imply mutual independence. Consider 4 balls (numbered 1, 2, 3, 4) in a box. You draw one at random. Define the following events:
$X_1$: ball 1 or 2 is drawn
$X_2$: ball 2 or 3 is drawn
$X_3$: ball 1 or 3 is drawn
Note that the three events are pairwise independent, e.g.
$$P(X_1, X_2) = \frac{1}{4} = \frac{1}{2}\cdot\frac{1}{2} = P(X_1)\, P(X_2) \;\Rightarrow\; X_1 \perp X_2$$
However:
$$P(X_1, X_2, X_3) = 0 \neq \frac{1}{8} = \frac{1}{2}\cdot\frac{1}{2}\cdot\frac{1}{2} = P(X_1)\, P(X_2)\, P(X_3)$$
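The four-ball construction can be checked by direct enumeration; the sketch below simply counts outcomes over the uniform sample space {1, 2, 3, 4}.

```python
balls = [1, 2, 3, 4]          # each drawn with probability 1/4
X1 = {1, 2}                   # ball 1 or 2 is drawn
X2 = {2, 3}                   # ball 2 or 3 is drawn
X3 = {1, 3}                   # ball 1 or 3 is drawn

def prob(*events):
    # Probability that all given events occur simultaneously.
    return sum(1 for b in balls if all(b in e for e in events)) / len(balls)

# Pairwise independence: each pair factorizes, 1/4 = 1/2 * 1/2.
print(prob(X1, X2), prob(X1) * prob(X2))             # 0.25 0.25
print(prob(X1, X3), prob(X1) * prob(X3))             # 0.25 0.25
print(prob(X2, X3), prob(X2) * prob(X3))             # 0.25 0.25
# But not mutual independence: the triple intersection is empty.
print(prob(X1, X2, X3), prob(X1) * prob(X2) * prob(X3))  # 0.0 0.125
```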

11 Independence, Conditional Independence
Consider the following example. Define:
Event a = 'it will rain tomorrow'
Event b = 'the ground is wet today'
Event c = 'raining today'
c causes both a and b, thus given c we don't need to know about b to predict a:
$$a \perp b \mid c \iff P(a, b \mid c) = P(a \mid c)\, P(b \mid c)$$
$$a \perp b \mid c \iff P(a \mid b, c) = P(a \mid c)$$
Observing a root node separates the children!

12 Independence, Conditional Independence
Assume unconditional independence $X \perp Y$. Let us assume that X takes 6 values and Y takes 5 values.
The cost of defining p(X, Y) is drastically reduced if $X \perp Y$: the number of parameters is reduced from 29 (= 30 − 1) to 9 = (6 − 1) + (5 − 1).*
Independence is key to efficient probabilistic modeling (naïve Bayes classifiers, Markov models, graphical models, etc.).
* We subtract one from the counts to account for the sum-to-one probability constraint.

13 Conditional Independence Theorem
$X \perp Y \mid Z$ iff there exist functions g and h such that
$$P(x, y \mid z) = g(x, z)\, h(y, z) \qquad \forall x, y, z \ \text{such that } P(z) > 0$$
Independence implies the factorization:
$$X \perp Y \mid Z \;\Rightarrow\; P(x, y \mid z) = P(x \mid z)\, P(y \mid z) \equiv g(x, z)\, h(y, z)$$
Factorization implies independence. Suppose $P(x, y \mid z) = g(x, z)\, h(y, z)$. Then
$$1 = \sum_{x, y} P(x, y \mid z) = \sum_{x, y} g(x, z)\, h(y, z) = \Big(\sum_x g(x, z)\Big)\Big(\sum_y h(y, z)\Big) \quad (1)$$
$$P(x \mid z) = \sum_y g(x, z)\, h(y, z) = g(x, z) \sum_y h(y, z) \quad (2)$$
$$P(y \mid z) = \sum_x g(x, z)\, h(y, z) = h(y, z) \sum_x g(x, z) \quad (3)$$
Multiplying (2) and (3) and using (1):
$$P(x \mid z)\, P(y \mid z) = g(x, z)\, h(y, z) \Big(\sum_y h(y, z)\Big)\Big(\sum_x g(x, z)\Big) = g(x, z)\, h(y, z) = P(x, y \mid z)$$

14 Independence Relations: Decomposition
The following decomposition independence relation holds:
$$X \perp Y, W \mid Z \;\Rightarrow\; X \perp Y \mid Z$$
Proof:
$$X \perp Y, W \mid Z \;\Rightarrow\; P(X, Y, W \mid Z) = P(X \mid Z)\, P(Y, W \mid Z)$$
$$P(X, Y \mid Z) = \sum_w P(X, Y, w \mid Z) = P(X \mid Z) \sum_w P(Y, w \mid Z) \qquad \text{(independence, sum rule)}$$
$$P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z) \;\Rightarrow\; X \perp Y \mid Z$$

15 Independence Relations: Weak Union
The following weak union independence relation holds:
$$X \perp Y, W \mid Z \;\Rightarrow\; X \perp Y \mid Z, W$$
Proof:
$$X \perp Y, W \mid Z \;\Rightarrow\; P(X, Y, W \mid Z) = P(X \mid Z)\, P(Y, W \mid Z)$$
$$P(X, Y \mid Z, W) = \frac{P(X, Y, W \mid Z)}{P(W \mid Z)} = P(X \mid Z)\, \frac{P(Y, W \mid Z)}{P(W \mid Z)} = P(X \mid Z)\, P(Y \mid Z, W) = P(X \mid Z, W)\, P(Y \mid Z, W) \;\Rightarrow\; X \perp Y \mid Z, W$$
Note that in the last step we used the decomposition independence relation $X \perp Y, W \mid Z \Rightarrow X \perp W \mid Z$, i.e. $P(X, W \mid Z) = P(X \mid Z)\, P(W \mid Z)$, so that
$$P(X \mid Z, W) = \frac{P(X, W \mid Z)}{P(W \mid Z)} = P(X \mid Z)$$

16 Independence Relations: Contraction
The following contraction independence relation holds:
$$X \perp W \mid Z, Y \;\;\text{and}\;\; X \perp Y \mid Z \;\Rightarrow\; X \perp Y, W \mid Z$$
Proof:
(a) $X \perp W \mid Y, Z \;\Rightarrow\; P(X, W \mid Y, Z) = P(X \mid Y, Z)\, P(W \mid Y, Z)$
(b) $X \perp Y \mid Z \;\Rightarrow\; P(X \mid Y, Z) = P(X \mid Z)$
$$P(X, Y, W \mid Z) = P(X, W \mid Y, Z)\, P(Y \mid Z) \overset{(a)}{=} P(X \mid Y, Z)\, P(W \mid Y, Z)\, P(Y \mid Z) \overset{(b)}{=} P(X \mid Z)\, P(W, Y \mid Z) \;\Rightarrow\; X \perp W, Y \mid Z$$

17 Random Variables
Define Ω to be a probability space equipped with a probability measure P that measures the probability of events E. Ω contains all possible events in the form of its own subsets.
A real-valued random variable X is a mapping
$$X : \Omega \to \mathbb{R}$$
We call $x = X(\omega)$, $\omega \in \Omega$, a realization of X.
Probability distribution of X: for $B \subseteq \mathbb{R}$,
$$P_X(B) = P\big(X^{-1}(B)\big) = P\big(\{\omega : X(\omega) \in B\}\big)$$
Probability density:
$$P_X(B) = \int_B p_X(x)\, dx$$
We often write $p_X(x) = p(x)$.

18 Cumulative Distribution Function
$$0 \le p(x), \qquad \int_{-\infty}^{\infty} p(x)\, dx = 1$$
$$F(z) = \int_{-\infty}^{z} p(x)\, dx \qquad \text{(cumulative distribution function)}$$
$$P\big(x \in (a, b)\big) = \int_a^b p(x)\, dx = F(b) - F(a)$$
The CDF of a random variable X is the function F(x) that returns the probability that X is less than x.

19 Mean and Variance
The expected (mean) value of X is:
$$\mathbb{E}[X] = \int x\, p_X(x)\, dx$$
The variance of X is defined as:
$$\operatorname{var}[X] = \mathbb{E}\big[(X - \mathbb{E}[X])^2\big] = \mathbb{E}[X^2] - \big(\mathbb{E}[X]\big)^2$$
The standard deviation, often useful, is defined as
$$\operatorname{std}[X] = \sqrt{\operatorname{var}[X]}$$
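These definitions can be checked by numerical integration against any density; the standard normal used below is just an example choice, not part of the original slide.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

p = norm(loc=0.0, scale=1.0).pdf                      # example density p_X(x)

mean, _ = quad(lambda x: x * p(x), -np.inf, np.inf)   # E[X]
ex2, _ = quad(lambda x: x**2 * p(x), -np.inf, np.inf) # E[X^2]
var = ex2 - mean**2                                   # var[X] = E[X^2] - E[X]^2
std = np.sqrt(var)                                    # std[X] = sqrt(var[X])
print(mean, var, std)                                 # approximately 0, 1, 1
```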

20 Mean and Variance
If T(X) is a function of X (called a statistic), the mean of T(X) is given by
$$\mathbb{E}[T(X)] = \int T(x)\, p_X(x)\, dx$$
The variance of T(X) is also defined as:
$$\operatorname{var}[T(X)] = \mathbb{E}\big[(T(X) - \mathbb{E}[T(X)])^2\big] = \mathbb{E}[T(X)^2] - \big(\mathbb{E}[T(X)]\big)^2$$
The k-th (central) moment is defined as (k = 3: skewness, k = 4: kurtosis):
$$\mu_k[X] = \int \big(x - \mathbb{E}[X]\big)^k\, p_X(x)\, dx$$

21 Expectation
For the discrete case:
$$\mathbb{E}[f] = \sum_x p(x)\, f(x)$$
For the continuous case:
$$\mathbb{E}[f] = \int p(x)\, f(x)\, dx$$
Conditional expectation:
$$\mathbb{E}[f \mid y] = \sum_x p(x \mid y)\, f(x) \qquad \text{or} \qquad \mathbb{E}[f \mid y] = \int p(x \mid y)\, f(x)\, dx$$

22 Expectation
The expectation of a random variable is not necessarily the value that we should expect a realization to have.
Let Ω = [−1, 1] with the probability P being uniform:
$$P(I) = \frac{1}{2}\int_I dx, \qquad I \subseteq [-1, 1]$$
Consider the following two random variables:
$$X_1 : [-1, 1] \to \mathbb{R}, \qquad X_1(\omega) = \begin{cases} 1 & \omega \ge 0 \\ -1 & \omega < 0 \end{cases}$$
$$X_2 : [-1, 1] \to \mathbb{R}, \qquad X_2(\omega) = \omega$$
We can see that
$$\mathbb{E}[X_1] = \frac{1}{2}\int_{-1}^{1} X_1(\omega)\, d\omega = 0, \qquad \mathbb{E}[X_2] = \frac{1}{2}\int_{-1}^{1} \omega\, d\omega = 0,$$
although $X_1(\omega) = \pm 1$ and never takes the value 0.

23 Uniform Random Variable
Consider the uniform distribution
$$\mathcal{U}(x \mid a, b) = \frac{1}{b - a}\, \mathbb{1}(a \le x \le b)$$
What is the CDF of a uniform random variable U(0, 1)?
$$P(U \le x) = \begin{cases} 0 & x < 0 \\ x & 0 \le x \le 1 \\ 1 & x > 1 \end{cases}$$
You can show easily that the mean and variance of $\mathcal{U}(x \mid a, b)$ are:
$$\mathbb{E}[x \mid a, b] = \frac{a + b}{2}, \qquad \mathbb{E}[x^2 \mid a, b] = \frac{a^2 + ab + b^2}{3}, \qquad \operatorname{var}[x \mid a, b] = \frac{(b - a)^2}{12}$$
Note that it is possible for p(x) > 1, but the density still needs to integrate to 1. For example,
$$\mathcal{U}(x \mid 0, 1/2) = 2\cdot\mathbb{1}(0 \le x \le 1/2)$$
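A short numerical check of these uniform results, including the point that the density value can exceed 1 while still integrating to 1; the interval U(0, 1/2) is taken from the slide, the rest is illustrative.

```python
from scipy.integrate import quad

a, b = 0.0, 0.5                                 # U(x|0, 1/2) = 2 on [0, 1/2]
pdf = lambda x: 1.0 / (b - a) if a <= x <= b else 0.0

area, _ = quad(pdf, a, b)                       # total probability mass
mean, _ = quad(lambda x: x * pdf(x), a, b)      # E[x|a,b]
ex2, _ = quad(lambda x: x**2 * pdf(x), a, b)    # E[x^2|a,b]

print(area)                                     # 1.0, even though pdf = 2 > 1
print(mean, (a + b) / 2)                        # 0.25 0.25
print(ex2 - mean**2, (b - a)**2 / 12)           # ~0.02083 ~0.02083
```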

24 The Gaussian Distribution
A random variable X is Gaussian or normally distributed, $X \sim \mathcal{N}(\mu, \sigma^2)$, if:
$$P(X \le t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big)\, dx$$
The probability density is:
$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big)$$
To show that this density is normalized, note the following trick. Let
$$I = \int_{-\infty}^{\infty} \exp\!\Big(-\frac{x^2}{2\sigma^2}\Big)\, dx, \qquad \text{then} \qquad I^2 = \int\!\!\int \exp\!\Big(-\frac{x^2 + y^2}{2\sigma^2}\Big)\, dx\, dy$$
Set $r^2 = x^2 + y^2$; then
$$I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} \exp\!\Big(-\frac{r^2}{2\sigma^2}\Big)\, r\, dr\, d\theta = 2\pi \int_0^{\infty} \exp\!\Big(-\frac{r^2}{2\sigma^2}\Big)\, r\, dr = 2\pi\sigma^2 \int_0^{\infty} e^{-u}\, du = 2\pi\sigma^2$$
Thus $I = \sqrt{2\pi\sigma^2}$.
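The polar-coordinate argument can also be backed up numerically; this sketch checks that I² = 2πσ² for an arbitrary example value of σ.

```python
import numpy as np
from scipy.integrate import quad

sigma = 1.7                                              # arbitrary example value
I, _ = quad(lambda x: np.exp(-x**2 / (2 * sigma**2)), -np.inf, np.inf)

print(I**2, 2 * np.pi * sigma**2)                        # both ~ 18.16
```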

25 The Gaussian Distribution
A random variable X is Gaussian or normally distributed, $X \sim \mathcal{N}(\mu, \sigma^2)$, if:
$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big)$$
The following can be shown easily with direct integration:
$$\mathbb{E}[X] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big)\, x\, dx = \mu$$
$$\mathbb{E}[X^2] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big)\, x^2\, dx = \mu^2 + \sigma^2, \qquad \operatorname{var}[X] = \mathbb{E}[X^2] - \mathbb{E}[X]^2 = \sigma^2$$
The following integrals are useful in these derivations:
$$\int_{-\infty}^{\infty} e^{-u^2}\, du = \sqrt{\pi}, \qquad \int_{-\infty}^{\infty} u\, e^{-u^2}\, du = 0, \qquad \int_{-\infty}^{\infty} u^2 e^{-u^2}\, du = \frac{\sqrt{\pi}}{2}$$
We often work with the precision of a Gaussian, λ = 1/σ². The higher the λ, the narrower the distribution.

26 CDF of a Gaussian
Plot of the standard normal N(0, 1) PDF and CDF. Run the MATLAB function gaussplotdemo from Kevin Murphy's PMTK.
$$F(x; \mu, \sigma^2) = \int_{-\infty}^{x} \mathcal{N}(z \mid \mu, \sigma^2)\, dz = \frac{1}{2}\big[1 + \operatorname{erf}(z/\sqrt{2})\big], \qquad z = (x - \mu)/\sigma$$
$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$$
[Figure: PDF and CDF of N(0, 1).]

27 CDF and the Standard Normal
Assume $X \sim \mathcal{N}(0, 1)$. Define Φ(x) = P(X ≤ x), the cumulative distribution of X:
$$\Phi(x) = \int_{-\infty}^{x} p(z)\, dz = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\!\Big(-\frac{z^2}{2}\Big)\, dz$$
Assume $X \sim \mathcal{N}(\mu, \sigma^2)$. Then
$$F(x) = P(X \le x \mid \mu, \sigma^2) = \Phi\!\Big(\frac{x - \mu}{\sigma}\Big)$$
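The relation F(x | μ, σ²) = Φ((x − μ)/σ) and the erf form of Φ can be checked against scipy.stats.norm; μ, σ, and x below are arbitrary example values.

```python
from math import erf, sqrt
from scipy.stats import norm

mu, sigma, x = 1.0, 2.0, 2.5              # example values
z = (x - mu) / sigma                      # standardized variable

Phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # Phi(z) via the error function
print(Phi)
print(norm.cdf(z))                        # standard normal CDF at z, same value
print(norm.cdf(x, loc=mu, scale=sigma))   # F(x | mu, sigma^2), same value
```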

28 Univariate Gaussian
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big), \qquad \mathbb{E}[X] = \mu, \qquad \operatorname{var}[X] = \sigma^2$$
Shorthand: we say $X \sim \mathcal{N}(\mu, \sigma^2)$ to mean X is distributed as a Gaussian with parameters μ and σ².

29 Univariate Gaussian
$$p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{(x - \mu)^2}{2\sigma^2}\Big)$$
The Gaussian provides a representation of symmetric phenomena without long tails. It is inappropriate for skewness, fat tails, multimodality, etc.
The popularity of the Gaussian is related to facts such as:
It is completely defined in terms of the mean and the variance.
The Central Limit Theorem shows that the (suitably normalized) sum of i.i.d. random variables has approximately a Gaussian distribution, making it an appropriate choice for modeling noise (the limit of additive small effects).
The Gaussian distribution makes the least assumptions (maximum entropy) of all distributions with given mean and variance.
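A quick illustration of the Central Limit Theorem point: standardized sums of i.i.d. variables (here uniforms) behave approximately like a standard normal. The number of terms and samples below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_terms, n_samples = 30, 100_000

# Sum of 30 i.i.d. Uniform(0,1) variables: mean = 15, variance = 30/12.
sums = rng.uniform(0.0, 1.0, size=(n_samples, n_terms)).sum(axis=1)
standardized = (sums - n_terms * 0.5) / np.sqrt(n_terms / 12.0)

# Empirical 5%, 50%, 95% quantiles should be close to the N(0,1) values
# (-1.645, 0, 1.645).
print(np.quantile(standardized, [0.05, 0.5, 0.95]))
```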

30 The Normal Model
The normal (or Gaussian) distribution
$$\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\Big(-\frac{1}{2\sigma^2}(x - \mu)^2\Big), \qquad x \in \mathbb{R},$$
is one of the most studied and one of the most used distributions.
Sample $x_1, \ldots, x_n$ from a normal distribution $\mathcal{N}(\mu, \sigma^2)$. In a typical inference problem, we are interested in:
Estimation of (μ, σ)
Confidence regions on (μ, σ)
Tests on (μ, σ) and comparison with other samples

31 Datasets: Normal Data
Normaldata: relative changes in reported larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties (source: FBI).
Normal data, MATLAB implementation.
From Bayesian Core, J.M. Marin and C.P. Robert, Chapter 2 (available online).

32 Datasets: CMBData
CMBdata: spectral representation of the cosmological microwave background (CMB), i.e. electromagnetic radiation from photons dating back to 300,000 years after the Big Bang, expressed as the difference in apparent temperature from the mean temperature.
CMBdata, MATLAB implementation, normal estimation.
From Bayesian Core, J.M. Marin and C.P. Robert, Chapter 2 (available online).
MLE-based estimates of μ and σ² were used in constructing the Gaussian shown with the solid line. Maximum likelihood estimators (MLE) will be discussed in a follow-up lecture.

33 Quantiles
Recall that the probability density function $p_X(\cdot)$ of a random variable X is defined as the derivative of the cumulative distribution function, so that
$$F(x_0) = \int_{-\infty}^{x_0} p_X(x)\, dx, \qquad \int_{-\infty}^{\infty} p_X(x)\, dx = 1$$
The value $y_\alpha$ such that $F(y_\alpha) = \alpha$ is called the α-quantile of the distribution with CDF F. The median is of course given by $F(y_{0.5}) = 0.5$.
One can also define tail area probabilities. The shaded regions in the figure each contain α/2 of the probability mass. For $\mathcal{N}(0, 1)$, the leftmost cutoff point is $\Phi^{-1}(\alpha/2)$ and the rightmost is $\Phi^{-1}(1 - \alpha/2)$, where Φ is the CDF of $\mathcal{N}(0, 1)$.
If α = 0.05, the central interval is 95%, the left cutoff is −1.96 and the right is 1.96. Figure generated by quantiledemo from PMTK.
For $\mathcal{N}(\mu, \sigma^2)$, the 95% interval becomes (μ − 1.96σ, μ + 1.96σ), often approximated as μ ± 2σ.
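The 95% central interval can be reproduced with the inverse CDF Φ⁻¹, here via scipy.stats.norm.ppf; the values of μ and σ in the second part are example choices.

```python
from scipy.stats import norm

alpha = 0.05
lo = norm.ppf(alpha / 2)        # Phi^{-1}(alpha/2)   ~ -1.96
hi = norm.ppf(1 - alpha / 2)    # Phi^{-1}(1-alpha/2) ~ +1.96
print(lo, hi)

# For N(mu, sigma^2) the 95% interval is (mu + lo*sigma, mu + hi*sigma),
# roughly mu +/- 2*sigma.
mu, sigma = 3.0, 0.5            # example values
print(mu + lo * sigma, mu + hi * sigma)
```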

34 Binary Variables
Consider a coin-flipping experiment with heads = 1 and tails = 0. With $\mu \in [0, 1]$,
$$p(x = 1 \mid \mu) = \mu, \qquad p(x = 0 \mid \mu) = 1 - \mu$$
This defines the Bernoulli distribution as follows:
$$\operatorname{Bern}(x \mid \mu) = \mu^x (1 - \mu)^{1 - x}$$
Using the indicator function, we can also write this as:
$$\operatorname{Bern}(x \mid \mu) = \mu^{\mathbb{1}(x = 1)} (1 - \mu)^{\mathbb{1}(x = 0)}$$

35 Bernoulli Distribution
Recall that in general
$$\mathbb{E}[f] = \sum_x p(x)\, f(x), \qquad \mathbb{E}[f] = \int p(x)\, f(x)\, dx, \qquad \operatorname{var}[f] = \mathbb{E}[f(x)^2] - \big(\mathbb{E}[f(x)]\big)^2$$
For the Bernoulli distribution $\operatorname{Bern}(x \mid \mu) = \mu^x (1 - \mu)^{1 - x}$, we can easily show from the definitions:
$$\mathbb{E}[x] = \mu, \qquad \operatorname{var}[x] = \mu(1 - \mu)$$
$$H[x] = -\sum_{x \in \{0, 1\}} p(x \mid \mu) \ln p(x \mid \mu) = -\mu \ln \mu - (1 - \mu)\ln(1 - \mu)$$
Here H[x] is the entropy of the distribution.

36 Likelihood Function for the Bernoulli Distribution
Consider the data set $\mathcal{D} = \{x_1, x_2, \ldots, x_N\}$ in which we have m heads (x = 1) and N − m tails (x = 0).
The likelihood function takes the form:
$$p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \prod_{n=1}^{N} \mu^{x_n} (1 - \mu)^{1 - x_n} = \mu^m (1 - \mu)^{N - m}$$
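A small sketch of evaluating this likelihood on synthetic coin flips; the true μ used to generate the data and the grid are arbitrary, and the maximizer of the log-likelihood sits at the sample fraction m/N (the MLE discussed later).

```python
import numpy as np

rng = np.random.default_rng(1)
N, mu_true = 50, 0.3                        # arbitrary example values
x = rng.binomial(1, mu_true, size=N)        # data set D = {x_1, ..., x_N}
m = int(x.sum())                            # number of heads

def log_likelihood(mu):
    # log p(D | mu) = m log(mu) + (N - m) log(1 - mu)
    return m * np.log(mu) + (N - m) * np.log(1 - mu)

grid = np.linspace(0.01, 0.99, 99)
mu_hat = grid[np.argmax(log_likelihood(grid))]
print(m / N, mu_hat)                        # grid maximizer lies at/near m/N
```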

37 Binomial Distribution
Consider the discrete random variable $X \in \{0, 1, 2, \ldots, N\}$. We define the binomial distribution as follows:
$$\operatorname{Bin}(X = m \mid N, \mu) = \binom{N}{m} \mu^m (1 - \mu)^{N - m}$$
In our coin-flipping experiment, it gives the probability of getting m heads in N flips, with μ the probability of getting heads in one flip.
It can be shown (see S. Ross, Introduction to Probability Models) that the limit of the binomial distribution as $N \to \infty$, $N\mu \to \lambda$, is the Poisson(λ) distribution.
[Figure: binomial distribution Bin(m | 10, 0.25); MATLAB code.]
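A numerical check of the Poisson limit: for large N with Nμ = λ held fixed, the binomial pmf is close to the Poisson pmf. The values of λ and N are arbitrary.

```python
import numpy as np
from scipy.stats import binom, poisson

lam, N = 3.0, 1000
mu = lam / N                      # keep N*mu = lambda fixed
m = np.arange(0, 11)

print(binom.pmf(m, N, mu))        # Bin(m | N, mu)
print(poisson.pmf(m, lam))        # Poisson(m | lambda): nearly identical values
```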

38 Binomial Distribution
The binomial distribution for N = 10 and μ = 0.25, 0.9 is shown below, using the MATLAB function binomdistplot from Kevin Murphy's PMTK.
[Figure: histograms of Bin(m | 10, 0.25) and Bin(m | 10, 0.9).]

39 Mean and Variance of the Binomial Distribution
For independent events, the mean of the sum is the sum of the means, and the variance of the sum is the sum of the variances. Because $m = x_1 + \ldots + x_N$, and for each observation the mean and variance are known from the Bernoulli distribution:
$$\mathbb{E}[m] = \sum_{m=0}^{N} m \operatorname{Bin}(m \mid N, \mu) = \mathbb{E}[x_1 + \ldots + x_N] = N\mu$$
$$\operatorname{var}[m] = \sum_{m=0}^{N} \big(m - \mathbb{E}[m]\big)^2 \operatorname{Bin}(m \mid N, \mu) = \operatorname{var}[x_1 + \ldots + x_N] = N\mu(1 - \mu)$$
One can also compute $\mathbb{E}[m]$ and $\mathbb{E}[m^2]$ by differentiating (twice) the identity
$$\sum_{m=0}^{N} \binom{N}{m} \mu^m (1 - \mu)^{N - m} = 1$$
with respect to μ. Try it!
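These moments can be confirmed directly by summing over the binomial pmf; N and μ below are example values matching the earlier figure.

```python
import numpy as np
from scipy.stats import binom

N, mu = 10, 0.25                    # example values
m = np.arange(N + 1)
pmf = binom.pmf(m, N, mu)

mean = np.sum(m * pmf)              # E[m]
var = np.sum((m - mean)**2 * pmf)   # var[m]
print(mean, N * mu)                 # 2.5 2.5
print(var, N * mu * (1 - mu))       # 1.875 1.875
```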

40 Binomial Distribution: Normalization
To show that the binomial is correctly normalized, we use the following identities:
$$\binom{N}{m} + \binom{N}{m - 1} = \binom{N + 1}{m} \quad (*)$$
which can be shown by direct substitution, and the binomial theorem:
$$(1 + x)^N = \sum_{m=0}^{N} \binom{N}{m} x^m \quad (**)$$
The binomial theorem is proved by induction using (*), noticing that
$$(1 + x)^{N + 1} = (1 + x)\sum_{m=0}^{N} \binom{N}{m} x^m = \sum_{m=0}^{N} \binom{N}{m} x^m + \sum_{m=0}^{N} \binom{N}{m} x^{m + 1} = 1 + \sum_{m=1}^{N} \Big[\binom{N}{m} + \binom{N}{m - 1}\Big] x^m + x^{N + 1} \overset{(*)}{=} \sum_{m=0}^{N + 1} \binom{N + 1}{m} x^m$$
To finally show normalization, use (**):
$$\sum_{m=0}^{N} \binom{N}{m} \mu^m (1 - \mu)^{N - m} = (1 - \mu)^N \sum_{m=0}^{N} \binom{N}{m} \Big(\frac{\mu}{1 - \mu}\Big)^m = (1 - \mu)^N \Big(1 + \frac{\mu}{1 - \mu}\Big)^N = 1$$

41 Generalization of the Bernoulli Distribution
We are now looking at discrete variables that can take on one of K possible mutually exclusive states. The variable is represented by a K-dimensional vector x in which one of the elements $x_k$ equals 1, and all remaining elements equal 0:
$$\mathbf{x} = (0, 0, \ldots, 1, 0, \ldots, 0)^T, \qquad \sum_{k=1}^{K} x_k = 1$$
Let the probability of $x_k = 1$ be denoted as $\mu_k$. Then
$$p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k} = \prod_{k=1}^{K} \mu_k^{\mathbb{1}(x_k = 1)}, \qquad \sum_{k=1}^{K} \mu_k = 1, \quad \mu_k \ge 0,$$
where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K)^T$.

42 Multinoulli/Categorical Distribution
The distribution is already normalized:
$$\sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu}) = \sum_{k=1}^{K} \mu_k = 1$$
The mean of the distribution is computed as:
$$\mathbb{E}[\mathbf{x} \mid \boldsymbol{\mu}] = \sum_{\mathbf{x}} \mathbf{x}\, p(\mathbf{x} \mid \boldsymbol{\mu}) = (\mu_1, \ldots, \mu_K)^T = \boldsymbol{\mu},$$
similar to the result for the Bernoulli distribution.
The multinoulli, also known as the categorical distribution, is often denoted as (Mu here is the multinomial distribution):
$$\operatorname{Cat}(\mathbf{x} \mid \boldsymbol{\mu}) = \operatorname{Multinoulli}(\mathbf{x} \mid \boldsymbol{\mu}) = \operatorname{Mu}(\mathbf{x} \mid 1, \boldsymbol{\mu})$$
The parameter 1 emphasizes that we roll a K-sided die once (N = 1); see the next slide for the multinomial distribution.

43 Likelihood: Multinoulli Distribution
Let us consider a data set $\mathcal{D} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$. The likelihood becomes:
$$p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\sum_{n=1}^{N} x_{nk}} = \prod_{k=1}^{K} \mu_k^{m_k}, \qquad \text{where } m_k = \sum_{n=1}^{N} x_{nk}$$
is the number of observations with $x_k = 1$. The $m_k$ are the sufficient statistics of the distribution.

44 MLE Estimate: Multinoulli Distribution
To compute the maximum likelihood (MLE) estimate of μ, we maximize an augmented log-likelihood, with a Lagrange multiplier λ enforcing the sum-to-one constraint:
$$\ln p(\mathcal{D} \mid \boldsymbol{\mu}) + \lambda\Big(\sum_{k=1}^{K} \mu_k - 1\Big) = \sum_{k=1}^{K} m_k \ln \mu_k + \lambda\Big(\sum_{k=1}^{K} \mu_k - 1\Big)$$
Setting the derivative with respect to $\mu_k$ equal to zero:
$$\frac{m_k}{\mu_k} + \lambda = 0 \;\Rightarrow\; \mu_k = -\frac{m_k}{\lambda}$$
Substitution into the constraint $\sum_{k=1}^{K} \mu_k = 1$ gives $\lambda = -\sum_{k=1}^{K} m_k = -N$, and hence
$$\mu_k^{\mathrm{ML}} = \frac{m_k}{N}$$
As expected, this is the fraction of the observations with $x_k = 1$.
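The closed-form result μ_k = m_k/N is easy to verify on simulated one-hot data; the categorical probabilities used to generate the data are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(2)
mu_true = np.array([0.2, 0.5, 0.3])        # example categorical parameters (K = 3)
N = 1000
X = rng.multinomial(1, mu_true, size=N)    # N one-hot vectors x_n

m = X.sum(axis=0)                          # sufficient statistics m_k
mu_mle = m / N                             # MLE: fraction of observations with x_k = 1
print(m, mu_mle)                           # mu_mle should be close to mu_true
```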

45 Multinomial Distribution
We can also consider the joint distribution of $m_1, \ldots, m_K$ in N observations, conditioned on the parameters $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K)$. From the expression for the likelihood given earlier, $p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{m_k}$, the multinomial distribution $\operatorname{Mu}(m_1, \ldots, m_K \mid N, \boldsymbol{\mu})$ with parameters N and μ takes the form:
$$p(m_1, m_2, \ldots, m_K \mid N, \mu_1, \mu_2, \ldots, \mu_K) = \frac{N!}{m_1!\, m_2! \cdots m_K!}\, \mu_1^{m_1} \mu_2^{m_2} \cdots \mu_K^{m_K}, \qquad \text{where } \sum_{k=1}^{K} m_k = N$$
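The multinomial pmf above can be evaluated directly from factorials or via scipy.stats.multinomial; the counts and parameters below are example values.

```python
import numpy as np
from math import factorial
from scipy.stats import multinomial

mu = np.array([0.2, 0.5, 0.3])             # example parameters, sum to 1
m = np.array([2, 5, 3])                    # example counts, sum to N = 10
N = int(m.sum())

# N! / (m_1! ... m_K!) * prod_k mu_k^{m_k}
coef = factorial(N) / np.prod([factorial(int(k)) for k in m])
pmf_manual = coef * np.prod(mu**m)

print(pmf_manual)
print(multinomial.pmf(m, n=N, p=mu))       # same value
```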
