Introduction to Probability and Statistics (Continued)
1 Introduction to Probability and Statistics (Continued) Prof. Nicholas Zabaras, Center for Informatics and Computational Science, University of Notre Dame, Notre Dame, Indiana, USA. URL: https://cics.nd.edu/ August 3, 2018
2 Contents Sum and product rules of probability; Bayes' rule; conditional probability; independence and conditional independence; the conditional independence theorem; random variables; CDF, mean and variance; the uniform and Gaussian distributions; the binomial and Bernoulli distributions; the multinomial and multinoulli distributions
3 References Following closely Chris Bishop's PRML book, Chapter 2, and Kevin Murphy's Machine Learning: A Probabilistic Perspective, Chapter 2. Jaynes, E. T. (2003). Probability Theory: The Logic of Science. Cambridge University Press. Bertsekas, D. and J. Tsitsiklis (2008). Introduction to Probability, 2nd Edition. Athena Scientific. Wasserman, L. (2004). All of Statistics: A Concise Course in Statistical Inference. Springer.
4 Joint Probability Probability theory provides a consistent framework for the quantification and manipulation of uncertainty. It forms one of the central foundations for pattern recognition and machine learning. The probability that event X takes the value $x_i$ and event Y takes the value $y_j$ is written $P(X = x_i, Y = y_j)$ and is called the joint probability of $X = x_i$ and $Y = y_j$. In terms of counts, $P(X = x_i, Y = y_j) = \frac{n_{ij}}{N}$, where $n_{ij}$ is the number of times (in N trials) that the event $X = x_i, Y = y_j$ occurs. Similarly, $c_i$ is the number of times that $X = x_i$ occurs.
5 The Sum and Product Rules Even complex calculations in probability are simply derived from the sum and product rules of probability. Sum rule: $P(X = x_i) = \sum_{j=1}^{L} P(X = x_i, Y = y_j)$. Product rule: $P(X = x_i, Y = y_j) = \frac{n_{ij}}{N} = \frac{n_{ij}}{c_i}\,\frac{c_i}{N} = P(Y = y_j \mid X = x_i)\, P(X = x_i)$. The product rule leads to the chain rule: $P(X_{1:D}) = P(X_1)\, P(X_2 \mid X_1)\, P(X_3 \mid X_1, X_2) \cdots P(X_D \mid X_{1:D-1})$.
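The sum and product rules above can be checked numerically. The following Python sketch (the lecture's demos use MATLAB/PMTK; this is an illustrative version with made-up counts $n_{ij}$) builds a joint table from counts and verifies both rules:

```python
import numpy as np

# Hypothetical 3x2 count table n_ij for N trials (made-up numbers).
n = np.array([[10, 30],
              [20, 15],
              [ 5, 20]], dtype=float)
N = n.sum()
P_xy = n / N                       # joint: P(X = x_i, Y = y_j) = n_ij / N

# Sum rule: marginal P(X = x_i) = sum_j P(X = x_i, Y = y_j)
P_x = P_xy.sum(axis=1)

# Product rule: P(X, Y) = P(Y | X) P(X), with P(Y = y_j | X = x_i) = n_ij / c_i
c = n.sum(axis=1)                  # c_i = number of times X = x_i occurs
P_y_given_x = n / c[:, None]
reconstructed = P_y_given_x * P_x[:, None]

assert np.allclose(reconstructed, P_xy)
assert abs(P_x.sum() - 1.0) < 1e-12
```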
6 Conditional Probability and Bayes' Rule $P(X = x \mid Y = y) = \frac{P(X = x, Y = y)}{P(Y = y)} = \frac{P(X = x)\, P(Y = y \mid X = x)}{\sum_{x'} P(X = x')\, P(Y = y \mid X = x')}$. Bayes' theorem plays a central role in pattern recognition and machine learning. The normalizing factor P(Y) is given by the sum rule: $P(Y = y) = \sum_{x'} P(X = x', Y = y) = \sum_{x'} P(X = x')\, P(Y = y \mid X = x')$.
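A short numeric sketch of Bayes' rule (a hypothetical binary example with made-up prior and likelihood values, not from the slides): X is a binary hypothesis and Y a binary observation; the normalizer is computed with the sum rule exactly as above:

```python
# Hypothetical binary example: X = hypothesis true/false, Y = positive observation.
p_x = {True: 0.01, False: 0.99}            # prior P(X) (made-up)
p_y_given_x = {True: 0.95, False: 0.05}    # likelihood P(Y = pos | X) (made-up)

# Normalizer via the sum rule: P(Y = pos) = sum_x' P(X = x') P(Y = pos | X = x')
p_y = sum(p_x[x] * p_y_given_x[x] for x in p_x)

# Bayes' rule: posterior P(X = true | Y = pos)
posterior = p_x[True] * p_y_given_x[True] / p_y
```

Note how the small prior keeps the posterior well below the likelihood 0.95.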
7 Example: Generative Classifier $P(y = c \mid \mathbf{x}, \boldsymbol{\theta}) = \frac{P(y = c \mid \boldsymbol{\theta})\, P(\mathbf{x} \mid y = c, \boldsymbol{\theta})}{\sum_{c'} P(y = c' \mid \boldsymbol{\theta})\, P(\mathbf{x} \mid y = c', \boldsymbol{\theta})}$. Here we classify feature vectors x to classes using the above posterior. It is a generative classifier, as it specifies how to generate data x using the class-conditional probabilities $P(\mathbf{x} \mid y = c, \boldsymbol{\theta})$ and class priors $P(y = c \mid \boldsymbol{\theta})$. θ are (unknown) parameters defining these distributions. In a discriminative setting, the posterior $P(y = c \mid \mathbf{x})$ is directly fitted.
8 Independence, Conditional Probability Two events A and B are independent (written as $A \perp B$) if $P(A \cap B) = P(A)\, P(B)$. Conditional probability: the probability that A happens provided that B happens, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$. Note that $P(A \mid B) \neq P(A \cap B)$. Using the above equations, we see that for independent events, $P(A \mid B) = P(A)$.
9 Independence, Conditional Independence X and Y are unconditionally independent or marginally independent, denoted $X \perp Y$, if we can represent the joint as the product of the two marginals: $X \perp Y \iff P(X, Y) = P(X)\, P(Y)$. X and Y are conditionally independent (CI) given Z iff the conditional joint can be written as a product of conditional marginals: $X \perp Y \mid Z \iff P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$.
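Conditional independence can be illustrated with a small table. The sketch below (hypothetical numbers, not from the slides) constructs a joint $p(x, y, z) = p(z)\,p(x \mid z)\,p(y \mid z)$, which satisfies $X \perp Y \mid Z$ by construction, and checks that X and Y are nonetheless *not* marginally independent:

```python
import numpy as np

# Hypothetical distributions (made-up numbers), indexed [z, x] and [z, y].
p_z = np.array([0.4, 0.6])
p_x_given_z = np.array([[0.3, 0.7], [0.8, 0.2]])
p_y_given_z = np.array([[0.5, 0.5], [0.1, 0.9]])
# Joint p(z, x, y) = p(z) p(x|z) p(y|z)  =>  X ⟂ Y | Z by construction
p_zxy = p_z[:, None, None] * p_x_given_z[:, :, None] * p_y_given_z[:, None, :]

# Check CI: for each z, p(x, y | z) equals the outer product of its marginals
p_xy_given_z = p_zxy / p_zxy.sum(axis=(1, 2), keepdims=True)
for z in range(2):
    outer = np.outer(p_xy_given_z[z].sum(axis=1), p_xy_given_z[z].sum(axis=0))
    assert np.allclose(p_xy_given_z[z], outer)

# But marginally, p(x, y) = sum_z p(z, x, y) does NOT factorize
p_xy = p_zxy.sum(axis=0)
assert not np.allclose(p_xy, np.outer(p_xy.sum(axis=1), p_xy.sum(axis=0)))
```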
10 Pairwise vs. Mutual Independence Pairwise independence does not imply mutual independence. Consider 4 balls (numbered 1, 2, 3, 4) in a box. You draw one at random. Define the following events: $X_1$: ball 1 or 2 is drawn; $X_2$: ball 2 or 3 is drawn; $X_3$: ball 1 or 3 is drawn. Note that the three events are pairwise independent, e.g. $P(X_1, X_2) = \frac{1}{4} = \frac{1}{2} \cdot \frac{1}{2} = P(X_1)\, P(X_2)$. However: $P(X_1, X_2, X_3) = 0 \neq \frac{1}{8} = \frac{1}{2} \cdot \frac{1}{2} \cdot \frac{1}{2} = P(X_1)\, P(X_2)\, P(X_3)$.
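The four-ball example can be verified by direct enumeration. A minimal Python sketch using exact rational arithmetic:

```python
from fractions import Fraction

balls = {1, 2, 3, 4}                       # equally likely draws
X1, X2, X3 = {1, 2}, {2, 3}, {1, 3}        # the three events from the slide

def P(event):
    """Probability of an event under a uniform draw from the box."""
    return Fraction(len(event & balls), len(balls))

# Pairwise independence: P(Xi ∩ Xj) = P(Xi) P(Xj) = 1/4 for every pair
assert P(X1 & X2) == P(X1) * P(X2) == Fraction(1, 4)
assert P(X1 & X3) == P(X1) * P(X3) == Fraction(1, 4)
assert P(X2 & X3) == P(X2) * P(X3) == Fraction(1, 4)

# But not mutual independence: the triple intersection is empty
assert P(X1 & X2 & X3) == 0
assert P(X1) * P(X2) * P(X3) == Fraction(1, 8)
```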
11 Independence, Conditional Independence Consider the following example. Define: event a = 'it will rain tomorrow', event b = 'the ground is wet today', and event c = 'raining today'. c causes both a and b, thus given c we don't need to know about b to predict a: $a \perp b \mid c \iff P(a, b \mid c) = P(a \mid c)\, P(b \mid c)$, so that $P(a \mid b, c) = P(a \mid c)$. Observing a root node separates the children!
12 Independence, Conditional Independence Assume unconditional independence $X \perp Y$. Let us assume that X takes 6 values and Y takes 5 values. The cost of defining p(X, Y) is drastically reduced if $X \perp Y$: the parameters are reduced from 29 (= 30 − 1) to 9 = (6 − 1) + (5 − 1), where we subtract one from each count to account for the sum-to-one probability constraint. Independence is key to efficient probabilistic modeling (naïve Bayes classifiers, Markov models, graphical models, etc.).
13 Conditional Independence Theorem $X \perp Y \mid Z$ iff there exist functions g and h such that $P(x, y \mid z) = g(x, z)\, h(y, z)$ for all x, y, z such that $P(z) > 0$. Independence implies the factorization: $X \perp Y \mid Z \Rightarrow P(x, y \mid z) = P(x \mid z)\, P(y \mid z) = g(x, z)\, h(y, z)$. Factorization implies independence: from $P(x, y \mid z) = g(x, z)\, h(y, z)$, summing over x and y gives $1 = \sum_{x,y} P(x, y \mid z) = \left(\sum_x g(x, z)\right)\left(\sum_y h(y, z)\right)$ (1); marginalizing over y gives $P(x \mid z) = g(x, z) \sum_y h(y, z)$ (2); marginalizing over x gives $P(y \mid z) = h(y, z) \sum_x g(x, z)$ (3). Combining (2) and (3): $P(x \mid z)\, P(y \mid z) = g(x, z)\, h(y, z) \left(\sum_y h(y, z)\right)\left(\sum_x g(x, z)\right) = g(x, z)\, h(y, z) = P(x, y \mid z)$, where the last step uses (1).
14 Independence Relations: Decomposition The following decomposition independence relation holds: $X \perp Y, W \mid Z \Rightarrow X \perp Y \mid Z$. Proof: $X \perp Y, W \mid Z \Rightarrow P(X, Y, W \mid Z) = P(X \mid Z)\, P(Y, W \mid Z)$ (independence). Then $P(X, Y \mid Z) = \sum_w P(X, Y, w \mid Z) = \sum_w P(X \mid Z)\, P(Y, w \mid Z) = P(X \mid Z)\, P(Y \mid Z)$ (sum rule), i.e. $X \perp Y \mid Z$.
15 Independence Relations: Weak Union The following weak union independence relation holds: $X \perp Y, W \mid Z \Rightarrow X \perp Y \mid Z, W$. Proof: $X \perp Y, W \mid Z \Rightarrow P(X, Y, W \mid Z) = P(X \mid Z)\, P(Y, W \mid Z)$. Then $P(X, Y \mid Z, W) = \frac{P(X, Y, W \mid Z)}{P(W \mid Z)} = \frac{P(X \mid Z)\, P(Y, W \mid Z)}{P(W \mid Z)} = P(X \mid Z)\, P(Y \mid Z, W) = P(X \mid Z, W)\, P(Y \mid Z, W)$, i.e. $X \perp Y \mid Z, W$. Note that in the last step we used the decomposition independence relation: $X \perp Y, W \mid Z \Rightarrow X \perp W \mid Z \Rightarrow P(X, W \mid Z) = P(X \mid Z)\, P(W \mid Z)$, so that $P(X \mid Z, W) = \frac{P(X, W \mid Z)}{P(W \mid Z)} = P(X \mid Z)$.
16 Independence Relations: Contraction The following contraction independence relation holds: $X \perp W \mid Z, Y$ and $X \perp Y \mid Z \Rightarrow X \perp Y, W \mid Z$. Proof: $X \perp W \mid Y, Z \Rightarrow P(X, W \mid Y, Z) = P(X \mid Y, Z)\, P(W \mid Y, Z)$ (a); $X \perp Y \mid Z \Rightarrow P(X \mid Y, Z) = P(X \mid Z)$ (b). Then $P(X, Y, W \mid Z) = P(X, W \mid Y, Z)\, P(Y \mid Z) \overset{(a)}{=} P(X \mid Y, Z)\, P(W \mid Y, Z)\, P(Y \mid Z) \overset{(b)}{=} P(X \mid Z)\, P(W, Y \mid Z)$, i.e. $X \perp W, Y \mid Z$.
17 Random Variables Define Ω to be a probability space equipped with a probability measure P that measures the probability of events E. Ω contains all possible events in the form of its own subsets. A real-valued random variable X is a mapping $X: \Omega \to \mathbb{R}$. We call $x = X(\omega)$, $\omega \in \Omega$, a realization of X. Probability distribution of X: for $B \subset \mathbb{R}$, $P_X(B) = P\!\left(X^{-1}(B)\right) = P\!\left(\{\omega : X(\omega) \in B\}\right)$. Probability density: $P_X(B) = \int_B p_X(x)\, dx$. We often write $p_X(x) = p(x)$.
18 Cumulative Distribution Function $p(x) \ge 0$, $\int_{-\infty}^{\infty} p(x)\, dx = 1$; $F(z) = \int_{-\infty}^{z} p(x)\, dx$ (cumulative distribution function); $P\!\left(x \in (a, b)\right) = \int_a^b p(x)\, dx = F(b) - F(a)$. The CDF for a random variable X is the function F(x) that returns the probability that X is less than x.
19 Mean and Variance The expected (mean) value of X is: $\mathbb{E}[X] = \int x\, p_X(x)\, dx$. The variance of X is defined as: $\mathrm{var}[X] = \mathbb{E}\!\left[(X - \mathbb{E}[X])^2\right] = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2$. The often useful standard deviation is defined as $\mathrm{std}[X] = \sqrt{\mathrm{var}[X]}$.
20 Mean and Variance If T(X) is a function of X (called a statistic), the mean of T(X) is given by $\mathbb{E}[T(X)] = \int T(x)\, p_X(x)\, dx$. The variance of T(X) is also defined as: $\mathrm{var}[T(X)] = \mathbb{E}\!\left[(T(X) - \mathbb{E}[T(X)])^2\right] = \mathbb{E}[T(X)^2] - \left(\mathbb{E}[T(X)]\right)^2$. The k-th central moment is defined as (k = 3: skewness, k = 4: kurtosis): $\mu_k[X] = \int (x - \mathbb{E}[X])^k\, p_X(x)\, dx$.
21 Expectation For the discrete case: $\mathbb{E}[f] = \sum_x p(x)\, f(x)$. For the continuous case: $\mathbb{E}[f] = \int p(x)\, f(x)\, dx$. Conditional expectation: $\mathbb{E}[f \mid y] = \sum_x p(x \mid y)\, f(x)$ in the discrete case, or $\mathbb{E}[f \mid y] = \int p(x \mid y)\, f(x)\, dx$ in the continuous case.
22 Expectation The expectation of a random variable is not necessarily a value that we should expect a realization to have. Let $\Omega = [-1, 1]$ with the probability P being uniform: $P(I) = \frac{1}{2}\int_I dx = \frac{|I|}{2}$, $I \subset [-1, 1]$. Consider the following two random variables: $X_1: [-1, 1] \to \mathbb{R}$ with $X_1(\omega) = 1$ for $\omega \ge 0$ and $X_1(\omega) = -1$ for $\omega < 0$; and $X_2: [-1, 1] \to \mathbb{R}$ with $X_2(\omega) = \omega$. We can see that $\mathbb{E}[X_1] = \mathbb{E}[X_2] = 0$, although $|X_1(\omega)| = 1$ for every realization, so $X_1$ never takes its expected value.
23 Uniform Random Variable Consider the uniform distribution $U(x \mid a, b) = \frac{1}{b-a}\, \mathbb{1}(a \le x \le b)$. What is the CDF of a uniform random variable U(0, 1)? $P(U \le x) = x$ for $0 \le x \le 1$, $= 1$ for $x > 1$, and $= 0$ otherwise. You can show easily that the moments of $U(x \mid a, b)$ are: $\mathbb{E}[x \mid a, b] = \frac{a+b}{2}$, $\mathbb{E}[x^2 \mid a, b] = \frac{a^2 + ab + b^2}{3}$, $\mathrm{var}[x \mid a, b] = \frac{(b-a)^2}{12}$. Note that it is possible for p(x) > 1, but the density still needs to integrate to 1. For example, note that $U(x \mid 0, 1/2) = 2 \cdot \mathbb{1}(0 \le x \le 1/2)$.
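The quoted moments of $U(a, b)$ can be sanity-checked by sampling. A minimal Python sketch (illustrative endpoints a = 2, b = 5, chosen arbitrarily):

```python
import numpy as np

a, b = 2.0, 5.0
rng = np.random.default_rng(0)
x = rng.uniform(a, b, size=200_000)

# Closed-form moments of U(a, b) from the slide
mean, var = (a + b) / 2, (b - a) ** 2 / 12
assert abs(x.mean() - mean) < 0.02     # sample mean ≈ (a + b)/2 = 3.5
assert abs(x.var() - var) < 0.02       # sample variance ≈ (b − a)²/12 = 0.75

# A density may exceed 1: U(x | 0, 1/2) has height 2, yet area 2 × 1/2 = 1
height = 1 / (0.5 - 0.0)
assert height * 0.5 == 1.0
```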
24 The Gaussian Distribution A random variable X is Gaussian or normally distributed, $X \sim \mathcal{N}(\mu, \sigma^2)$, if: $P(X \le t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx$. The probability density is: $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. To show that this density is normalized, note the following trick. Let $I = \int_{-\infty}^{\infty} \exp\!\left(-\frac{x^2}{2\sigma^2}\right) dx$; then $I^2 = \int\!\!\int \exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right) dx\, dy$. Set $r^2 = x^2 + y^2$; then $I^2 = \int_0^{2\pi}\!\!\int_0^{\infty} \exp\!\left(-\frac{r^2}{2\sigma^2}\right) r\, dr\, d\theta = 2\pi \int_0^{\infty} \exp\!\left(-\frac{r^2}{2\sigma^2}\right) r\, dr = 2\pi\sigma^2 \int_0^{\infty} e^{-u}\, du = 2\pi\sigma^2$, substituting $u = \frac{r^2}{2\sigma^2}$. Thus $I = \sqrt{2\pi\sigma^2}$.
25 The Gaussian Distribution A random variable X is Gaussian or normally distributed, $X \sim \mathcal{N}(\mu, \sigma^2)$, if its density is $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The following can be shown easily with direct integration: $\mathbb{E}[X] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) x\, dx = \mu$, $\mathrm{var}[X] = \int \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) (x-\mu)^2\, dx = \sigma^2$. The following integrals are useful in these derivations: $\int_{-\infty}^{\infty} e^{-u^2}\, du = \sqrt{\pi}$, $\int_{-\infty}^{\infty} u\, e^{-u^2}\, du = 0$, $\int_{-\infty}^{\infty} u^2 e^{-u^2}\, du = \frac{\sqrt{\pi}}{2}$. We often work with the precision of a Gaussian, λ = 1/σ². The higher the λ, the narrower the distribution is.
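The normalization, mean, and variance integrals above can be verified by numerical quadrature. A minimal Python sketch with illustrative parameters (μ = 1.5, σ = 0.7, chosen arbitrarily), using a simple trapezoid rule:

```python
import numpy as np

def trapz(f, x):
    """Trapezoid-rule integral of sampled values f over grid x."""
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

mu, sigma = 1.5, 0.7
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200_001)
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

norm = trapz(pdf, x)                     # ∫ N(x|μ,σ²) dx = 1
mean = trapz(x * pdf, x)                 # = μ
var = trapz((x - mean) ** 2 * pdf, x)    # = σ²

assert abs(norm - 1) < 1e-6
assert abs(mean - mu) < 1e-6
assert abs(var - sigma ** 2) < 1e-6
```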
26 CDF of a Gaussian Plot of the standard normal $\mathcal{N}(0, 1)$ PDF and CDF (run MATLAB function gaussplotDemo from Kevin Murphy's PMTK). $F(x; \mu, \sigma^2) = \int_{-\infty}^{x} \mathcal{N}(z \mid \mu, \sigma^2)\, dz = \frac{1}{2}\!\left[1 + \mathrm{erf}\!\left(z/\sqrt{2}\right)\right]$, where $z = (x - \mu)/\sigma$ and $\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\, dt$.
27 CDF and the Standard Normal Assume $X \sim \mathcal{N}(0, 1)$. Define Φ(x) = P(X < x), the cumulative distribution of X: $\Phi(x) = \int_{-\infty}^{x} p(z)\, dz = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right) dz$. Now assume $X \sim \mathcal{N}(\mu, \sigma^2)$. Then $F(x) = P(X < x \mid \mu, \sigma^2) = \Phi\!\left(\frac{x - \mu}{\sigma}\right)$.
28 Univariate Gaussian $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$, with $\mathbb{E}[X] = \mu$ and $\mathrm{var}[X] = \sigma^2$. Shorthand: we say $X \sim \mathcal{N}(\mu, \sigma^2)$ to mean X is distributed as a Gaussian with parameters μ and σ².
29 Univariate Gaussian $p(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$. The Gaussian represents symmetric phenomena without long tails; it is inappropriate for skewness, fat tails, multimodality, etc. The popularity of the Gaussian is related to facts such as: it is completely defined in terms of the mean and the variance; the central limit theorem shows that the sum of i.i.d. random variables has approximately a Gaussian distribution, making it an appropriate choice for modeling noise (limit of additive small effects); and the Gaussian distribution makes the least assumptions (maximum entropy) of all distributions with given mean and variance.
30 The Normal Model The normal (or Gaussian) distribution $\mathcal{N}(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ is one of the most studied and one of the most used distributions. Sample $x_1, \ldots, x_n$ from a normal distribution $\mathcal{N}(\mu, \sigma^2)$. In a typical inference problem, we are interested in: estimation of (μ, σ); confidence regions on (μ, σ); tests on (μ, σ) and comparison with other samples.
31 Datasets: Normal Data NormalData: relative changes in reported larcenies between 1991 and 1995 (relative to 1991) for the 90 most populous US counties (source: FBI). MATLAB implementation. From Bayesian Core, J.-M. Marin and C.P. Robert, Chapter 2 (available online).
32 Datasets: CMBData CMBdata: spectral representation of the cosmological microwave background (CMB), i.e. electromagnetic radiation from photons dating back to 300,000 years after the Big Bang, expressed as the difference in apparent temperature from the mean temperature. MATLAB implementation, normal estimation. From Bayesian Core, J.-M. Marin and C.P. Robert, Chapter 2 (available online). MLE-based estimates of μ and σ² are used in constructing the Gaussian shown with the solid line. Maximum likelihood estimators (MLE) will be discussed in a follow-up lecture.
33 Quantiles Recall that the probability density function $p_X(\cdot)$ of a random variable X is defined as the derivative of the cumulative distribution function, so that $F(x_0) = \int_{-\infty}^{x_0} p_X(x)\, dx$ and $\int_{-\infty}^{\infty} p_X(x)\, dx = 1$. The value $y_\alpha$ such that $F(y_\alpha) = \alpha$ is called the α-quantile of the distribution with CDF F. The median of course satisfies $F(y_{0.5}) = 0.5$. One can define tail area probabilities: the shaded regions of the figure each contain α/2 of the probability mass. For $\mathcal{N}(0, 1)$, the leftmost cutoff point is $\Phi^{-1}(\alpha/2)$, where Φ is the CDF of $\mathcal{N}(0, 1)$. If α = 0.05, the central interval is 95%, the left cutoff is −1.96 and the right is +1.96 (figure generated by quantileDemo from PMTK). For $\mathcal{N}(\mu, \sigma^2)$, the 95% interval becomes $(\mu - 1.96\sigma,\ \mu + 1.96\sigma)$, often approximated as $\mu \pm 2\sigma$.
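The ±1.96 cutoffs can be reproduced with the standard library's normal distribution (the slides use PMTK's quantileDemo in MATLAB; this is an illustrative Python sketch with arbitrary μ and σ):

```python
from statistics import NormalDist

std = NormalDist()                       # standard normal N(0, 1)
alpha = 0.05
lo = std.inv_cdf(alpha / 2)              # Φ⁻¹(α/2) ≈ −1.96
hi = std.inv_cdf(1 - alpha / 2)          # Φ⁻¹(1 − α/2) ≈ +1.96
assert abs(lo + hi) < 1e-9               # symmetric cutoffs
assert abs(hi - 1.959964) < 1e-4

# For N(μ, σ²), the central 95% interval is (μ − 1.96σ, μ + 1.96σ)
mu, sigma = 10.0, 2.0                    # illustrative parameters
d = NormalDist(mu, sigma)
assert abs(d.inv_cdf(1 - alpha / 2) - (mu + hi * sigma)) < 1e-9

# The median is the 0.5-quantile: F(y_0.5) = 0.5
assert abs(d.inv_cdf(0.5) - mu) < 1e-12
```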
34 Binary Variables Consider a coin flipping experiment with heads = 1 and tails = 0. With $\mu \in [0, 1]$: $p(x = 1 \mid \mu) = \mu$, $p(x = 0 \mid \mu) = 1 - \mu$. This defines the Bernoulli distribution as follows: $\mathrm{Bern}(x \mid \mu) = \mu^x (1 - \mu)^{1-x}$. Using the indicator function, we can also write this as: $\mathrm{Bern}(x \mid \mu) = \mu^{\mathbb{1}(x=1)} (1 - \mu)^{\mathbb{1}(x=0)}$.
35 Bernoulli Distribution Recall that in general $\mathbb{E}[f] = \sum_x p(x)\, f(x)$ or $\mathbb{E}[f] = \int p(x)\, f(x)\, dx$, and $\mathrm{var}[f] = \mathbb{E}[f(x)^2] - \mathbb{E}[f(x)]^2$. For the Bernoulli distribution $\mathrm{Bern}(x \mid \mu) = \mu^x (1-\mu)^{1-x}$, we can easily show from the definitions: $\mathbb{E}[x] = \mu$, $\mathrm{var}[x] = \mu(1 - \mu)$, and $H[x] = -\sum_{x \in \{0,1\}} p(x \mid \mu) \ln p(x \mid \mu) = -\mu \ln \mu - (1 - \mu) \ln(1 - \mu)$. Here H[x] is the entropy of the distribution.
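The Bernoulli mean, variance, and entropy formulas can be checked directly from the definitions. A minimal Python sketch (μ = 0.3 is an arbitrary illustrative value):

```python
import math

mu = 0.3
pmf = {1: mu, 0: 1 - mu}                  # Bern(x|μ) = μ^x (1 − μ)^(1−x)

mean = sum(x * p for x, p in pmf.items())
var = sum((x - mean) ** 2 * p for x, p in pmf.items())
H = -sum(p * math.log(p) for p in pmf.values())   # entropy in nats

assert abs(mean - mu) < 1e-12
assert abs(var - mu * (1 - mu)) < 1e-12
assert abs(H - (-mu * math.log(mu) - (1 - mu) * math.log(1 - mu))) < 1e-12
```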
36 Likelihood Function for Bernoulli Distribution Consider the data set $\mathcal{D} = \{x_1, x_2, \ldots, x_N\}$ in which we have m heads (x = 1) and N − m tails (x = 0). The likelihood function takes the form: $p(\mathcal{D} \mid \mu) = \prod_{n=1}^{N} p(x_n \mid \mu) = \prod_{n=1}^{N} \mu^{x_n} (1 - \mu)^{1 - x_n} = \mu^m (1 - \mu)^{N - m}$.
37 Binomial Distribution Consider the discrete random variable $X \in \{0, 1, 2, \ldots, N\}$. We define the binomial distribution as follows: $\mathrm{Bin}(X = m \mid N, \mu) = \binom{N}{m} \mu^m (1 - \mu)^{N - m}$. In our coin flipping experiment, it gives the probability in N flips of getting m heads, with μ being the probability of getting heads in one flip. It can be shown (see S. Ross, Introduction to Probability Models) that the limit of the binomial distribution as $N \to \infty$, $N\mu \to \lambda$ is the Poisson(λ) distribution. Figure: the binomial distribution $\mathrm{Bin}(m \mid 10, 0.25)$ (MATLAB code).
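Both the binomial pmf and its Poisson limit can be sketched in a few lines. The following Python snippet (illustrative values; the λ and N here are chosen arbitrarily for the limit check) verifies normalization and compares $\mathrm{Bin}(m \mid N, \lambda/N)$ against Poisson(λ) for large N:

```python
from math import comb, exp, factorial

def binom_pmf(m, N, mu):
    """Bin(m | N, μ) = C(N, m) μ^m (1 − μ)^(N − m)."""
    return comb(N, m) * mu ** m * (1 - mu) ** (N - m)

# Normalization: the pmf sums to 1
N, mu = 10, 0.25
assert abs(sum(binom_pmf(m, N, mu) for m in range(N + 1)) - 1) < 1e-12

# Poisson limit: N → ∞ with Nμ = λ fixed
lam, bigN = 2.0, 20_000
poisson = lambda m: exp(-lam) * lam ** m / factorial(m)
for m in range(6):
    assert abs(binom_pmf(m, bigN, lam / bigN) - poisson(m)) < 1e-3
```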
38 Binomial Distribution The binomial distribution $\mathrm{Bin}(m \mid N, \mu)$ for N = 10 and μ = 0.25, 0.9 is shown below, using MATLAB function binomDistPlot from Kevin Murphy's PMTK.
39 Mean, Variance of the Binomial Distribution For independent events, the mean of the sum is the sum of the means, and the variance of the sum is the sum of the variances. Because $m = x_1 + \ldots + x_N$, and for each observation the mean and variance are known from the Bernoulli distribution: $\mathbb{E}[m] = \sum_{m=0}^{N} m\, \mathrm{Bin}(m \mid N, \mu) = \mathbb{E}[x_1 + \ldots + x_N] = N\mu$, and $\mathrm{var}[m] = \sum_{m=0}^{N} (m - \mathbb{E}[m])^2\, \mathrm{Bin}(m \mid N, \mu) = \mathrm{var}[x_1 + \ldots + x_N] = N\mu(1 - \mu)$. One can also compute $\mathbb{E}[m]$ and $\mathbb{E}[m^2]$ by differentiating (twice) the identity $\sum_{m=0}^{N} \binom{N}{m} \mu^m (1 - \mu)^{N - m} = 1$ with respect to μ. Try it!
40 Binomial Distribution: Normalization To show that the binomial is correctly normalized, we use the following identities: $\binom{N}{m} + \binom{N}{m-1} = \binom{N+1}{m}$ (*), which can be shown with direct substitution, and the binomial theorem $(1 + x)^N = \sum_{m=0}^{N} \binom{N}{m} x^m$ (**). The binomial theorem is proved by induction using (*) and noticing: $(1+x)^{N+1} = (1+x) \sum_{m=0}^{N} \binom{N}{m} x^m = \sum_{m=0}^{N} \binom{N}{m} x^m + \sum_{m=1}^{N+1} \binom{N}{m-1} x^m = 1 + \sum_{m=1}^{N} \left[\binom{N}{m} + \binom{N}{m-1}\right] x^m + x^{N+1} \overset{(*)}{=} \sum_{m=0}^{N+1} \binom{N+1}{m} x^m$. To finally show normalization using (**): $\sum_{m=0}^{N} \binom{N}{m} \mu^m (1-\mu)^{N-m} = (1-\mu)^N \sum_{m=0}^{N} \binom{N}{m} \left(\frac{\mu}{1-\mu}\right)^m = (1-\mu)^N \left(1 + \frac{\mu}{1-\mu}\right)^N = 1$.
41 Generalization of the Bernoulli Distribution We are now looking at discrete variables that can take on one of K possible mutually exclusive states. The variable is represented by a K-dimensional vector x in which one of the elements $x_k$ equals 1, and all remaining elements equal 0: $\mathbf{x} = (0, 0, \ldots, 1, 0, \ldots, 0)^T$. These vectors satisfy $\sum_{k=1}^{K} x_k = 1$. Let the probability of $x_k = 1$ be denoted as $\mu_k$. Then $p(\mathbf{x} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{x_k}$, with $\sum_{k=1}^{K} \mu_k = 1$ and $\mu_k \ge 0$, where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K)^T$.
42 Multinoulli/Categorical Distribution The distribution is already normalized: $\sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu}) = \sum_{k=1}^{K} \mu_k = 1$. The mean of the distribution is computed as: $\mathbb{E}[\mathbf{x} \mid \boldsymbol{\mu}] = \sum_{\mathbf{x}} p(\mathbf{x} \mid \boldsymbol{\mu})\, \mathbf{x} = (\mu_1, \ldots, \mu_K)^T = \boldsymbol{\mu}$, similar to the result for the Bernoulli distribution. The multinoulli, also known as the categorical distribution, is often denoted as (Mu here is the multinomial distribution): $\mathrm{Cat}(\mathbf{x} \mid \boldsymbol{\mu}) = \mathrm{Mu}(\mathbf{x} \mid 1, \boldsymbol{\mu})$. The parameter 1 stands to emphasize that we roll a K-sided die once (N = 1); see next for the multinomial distribution.
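The mean $\mathbb{E}[\mathbf{x} \mid \boldsymbol{\mu}] = \boldsymbol{\mu}$ can be checked by sampling one-hot vectors. A minimal Python sketch (the μ values are arbitrary illustrative parameters):

```python
import numpy as np

mu = np.array([0.2, 0.5, 0.3])          # categorical parameters, sum to 1
rng = np.random.default_rng(0)

# Draw one-hot vectors x with P(x_k = 1) = μ_k (a K-sided die rolled once)
K, n = len(mu), 100_000
draws = rng.choice(K, size=n, p=mu)
X = np.eye(K)[draws]                    # each row is a one-hot realization

# Empirical mean of the one-hot vectors ≈ (μ_1, …, μ_K)^T
assert np.allclose(X.mean(axis=0), mu, atol=0.01)
# Normalization: Σ_x p(x | μ) = Σ_k μ_k = 1
assert abs(mu.sum() - 1) < 1e-12
```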
43 Likelihood: Multinoulli Distribution Let us consider a data set $\mathcal{D} = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$. The likelihood becomes: $p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{n=1}^{N} \prod_{k=1}^{K} \mu_k^{x_{nk}} = \prod_{k=1}^{K} \mu_k^{\sum_{n=1}^{N} x_{nk}} = \prod_{k=1}^{K} \mu_k^{m_k}$, where $m_k = \sum_{n=1}^{N} x_{nk}$ is the number of observations of $x_k = 1$. $m_k$ is the sufficient statistic of the distribution.
44 MLE Estimate: Multinoulli Distribution To compute the maximum likelihood (MLE) estimate of μ, we maximize an augmented log-likelihood with a Lagrange multiplier λ enforcing the sum-to-one constraint: $\ln p(\mathcal{D} \mid \boldsymbol{\mu}) + \lambda \left(\sum_{k=1}^{K} \mu_k - 1\right) = \sum_{k=1}^{K} m_k \ln \mu_k + \lambda \left(\sum_{k=1}^{K} \mu_k - 1\right)$. Setting the derivative with respect to $\mu_k$ equal to zero gives $\mu_k = -\frac{m_k}{\lambda}$. Substitution into the constraint $\sum_{k=1}^{K} \mu_k = 1$ gives $-\frac{1}{\lambda} \sum_{k=1}^{K} m_k = 1 \Rightarrow \lambda = -N$, so that $\mu_k^{ML} = \frac{m_k}{N}$. As expected, this is the fraction in the N observations with $x_k = 1$.
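The closed-form MLE $\mu_k^{ML} = m_k/N$ is just the empirical fraction, which the following Python sketch confirms on synthetic one-hot data (the true μ values below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = np.array([0.1, 0.6, 0.3])     # arbitrary ground-truth parameters
K, N = 3, 50_000
draws = rng.choice(K, size=N, p=mu_true)
X = np.eye(K)[draws]                    # one-hot data set D

m = X.sum(axis=0)                       # sufficient statistics m_k
mu_mle = m / N                          # μ_k^ML = m_k / N

assert abs(mu_mle.sum() - 1) < 1e-12    # MLE respects the simplex constraint
assert np.allclose(mu_mle, mu_true, atol=0.01)
```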
45 Multinomial Distribution We can also consider the joint distribution of $m_1, \ldots, m_K$ in N observations conditioned on the parameters $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K)$. From the expression for the likelihood given earlier, $p(\mathcal{D} \mid \boldsymbol{\mu}) = \prod_{k=1}^{K} \mu_k^{m_k}$, the multinomial distribution $\mathrm{Mu}(m_1, \ldots, m_K \mid N, \boldsymbol{\mu})$ with parameters N and μ takes the form: $p(m_1, m_2, \ldots, m_K \mid N, \mu_1, \mu_2, \ldots, \mu_K) = \frac{N!}{m_1!\, m_2! \cdots m_K!}\, \mu_1^{m_1} \mu_2^{m_2} \cdots \mu_K^{m_K}$, where $\sum_{k=1}^{K} m_k = N$.
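The multinomial pmf can be implemented directly from this formula; the sketch below (illustrative parameters K = 3, N = 4, made-up μ) also checks that it sums to 1 over all count vectors with $m_1 + m_2 + m_3 = N$:

```python
from math import factorial

def multinomial_pmf(m, mu):
    """Mu(m_1,…,m_K | N, μ) = N!/(m_1!…m_K!) Π_k μ_k^{m_k}, N = Σ_k m_k."""
    N = sum(m)
    coeff = factorial(N)
    for mk in m:
        coeff //= factorial(mk)        # exact integer multinomial coefficient
    p = float(coeff)
    for mk, muk in zip(m, mu):
        p *= muk ** mk
    return p

# Normalization over the simplex of counts with m_1 + m_2 + m_3 = N
mu, N = (0.2, 0.5, 0.3), 4
total = sum(multinomial_pmf((i, j, N - i - j), mu)
            for i in range(N + 1) for j in range(N + 1 - i))
assert abs(total - 1) < 1e-12
```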
UVA CS 4501-001 / 6501 007 Introduc8on to Machine Learning and Data Mining Lecture 13: Probability and Sta3s3cs Review (cont.) + Naïve Bayes Classifier Yanjun Qi / Jane, PhD University of Virginia Department
More informationLecture 1 October 9, 2013
Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/
More informationSample Spaces, Random Variables
Sample Spaces, Random Variables Moulinath Banerjee University of Michigan August 3, 22 Probabilities In talking about probabilities, the fundamental object is Ω, the sample space. (elements) in Ω are denoted
More informationQuick Tour of Basic Probability Theory and Linear Algebra
Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions
More informationSingle Maths B: Introduction to Probability
Single Maths B: Introduction to Probability Overview Lecturer Email Office Homework Webpage Dr Jonathan Cumming j.a.cumming@durham.ac.uk CM233 None! http://maths.dur.ac.uk/stats/people/jac/singleb/ 1 Introduction
More informationSummary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016
8. For any two events E and F, P (E) = P (E F ) + P (E F c ). Summary of basic probability theory Math 218, Mathematical Statistics D Joyce, Spring 2016 Sample space. A sample space consists of a underlying
More informationSTAT2201. Analysis of Engineering & Scientific Data. Unit 3
STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random
More informationBayesian Methods for Machine Learning
Bayesian Methods for Machine Learning CS 584: Big Data Analytics Material adapted from Radford Neal s tutorial (http://ftp.cs.utoronto.ca/pub/radford/bayes-tut.pdf), Zoubin Ghahramni (http://hunch.net/~coms-4771/zoubin_ghahramani_bayesian_learning.pdf),
More informationReview of probability
Review of probability Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts definition of probability random variables
More informationExpectation Propagation Algorithm
Expectation Propagation Algorithm 1 Shuang Wang School of Electrical and Computer Engineering University of Oklahoma, Tulsa, OK, 74135 Email: {shuangwang}@ou.edu This note contains three parts. First,
More informationIntroduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak
Introduction to Systems Analysis and Decision Making Prepared by: Jakub Tomczak 1 Introduction. Random variables During the course we are interested in reasoning about considered phenomenon. In other words,
More informationAn Introduction to Statistical and Probabilistic Linear Models
An Introduction to Statistical and Probabilistic Linear Models Maximilian Mozes Proseminar Data Mining Fakultät für Informatik Technische Universität München June 07, 2017 Introduction In statistical learning
More informationBayesian Learning (II)
Universität Potsdam Institut für Informatik Lehrstuhl Maschinelles Lernen Bayesian Learning (II) Niels Landwehr Overview Probabilities, expected values, variance Basic concepts of Bayesian learning MAP
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More informationMidterm Review CS 6375: Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 6375: Machine Learning Vibhav Gogate The University of Texas at Dallas Machine Learning Supervised Learning Unsupervised Learning Reinforcement Learning Parametric Y Continuous Non-parametric
More informationMidterm Review CS 7301: Advanced Machine Learning. Vibhav Gogate The University of Texas at Dallas
Midterm Review CS 7301: Advanced Machine Learning Vibhav Gogate The University of Texas at Dallas Supervised Learning Issues in supervised learning What makes learning hard Point Estimation: MLE vs Bayesian
More informationECE531: Principles of Detection and Estimation Course Introduction
ECE531: Principles of Detection and Estimation Course Introduction D. Richard Brown III WPI 22-January-2009 WPI D. Richard Brown III 22-January-2009 1 / 37 Lecture 1 Major Topics 1. Web page. 2. Syllabus
More informationChapter 1 Statistical Reasoning Why statistics? Section 1.1 Basics of Probability Theory
Chapter 1 Statistical Reasoning Why statistics? Uncertainty of nature (weather, earth movement, etc. ) Uncertainty in observation/sampling/measurement Variability of human operation/error imperfection
More informationDiscrete Probability Distributions
Discrete Probability Distributions EGR 260 R. Van Til Industrial & Systems Engineering Dept. Copyright 2013. Robert P. Van Til. All rights reserved. 1 What s It All About? The behavior of many random processes
More informationLecture 1: Bayesian Framework Basics
Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of
More informationMachine Learning
Machine Learning 10-701 Tom M. Mitchell Machine Learning Department Carnegie Mellon University February 1, 2011 Today: Generative discriminative classifiers Linear regression Decomposition of error into
More informationCS 446 Machine Learning Fall 2016 Nov 01, Bayesian Learning
CS 446 Machine Learning Fall 206 Nov 0, 206 Bayesian Learning Professor: Dan Roth Scribe: Ben Zhou, C. Cervantes Overview Bayesian Learning Naive Bayes Logistic Regression Bayesian Learning So far, we
More informationEE514A Information Theory I Fall 2013
EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/
More informationCPSC 540: Machine Learning
CPSC 540: Machine Learning Undirected Graphical Models Mark Schmidt University of British Columbia Winter 2016 Admin Assignment 3: 2 late days to hand it in today, Thursday is final day. Assignment 4:
More informationProbabilistic Machine Learning
Probabilistic Machine Learning Bayesian Nets, MCMC, and more Marek Petrik 4/18/2017 Based on: P. Murphy, K. (2012). Machine Learning: A Probabilistic Perspective. Chapter 10. Conditional Independence Independent
More informationLecture 3. Linear Regression II Bastian Leibe RWTH Aachen
Advanced Machine Learning Lecture 3 Linear Regression II 02.11.2015 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de/ leibe@vision.rwth-aachen.de This Lecture: Advanced Machine Learning Regression
More informationPreliminary statistics
1 Preliminary statistics The solution of a geophysical inverse problem can be obtained by a combination of information from observed data, the theoretical relation between data and earth parameters (models),
More informationEXAM. Exam #1. Math 3342 Summer II, July 21, 2000 ANSWERS
EXAM Exam # Math 3342 Summer II, 2 July 2, 2 ANSWERS i pts. Problem. Consider the following data: 7, 8, 9, 2,, 7, 2, 3. Find the first quartile, the median, and the third quartile. Make a box and whisker
More informationRandomized Algorithms
Randomized Algorithms Prof. Tapio Elomaa tapio.elomaa@tut.fi Course Basics A new 4 credit unit course Part of Theoretical Computer Science courses at the Department of Mathematics There will be 4 hours
More informationGraphical Models and Kernel Methods
Graphical Models and Kernel Methods Jerry Zhu Department of Computer Sciences University of Wisconsin Madison, USA MLSS June 17, 2014 1 / 123 Outline Graphical Models Probabilistic Inference Directed vs.
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationMachine Learning. Bayes Basics. Marc Toussaint U Stuttgart. Bayes, probabilities, Bayes theorem & examples
Machine Learning Bayes Basics Bayes, probabilities, Bayes theorem & examples Marc Toussaint U Stuttgart So far: Basic regression & classification methods: Features + Loss + Regularization & CV All kinds
More informationArtificial Intelligence
Artificial Intelligence Probabilities Marc Toussaint University of Stuttgart Winter 2018/19 Motivation: AI systems need to reason about what they know, or not know. Uncertainty may have so many sources:
More informationRepresentation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2
Representation Stefano Ermon, Aditya Grover Stanford University Lecture 2 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 2 1 / 32 Learning a generative model We are given a training
More informationChapter 3. Chapter 3 sections
sections 3.1 Random Variables and Discrete Distributions 3.2 Continuous Distributions 3.4 Bivariate Distributions 3.5 Marginal Distributions 3.6 Conditional Distributions 3.7 Multivariate Distributions
More informationSome Probability and Statistics
Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my
More informationWhy study probability? Set theory. ECE 6010 Lecture 1 Introduction; Review of Random Variables
ECE 6010 Lecture 1 Introduction; Review of Random Variables Readings from G&S: Chapter 1. Section 2.1, Section 2.3, Section 2.4, Section 3.1, Section 3.2, Section 3.5, Section 4.1, Section 4.2, Section
More informationToday. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion
Today Probability and Statistics Naïve Bayes Classification Linear Algebra Matrix Multiplication Matrix Inversion Calculus Vector Calculus Optimization Lagrange Multipliers 1 Classical Artificial Intelligence
More informationNaïve Bayes. Jia-Bin Huang. Virginia Tech Spring 2019 ECE-5424G / CS-5824
Naïve Bayes Jia-Bin Huang ECE-5424G / CS-5824 Virginia Tech Spring 2019 Administrative HW 1 out today. Please start early! Office hours Chen: Wed 4pm-5pm Shih-Yang: Fri 3pm-4pm Location: Whittemore 266
More information