Aarti Singh. Lecture 2, January 13, 2010. Reading: Bishop: Chap 1, 2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell


1 Machine Learning 10-701/15-781, Spring 2010: Probability. Aarti Singh. Lecture 2, January 13, 2010. [Figure: Gaussian density f(x) with mean µ.] Reading: Bishop: Chap 1, 2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell

2 Announcements. Homework 1 is out! Due: Wednesday, Jan 20, 2010 (beginning of class). 1st Recitation: Jan 14, 2010, 5:00-6:30 pm, NSH 1305.
Probability

3 Probability in Machine Learning. Machine Learning tasks involve reasoning under uncertainty. Sources of uncertainty/randomness: Noise - variability in sensor measurements, partial observability, incorrect labels. Finite sample size - training and test data are randomly drawn instances. E.g. hand-written digit recognition. Probability quantifies uncertainty!

4 Basic Probability Concepts. A conceptual or physical, repeatable experiment with a random outcome at any trial: roll of a die; nucleotide present at a DNA site; time-space position of an aircraft on a radar screen. Sample space S - the set of all possible outcomes (can be finite or infinite): S = {1, 2, 3, 4, 5, 6}; S = {A, T, C, G}; S = [0, Rmax] × [0, 360°) × [0, +∞). Event A - any subset of S: see 1, 4 or 6 in a roll; observe a "G" at a site; UA007 in angular location 45°-60°.

5 Definition. Classical: the probability of an event A is its relative frequency, the limiting ratio of the number of occurrences of event A to the total number of trials: P(A) = lim_{N→∞} N_A / N. E.g. P({1}) = 1/6, P({2,4,6}) = 1/2. [Venn diagram: the sample space S has area 1, so P(S) = 1; P(A) is the area of the oval A; the rest of S is the complement of A.]
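
A quick numerical check of this limiting-frequency definition - a sketch, not from the slides; N and the event are arbitrary choices:

  % Relative frequency of the event {1} over N rolls of a fair die;
  % by the classical definition this should approach P({1}) = 1/6.
  N = 100000;
  rolls = randi(6, N, 1);                 % N fair die rolls
  relfreq = cumsum(rolls == 1) ./ (1:N)'; % N_A / N after each trial
  fprintf('After %d rolls: %.4f (1/6 = %.4f)\n', N, relfreq(end), 1/6);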

6 Definition. Axiomatic (Kolmogorov): the probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1. [Picture: the area of A can't be smaller than 0 and can't be larger than 1.]

7 Definition. Axiomatic (Kolmogorov): the probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1; P(∅) = 0 - the probability of no outcome is 0. [Picture: the area of A can't be smaller than 0.]

8 Definition. Axiomatic (Kolmogorov): the probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1; P(∅) = 0 - the probability of no outcome is 0; P(S) = 1 - the probability of some outcome is 1. [Picture: the area of A can't be smaller than 0 and can't be larger than 1.]

9 Definition. Axiomatic (Kolmogorov): the probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1; P(∅) = 0 - no outcome has probability; P(S) = 1 - some outcome is bound to occur; P(A ∪ B) = P(A) + P(B) − P(A ∩ B) - probability of the union of two events. [Picture: Area of A ∪ B = Area of A + Area of B − Area of A ∩ B.]

10 Definition. Axiomatic (Kolmogorov): the probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1; P(∅) = 0; P(S) = 1; P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A probability space is a sample space equipped with an assignment P(A) to every event A ⊆ S such that P satisfies the Kolmogorov axioms.

11 Theorems from the Axioms. Axioms: 0 ≤ P(A) ≤ 1; P(∅) = 0; P(S) = 1; P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Theorem: P(¬A) = 1 − P(A). Proof: P(A ∪ ¬A) = P(S) = 1 and P(A ∩ ¬A) = P(∅) = 0, so 1 = P(A) + P(¬A) − 0, hence P(¬A) = 1 − P(A).

12 Theorems from the Axioms. Axioms: 0 ≤ P(A) ≤ 1; P(∅) = 0; P(S) = 1; P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Theorem: P(A) = P(A ∩ B) + P(A ∩ ¬B). Proof: P(A) = P(A ∩ S) = P(A ∩ (B ∪ ¬B)) = P((A ∩ B) ∪ (A ∩ ¬B)) = P(A ∩ B) + P(A ∩ ¬B) − P((A ∩ B) ∩ (A ∩ ¬B)) = P(A ∩ B) + P(A ∩ ¬B) − P(∅) = P(A ∩ B) + P(A ∩ ¬B).

13 Why use probability? There have been many other approaches to handling uncertainty: fuzzy logic, qualitative reasoning (qualitative physics). "Probability theory is nothing but common sense reduced to calculation" - Pierre Laplace, 1812. Any scheme for combining uncertain information really should obey these axioms. de Finetti, 1931: if you gamble based on uncertain beliefs that satisfy these axioms, then you can't be exploited by an opponent.

14 Random Variable. A random variable is a function that associates a unique numerical value X(ω) with every outcome ω ∈ S of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated.) E.g. P(X < 1) = P({ω : X(ω) < 1}). Discrete r.v.: the outcome of a coin toss (H, T → 1, 0; binary); the outcome of a die roll (1-6). Continuous r.v.: the location of an aircraft. Univariate r.v.: the outcome of a die roll (1-6). Multivariate r.v.: the time-space position of an aircraft on a radar screen, X = (R, Θ, t).

15 Discrete Probability Distribution. In the discrete case, a probability distribution P on S (and hence on the domain of X) is an assignment of a non-negative real number P(s) to each s ∈ S (or each valid value x) such that 0 ≤ P(X = x) ≤ 1 and Σ_x P(X = x) = 1. (X = random variable, x = value it takes.) E.g. Bernoulli distribution with parameter θ: P(x) = 1 − θ for x = 0 and θ for x = 1; compactly, P(x) = θ^x (1 − θ)^(1−x).

16 Discrete Probability Distribution. In the discrete case, a probability distribution P on S (and hence on the domain of X) is an assignment of a non-negative real number P(s) to each s ∈ S (or each valid value x) such that 0 ≤ P(X = x) ≤ 1 and Σ_x P(X = x) = 1. (X = random variable, x = value it takes.) E.g. Multinomial distribution with parameters θ_1, …, θ_K: P(x) = n!/(x_1! x_2! ⋯ x_K!) · θ_1^{x_1} θ_2^{x_2} ⋯ θ_K^{x_K}, where x = (x_1, …, x_K) and Σ_j x_j = n.
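
A concrete evaluation of the multinomial formula - a sketch; the counts x and parameters θ below are made-up example values:

  % P(x) = n!/(x_1!...x_K!) * theta_1^x_1 * ... * theta_K^x_K
  x = [2 3 5];                 % counts for K = 3 categories, so n = 10
  theta = [0.2 0.3 0.5];       % category probabilities, sum to 1
  n = sum(x);
  P = factorial(n) / prod(factorial(x)) * prod(theta .^ x);
  fprintf('P(x) = %.4f\n', P);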

17 Continuous Prob. Distribution. A continuous random variable X can assume any value in an interval on the real line or in a region in a high-dimensional space. X usually corresponds to a real-valued measurement of some property, e.g., length, position, … It is not possible to talk about the probability of the random variable assuming a particular value: P(X = x) = 0. Instead, we talk about the probability of the random variable assuming a value within a given interval or half-interval: P(X ∈ [x_1, x_2]); P(X < x) = P(X ∈ (−∞, x)).

18 Continuous Prob. Distribution. The probability of the random variable assuming a value within some given interval from x_1 to x_2 is defined to be the area under the graph of the probability density function between x_1 and x_2. Probability mass: P(X ∈ [x_1, x_2]) = ∫_{x_1}^{x_2} p(x) dx; note that ∫_{−∞}^{+∞} p(x) dx = 1. Cumulative distribution function (CDF): F(x) = P(X ≤ x) = ∫_{−∞}^{x} p(x′) dx′. Probability density function (PDF): p(x) = dF(x)/dx, with ∫_{−∞}^{+∞} p(x) dx = 1 and p(x) ≥ 0 for all x. [Figure: car flow on Liberty Bridge (cooked up!).]
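
The pdf-as-derivative-of-the-CDF relation can be checked numerically; a sketch using the standard normal and the normpdf/normcdf functions from the matlab stats toolbox mentioned later in these slides:

  % p(x) = dF(x)/dx: differentiate the CDF numerically, compare to the pdf.
  x = -3:0.01:3;
  F = normcdf(x);                    % F(x) = P(X <= x)
  p_numeric = gradient(F, x);        % numerical dF/dx
  max(abs(p_numeric - normpdf(x)))   % tiny: the derivative recovers the pdf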

19 What is the intuitive meaning of p(x)? If p(x_1) = a and p(x_2) = b, then when a value X is sampled from the distribution with density p(x), you are a/b times as likely to find that X is very close to x_1 than that X is very close to x_2. That is: lim_{h→0} P(x_1 − h < X < x_1 + h) / P(x_2 − h < X < x_2 + h) = lim_{h→0} [∫_{x_1−h}^{x_1+h} p(x) dx] / [∫_{x_2−h}^{x_2+h} p(x) dx] = (p(x_1) · 2h) / (p(x_2) · 2h) = a/b.

20 Continuous Distributions. Uniform probability density function: p(x) = 1/(b − a) for a ≤ x ≤ b, 0 elsewhere. Normal (Gaussian) probability density function: p(x) = (1/√(2π)σ) e^{−(x−µ)²/2σ²}; the distribution is symmetric, and is often illustrated as a bell-shaped curve; two parameters, µ (mean) and σ (standard deviation), determine the location and shape of the distribution. Exponential probability distribution: density p(x) = (1/µ) e^{−x/µ} for x ≥ 0; CDF: P(X ≤ x_0) = 1 − e^{−x_0/µ}. [Figure: exponential density of time between successive arrivals (mins.), with shaded area P(x ≤ 2) = .4866.]
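
The exponential CDF makes the shaded-area computation in the figure a one-liner; a sketch assuming µ = 3 minutes (an assumed value, chosen because 1 − e^{−2/3} reproduces the .4866 shown on the slide):

  % P(X <= 2) for an exponential with mean mu = 3 (assumed example value).
  mu = 3;
  x0 = 2;
  P = 1 - exp(-x0/mu);                     % CDF evaluated at x0
  fprintf('P(X <= %g) = %.4f\n', x0, P);   % 0.4866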

21 Statistical Characterizations. Expectation - the centre of mass, mean value, first moment: E(X) = Σ_x x P(x) (discrete), ∫_{−∞}^{+∞} x p(x) dx (continuous). Variance - the spread: Var(X) = Σ_x [x − E(X)]² P(x) (discrete), ∫_{−∞}^{+∞} [x − E(X)]² p(x) dx (continuous).
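
A sketch applying both definitions to a fair die (a finite discrete r.v., so the sums are direct):

  % E(X) and Var(X) for a fair die, straight from the definitions.
  x = 1:6;
  p = ones(1, 6) / 6;                 % uniform pmf
  EX = sum(x .* p);                   % 3.5
  VarX = sum((x - EX).^2 .* p);       % 35/12, about 2.917
  fprintf('E(X) = %.3f, Var(X) = %.3f\n', EX, VarX);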

22 Gaussian (Normal) density in 1D. If X ~ N(µ, σ²), the probability density function (pdf) of X is defined as p(x) = (1/√(2π)σ) e^{−(x−µ)²/2σ²}, with E(X) = µ and Var(X) = σ². Here is how we plot the pdf in matlab: xs = -3:0.01:3; plot(xs, normpdf(xs, mu, sigma)). [Figure: zero mean, large variance vs. zero mean, small variance.] Note that a density evaluated at a point can be bigger than 1!
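
A complete, runnable version of that plotting snippet (mu and the two sigma values are arbitrary choices; the small-variance curve shows a density exceeding 1):

  % Two zero-mean Gaussian pdfs: large vs. small variance.
  xs = -3:0.01:3;
  plot(xs, normpdf(xs, 0, 1));        % sigma = 1
  hold on
  plot(xs, normpdf(xs, 0, 0.2));      % sigma = 0.2: peak is about 2 > 1
  hold off
  legend('\sigma = 1', '\sigma = 0.2');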

23 Gaussian CDF. If Z ~ N(0, 1), the cumulative distribution function is defined as Φ(x) = ∫_{−∞}^{x} p(z) dz = ∫_{−∞}^{x} (1/√(2π)) e^{−z²/2} dz. This has no closed-form expression, but is built in to most software packages (e.g. normcdf in the matlab stats toolbox). [Figure: pdf and cdf.]

24 Central limit theorem. If X_1, X_2, …, X_n are i.i.d. (independent and identically distributed - to be covered next) random variables, then define the sample mean X̄_n = (1/n) Σ_{i=1}^{n} X_i. As n → ∞, the distribution of X̄_n approaches a Gaussian with mean E[X_i] and variance Var[X_i]/n. [Figure: histograms of X̄_n for increasing n.] Somewhat of a justification for assuming a Gaussian distribution.
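
A sketch of the CLT in action, averaging Uniform(0,1) variables (n and the number of repetitions are arbitrary choices):

  % Sample means of n iid Uniform(0,1) draws. E[X_i] = 1/2, Var[X_i] = 1/12.
  n = 30; reps = 10000;
  xbar = mean(rand(n, reps), 1);      % one sample mean per column
  fprintf('mean(xbar) = %.3f (expect 0.5)\n', mean(xbar));
  fprintf('var(xbar)  = %.5f (expect 1/(12n) = %.5f)\n', var(xbar), 1/(12*n));
  histogram(xbar, 50);                % roughly bell-shaped, as the CLT predicts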

25 Independence. Training and test samples are typically assumed to be i.i.d. (independent and identically distributed). A and B are independent events if P(A ∩ B) = P(A) · P(B): the outcome of A has no effect on the outcome of B (and vice versa). E.g. roll of two dice: P({1},{3}) = 1/6 · 1/6 = 1/36.

26 Independence. A, B and C are pairwise independent events if P(A ∩ B) = P(A) · P(B), P(A ∩ C) = P(A) · P(C), and P(B ∩ C) = P(B) · P(C). A, B and C are mutually independent events if, in addition to pairwise independence, P(A ∩ B ∩ C) = P(A) · P(B) · P(C).

27 Conditional Probability. P(A|B) = the probability of event A conditioned on event B having occurred. If P(B) > 0, then P(A|B) = P(A ∩ B)/P(B). E.g. H = "having a headache", F = "coming down with flu": P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2 - the fraction of people with flu that have a headache. Corollary - the chain rule: P(A ∩ B) = P(A|B) P(B). If A and B are independent, P(A|B) = P(A).

28 Conditional Independence. A and B are independent if P(A ∩ B) = P(A) · P(B), i.e. P(A|B) = P(A): the outcome of B has no effect on the outcome of A (and vice versa). A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C) · P(B|C), i.e. P(A|B,C) = P(A|C): the outcome of B has no effect on the outcome of A (and vice versa) if C is true.

29 Prior and Posterior Distribution. Suppose that our propositions have a "causal flow", e.g. Flu → Headache ← DrinkBeer. Prior or unconditional probabilities of propositions, e.g. P(Flu) = 0.025 and P(DrinkBeer) = 0.2, correspond to belief prior to the arrival of any (new) evidence. Posterior or conditional probabilities of propositions, e.g. P(Headache|Flu) = 0.5 and P(Headache|Flu, DrinkBeer) = 0.7, correspond to updated belief after the arrival of new evidence. Not always useful: P(Headache|Flu, Steelers win) = 0.5.

30 Probabilistic Inference. H = "having a headache", F = "coming down with flu": P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. One day you wake up with a headache. You come up with the following reasoning: "since 50% of flus are associated with headaches, I must have a 1/2 chance of coming down with flu". Is this reasoning correct?

31 Probabilistic Inference. H = "having a headache", F = "coming down with flu": P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. The problem: P(F|H) = ? [Venn diagram: F and H with overlap F ∩ H.]

32 Probabilistic Inference. H = "having a headache", F = "coming down with flu": P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. The problem: P(F|H) = P(F ∩ H)/P(H) = P(H|F) P(F)/P(H) = (1/2 · 1/40)/(1/10) = 1/8.
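
The same computation as a few lines of matlab (numbers straight from the slide):

  % Bayes rule on the flu/headache example.
  P_H   = 1/10;                 % P(headache)
  P_F   = 1/40;                 % P(flu)
  P_HgF = 1/2;                  % P(headache | flu)
  P_FgH = P_HgF * P_F / P_H;    % P(flu | headache)
  fprintf('P(F|H) = %.4f = 1/%g\n', P_FgH, 1/P_FgH);   % 0.125 = 1/8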

33 The Bayes Rule. What we just did leads to the following general expression: P(A|B) = P(B|A) P(A) / P(B). This is Bayes' rule.

34 Quiz. P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2, P(F|H) = 1/8. Which of the following statements is true? (a) P(F|H) ≥ P(F ∩ H); (b) P(¬F|H) = 1 − P(F|H); (c) P(F|¬H) = P(¬H|F) P(F)/P(¬H) = (1 − P(H|F)) P(F)/(1 − P(H)).

35 More General Forms of Bayes Rule. P(A|B) = P(B|A) P(A)/P(B). Law of total probability: P(B) = P(B ∩ A) + P(B ∩ ¬A) = P(B|A) P(A) + P(B|¬A) P(¬A). Hence P(A|B) = P(B|A) P(A) / (P(B|A) P(A) + P(B|¬A) P(¬A)).
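
A sketch of Bayes rule with the total-probability denominator, on the flu example; P(H|¬F) is not given on the slides, so it is back-solved here to keep P(headache) = 1/10:

  % P(A|B) = P(B|A)P(A) / (P(B|A)P(A) + P(B|~A)P(~A)).
  P_F    = 1/40;                            % P(flu)
  P_HgF  = 1/2;                             % P(headache | flu)
  P_HgnF = (1/10 - P_HgF*P_F) / (1 - P_F);  % assumed: keeps P(H) = 1/10
  P_H    = P_HgF*P_F + P_HgnF*(1 - P_F);    % law of total probability: 0.1
  P_FgH  = P_HgF*P_F / P_H;                 % Bayes rule: 1/8
  fprintf('P(H) = %.3f, P(F|H) = %.4f\n', P_H, P_FgH);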

36 More General Forms of Bayes Rule. P(Y = y|X) = P(X|Y = y) P(Y = y) / Σ_{y′} P(X|Y = y′) P(Y = y′). With an extra conditioning event: P(Y|X ∩ Z) = P(X|Y ∩ Z) P(Y|Z)/P(X|Z) = P(X|Y ∩ Z) P(Y|Z) / (P(X|Y ∩ Z) P(Y|Z) + P(X|¬Y ∩ Z) P(¬Y|Z)). E.g. P(Flu|Headache ∩ DrankBeer). [Venn diagram: F, B, H and their overlaps.]

37 Joint and Marginal Probabilities. A joint probability distribution for a set of RVs (say X_1, X_2, X_3) gives the probability of every atomic event P(X_1, X_2, X_3). E.g. P(Flu, DrinkBeer) is a matrix of values indexed by F, ¬F and B, ¬B. P(Flu, DrinkBeer, Headache)? Every question about a domain can be answered by the joint distribution, as we will see later. A marginal probability distribution is the probability of every value that a single RV can take, e.g. P(X_1): P(Flu)?

38 Inference by enumeration. Start with a joint distribution. Building a joint distribution of M = 3 variables: 1. Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). 2. For each combination of values, say how probable it is. The result is a table with columns F, B, H, Prob, normalized, i.e. summing to 1.

39 Inference with the Joint. Once you have the joint distribution you can ask for the probability of any atomic event consistent with your query: P(E) = Σ_{rows i ∈ E} P(row_i). Joint table:
F B H Prob
¬F ¬B ¬H 0.4
¬F ¬B H 0.12
¬F B ¬H 0.17
¬F B H 0.11
F ¬B ¬H 0.05
F ¬B H 0.05
F B ¬H 0.05
F B H 0.05
E.g. E = {(¬F, B, H), (F, B, H)}.

40 Inference with the Joint. Compute marginals: P(Flu ∧ Headache) = P(F ∧ H ∧ B) + P(F ∧ H ∧ ¬B). Recall: the law of total probability. (Same joint table as above.)

41 Inference with the Joint. Compute marginals: P(Headache) = P(H ∧ F) + P(H ∧ ¬F) = P(H ∧ F ∧ B) + P(H ∧ F ∧ ¬B) + P(H ∧ ¬F ∧ B) + P(H ∧ ¬F ∧ ¬B). (Same joint table as above.)

42 Inference with the Joint. Compute conditionals: P(E_1|E_2) = P(E_1 ∩ E_2)/P(E_2) = Σ_{rows i ∈ E_1 ∩ E_2} P(row_i) / Σ_{rows i ∈ E_2} P(row_i). (Same joint table as above.)

43 Inference with the Joint. Compute conditionals: P(Flu|Headache) = P(Flu ∧ Headache)/P(Headache). General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables. (Same joint table as above.)
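
All three operations (event probability, marginal, conditional) reduce to sums over rows of the table; a sketch using the joint table transcribed above:

  % Inference by enumeration over the 8-row joint table.
  rows = dec2bin(0:7) == '1';          % 8x3 logicals in the table's row order
  F = rows(:,1); B = rows(:,2); H = rows(:,3);
  p = [0.4 0.12 0.17 0.11 0.05 0.05 0.05 0.05]';   % joint probabilities
  P_H   = sum(p(H));                   % marginal: sum rows where H holds
  P_FH  = sum(p(F & H));               % P(Flu and Headache)
  P_FgH = P_FH / P_H;                  % conditional: P(Flu | Headache)
  fprintf('P(H) = %.2f, P(F,H) = %.2f, P(F|H) = %.3f\n', P_H, P_FH, P_FgH);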

44 Where do probability distributions come from? Idea One: human, domain experts. Idea Two: simpler probability facts and some algebra, e.g. P(F), P(B), P(H|F ∧ B), P(H|¬F ∧ B), … - use the chain rule and independence assumptions to compute the joint distribution.

45 Where do probability distributions come from? Idea Three: learn them from data! A good chunk of this course is essentially about various ways of learning various forms of them!

46 Density Estimation. A density estimator learns a mapping from a set of attributes to a probability. Often known as parameter estimation if the distribution form is specified: Binomial, Gaussian, … Some important issues: nature of the data (iid, correlated, …); objective function (MLE, MAP, …); algorithm (simple algebra, gradient methods, EM, …); evaluation scheme (likelihood on test data, predictability, consistency, …).

47 Parameter Learning from iid data. Goal: estimate distribution parameters θ from a dataset of N independent, identically distributed (iid), fully observed training cases D = {x_1, …, x_N}. Maximum likelihood estimation (MLE): 1. One of the most common estimators. 2. With the iid and full-observability assumption, write L(θ) as the likelihood of the data: L(θ) = P(D; θ) = P(x_1, x_2, …, x_N; θ) = P(x_1; θ) P(x_2; θ) ⋯ P(x_N; θ) = ∏_i P(x_i; θ). 3. Pick the setting of parameters most likely to have generated the data we saw: θ_MLE = argmax_θ L(θ) = argmax_θ log L(θ).

48 Example 1: Bernoulli model. Data: we observed N iid coin tosses: D = {1, 0, 1, …, 0}. Model: P(x) = 1 − θ for x = 0 and θ for x = 1, i.e. P(x) = θ^x (1 − θ)^{1−x}. How do we write the likelihood of a single observation x_i? P(x_i) = θ^{x_i} (1 − θ)^{1−x_i}. The likelihood of dataset D = {x_1, …, x_N}: L(θ) = P(x_1, x_2, …, x_N; θ) = ∏_{i=1}^{N} P(x_i; θ) = ∏_{i=1}^{N} θ^{x_i} (1 − θ)^{1−x_i} = θ^{Σ_i x_i} (1 − θ)^{N − Σ_i x_i} = θ^{#heads} (1 − θ)^{#tails}.

49 MLE. Objective function: l(θ) = log L(θ) = log [θ^{n_h} (1 − θ)^{n_t}] = n_h log θ + (N − n_h) log(1 − θ). We need to maximize this w.r.t. θ. Take the derivative w.r.t. θ: ∂l/∂θ = n_h/θ − (N − n_h)/(1 − θ) = 0, so θ̂_MLE = n_h/N = (1/N) Σ_i x_i - frequency as sample mean. Sufficient statistics: the counts n_h = Σ_i x_i are sufficient statistics of the data D.
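
A sketch of this estimator on simulated tosses (theta_true and the sample size are arbitrary choices):

  % Bernoulli MLE: the sample frequency of heads.
  theta_true = 0.3;
  x = rand(1, 100) < theta_true;      % N = 100 tosses, 1 = head
  n_h = sum(x);                       % sufficient statistic
  theta_mle = n_h / numel(x);         % equals mean(x)
  fprintf('theta_MLE = %.2f (true: %.2f)\n', theta_mle, theta_true);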

50 Example 2: univariate normal. Data: we observed N iid real samples: D = {−0.1, 0, 1, −5.2, …, 3}. Model: P(x) = (2πσ²)^{−1/2} exp{−(x − µ)²/2σ²}, with θ = (µ, σ²). Log likelihood: l(θ) = log L(θ) = log ∏_{i=1}^{N} P(x_i) = −(N/2) log(2πσ²) − (1/2σ²) Σ_{i=1}^{N} (x_i − µ)². MLE: take derivatives and set to zero: ∂l/∂µ = (1/σ²) Σ_n (x_n − µ) = 0; ∂l/∂σ² = −N/(2σ²) + (1/2σ⁴) Σ_n (x_n − µ)² = 0. Hence µ_MLE = (1/N) Σ_n x_n and σ²_MLE = (1/N) Σ_n (x_n − µ_ML)².
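
A sketch of the two estimators on simulated data; note the 1/N normalization, which is var(x, 1) rather than matlab's default var(x):

  % Gaussian MLE: sample mean and the 1/N-normalized sample variance.
  x = 2 + 1.5 * randn(1, 1000);        % assumed data: mu = 2, sigma = 1.5
  mu_mle     = mean(x);
  sigma2_mle = mean((x - mu_mle).^2);  % same as var(x, 1)
  fprintf('mu_MLE = %.3f, sigma2_MLE = %.3f\n', mu_mle, sigma2_mle);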

51 Overfitting. Recall that for the Bernoulli distribution, we have θ̂_ML^{head} = n_head/(n_head + n_tail). What if we tossed too few times, so that we saw zero heads? We have θ̂_ML^{head} = 0, and we will predict that the probability of seeing a head next is zero!!! The rescue - "smoothing": θ̂^{head} = (n_head + n′)/(n_head + n′ + n_tail + n′), where n′ is known as the pseudo- (imaginary) count. But can we make this more formal?
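
The smoothed estimate in code, on the pathological all-tails case (n′ = 1 is an arbitrary pseudo-count):

  % Pseudo-counts rescue the zero-head estimate.
  n_h = 0; n_t = 3;                    % three tosses, all tails
  theta_ml = n_h / (n_h + n_t);        % 0: predicts heads are impossible
  np = 1;                              % pseudo-count n'
  theta_smooth = (n_h + np) / (n_h + np + n_t + np);   % 1/5 = 0.2
  fprintf('MLE: %.2f, smoothed: %.2f\n', theta_ml, theta_smooth);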

52 Bayesian Learning. The Bayes rule: P(θ|D) = P(D|θ) P(θ)/P(D). Or equivalently, posterior ∝ likelihood × prior. MAP estimate: θ_MAP = argmax_θ P(θ|D). [Figure: belief about coin toss probability.] If the prior is uniform, MLE = MAP.

53 Bayesian estimation for Bernoulli. Beta(α, β) distribution: P(θ) = Γ(α+β)/(Γ(α)Γ(β)) · θ^{α−1} (1−θ)^{β−1} = (1/B(α,β)) θ^{α−1} (1−θ)^{β−1}. Posterior distribution of θ: P(θ|D) = p(x_1, …, x_N|θ) p(θ)/p(x_1, …, x_N) ∝ θ^{n_h} (1−θ)^{n_t} · θ^{α−1} (1−θ)^{β−1} = θ^{n_h+α−1} (1−θ)^{n_t+β−1}, i.e. Beta(α+n_h, β+n_t). Notice the isomorphism of the posterior to the prior; such a prior is called a conjugate prior. α and β are hyperparameters (parameters of the prior) and correspond to the number of virtual heads/tails (pseudo counts).

54 MAP. Posterior distribution of θ: P(θ|x_1, …, x_N) = p(x_1, …, x_N|θ) p(θ)/p(x_1, …, x_N) ∝ θ^{n_h+α−1} (1−θ)^{n_t+β−1}. Maximum a posteriori (MAP) estimation: θ_MAP = argmax_θ log P(θ|x_1, …, x_N). Posterior mean estimation: θ̂ = (n_h + α)/(N + α + β). With enough data, the prior is forgotten. Beta parameters can be understood as pseudo-counts.
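
A sketch of the whole conjugate update; the Beta pdf is written out with gamma functions so no toolbox is needed, and α, β, n_h, n_t are assumed example values:

  % Beta prior -> Beta posterior for Bernoulli data, plus MAP and posterior mean.
  a = 2; b = 2;                        % prior hyperparameters (pseudo-counts)
  n_h = 7; n_t = 3; N = n_h + n_t;     % observed heads/tails
  t = 0.001:0.001:0.999;
  betapdf_ = @(t, a, b) gamma(a + b) ./ (gamma(a) .* gamma(b)) ...
             .* t.^(a-1) .* (1 - t).^(b-1);
  plot(t, betapdf_(t, a, b), t, betapdf_(t, a + n_h, b + n_t));  % prior vs posterior
  theta_map  = (n_h + a - 1) / (N + a + b - 2);  % mode of Beta(a+n_h, b+n_t)
  theta_mean = (n_h + a) / (N + a + b);          % posterior mean from the slide
  fprintf('MAP = %.3f, posterior mean = %.3f\n', theta_map, theta_mean);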

55 Dirichlet distribution

56 Estimating the parameters of a distribution. Maximum likelihood estimation (MLE): choose the value that maximizes the probability of the observed data: θ_MLE = argmax_θ P(D|θ). Maximum a posteriori (MAP) estimation: choose the value that is most probable given the observed data and prior belief: θ_MAP = argmax_θ P(θ|D) = argmax_θ P(D|θ) P(θ).

57 MLE vs MAP (Frequentist vs Bayesian). Frequentist/MLE approach: θ is an unknown constant; estimate it from data. Bayesian/MAP approach: θ is a random variable; assume a probability distribution over it. Drawbacks: MLE overfits if the dataset is too small; with MAP, two people with different priors will end up with different estimates.

58 Bayesian estimation for normal distribution. Normal prior: P(µ) = (2πτ²)^{−1/2} exp{−(µ − µ_0)²/2τ²}. Joint probability: P(x, µ) = (2πσ²)^{−N/2} exp{−(1/2σ²) Σ_{n=1}^{N} (x_n − µ)²} × (2πτ²)^{−1/2} exp{−(µ − µ_0)²/2τ²}. Posterior: P(µ|x) = (2πσ̃²)^{−1/2} exp{−(µ − µ̃)²/2σ̃²}, where µ̃ = (N/σ²)/(N/σ² + 1/τ²) · x̄ + (1/τ²)/(N/σ² + 1/τ²) · µ_0 and 1/σ̃² = N/σ² + 1/τ², with x̄ = (1/N) Σ_n x_n the sample mean.
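
A sketch of this update with σ² known (all numbers below are assumed for illustration):

  % Posterior over mu: N(mu0, tau2) prior, known-variance Gaussian likelihood.
  sigma2 = 1; tau2 = 4; mu0 = 0;       % assumed likelihood/prior parameters
  x = 2 + randn(1, 20);                % 20 observations with true mean 2
  N = numel(x); xbar = mean(x);
  prec = N/sigma2 + 1/tau2;            % posterior precision = 1/sigma_tilde^2
  mu_tilde = ((N/sigma2)*xbar + (1/tau2)*mu0) / prec;  % shrunk toward mu0
  sigma2_tilde = 1 / prec;
  fprintf('mu_tilde = %.3f, sigma2_tilde = %.3f\n', mu_tilde, sigma2_tilde);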

59 Probability Review What you should know:
