Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell
1 Machine Learning 10-701/15-781, Spring 2010 Probability Aarti Singh Lecture 2, January 13, 2010 Reading: Bishop: Chap 1, 2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell
2 Announcements Homework 1 is out! Due: Wednesday, Jan 20, 2010 (beginning of class). 1st Recitation: Jan 14, 2010, 5:00-6:30 pm, NSH 1305. Probability
3 Probability in Machine Learning Machine Learning tasks involve reasoning under uncertainty. Sources of uncertainty/randomness: Noise - variability in sensor measurements, partial observability, incorrect labels. Finite sample size - training and test data are randomly drawn instances. Example: hand-written digit recognition. Probability quantifies uncertainty!
4 Basic Probability Concepts A conceptual or physical, repeatable experiment with a random outcome at any trial: roll of a die; nucleotide present at a DNA site; time-space position of an aircraft on a radar screen. Sample space S - set of all possible outcomes (can be finite or infinite): S = {1, 2, 3, 4, 5, 6}; S = {A, T, C, G}; S = [0, Rmax] x [0, 360) x [0, +∞). Event A - any subset of S: see 2, 4 or 6 in a roll; observe a "G" at a site; UA007 in angular location 45-60 degrees.
5 Definition Classical: The probability of an event A is its relative frequency - the limiting ratio of the number of occurrences of event A to the total number of trials: P(A) = lim_{N→∞} N_A / N. E.g. P({1}) = 1/6, P({2,4,6}) = 1/2. Sample space S: its area is 1, so P(S) = 1; P(A) = area of the oval A; the complement of A is ¬A.
6 Definition Axiomatic (Kolmogorov): The probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1. (The area of A can't be smaller than 0 or larger than 1.)
7 Definition Axiomatic (Kolmogorov): The probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1. P(∅) = 0 - the probability of no outcome occurring is 0. (The area of A can't be smaller than 0.)
8 Definition Axiomatic (Kolmogorov): The probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1. P(∅) = 0 - the probability of no outcome occurring is 0. P(S) = 1 - the probability of some outcome occurring is 1. (The area of A can't be smaller than 0 or larger than 1.)
9 Definition Axiomatic (Kolmogorov): The probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1 - all probabilities are between 0 and 1. P(∅) = 0 - the empty event has probability 0. P(S) = 1 - some outcome is bound to occur. P(A ∪ B) = P(A) + P(B) − P(A ∩ B) - probability of the union of two events. (Area of A ∪ B = Area of A + Area of B − Area of A ∩ B.)
10 Definition Axiomatic (Kolmogorov): The probability of an event A is a number assigned to this event such that 0 ≤ P(A) ≤ 1; P(∅) = 0; P(S) = 1; P(A ∪ B) = P(A) + P(B) − P(A ∩ B). A probability space is a sample space equipped with an assignment P(A) to every event A ⊆ S such that P satisfies the Kolmogorov axioms.
11 Theorems from the Axioms Axioms: 0 ≤ P(A) ≤ 1; P(∅) = 0; P(S) = 1; P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Theorem: P(¬A) = 1 − P(A). Proof: P(A ∪ ¬A) = P(S) = 1 and P(A ∩ ¬A) = P(∅) = 0, so 1 = P(A) + P(¬A) − 0, hence P(¬A) = 1 − P(A).
12 Theorems from the Axioms Axioms: 0 ≤ P(A) ≤ 1; P(∅) = 0; P(S) = 1; P(A ∪ B) = P(A) + P(B) − P(A ∩ B). Theorem: P(A) = P(A ∩ B) + P(A ∩ ¬B). Proof: P(A) = P(A ∩ S) = P(A ∩ (B ∪ ¬B)) = P((A ∩ B) ∪ (A ∩ ¬B)) = P(A ∩ B) + P(A ∩ ¬B) − P((A ∩ B) ∩ (A ∩ ¬B)) = P(A ∩ B) + P(A ∩ ¬B) − P(∅) = P(A ∩ B) + P(A ∩ ¬B).
13 Why use probability? There have been many other approaches to handle uncertainty: Fuzzy logic; Qualitative reasoning (Qualitative physics). "Probability theory is nothing but common sense reduced to calculation" - Pierre Laplace, 1812. Any scheme for combining uncertain information really should obey these axioms. De Finetti, 1931 - if you gamble based on uncertain beliefs that satisfy these axioms, then you can't be exploited by an opponent.
14 Random Variable A random variable is a function that associates a unique numerical value X(ω) with every outcome ω ∈ S of an experiment. (The value of the r.v. will vary from trial to trial as the experiment is repeated.) E.g. P(X < 1) = P({ω : X(ω) < 1}). Discrete r.v.: the outcome of a coin toss, H → 1, T → 0 (binary); the outcome of a die roll, 1-6. Continuous r.v.: the location of an aircraft. Univariate r.v.: the outcome of a die roll, 1-6. Multivariate r.v.: the time-space position of an aircraft on a radar screen, X = (R, Θ, t).
15 Discrete Probability Distribution In the discrete case, a probability distribution P on S (and hence on the domain of X) is an assignment of a non-negative real number P(s) to each s ∈ S (or each valid value x) such that 0 ≤ P(X = x) ≤ 1 and Σ_x P(X = x) = 1. (X = random variable, x = value it takes.) E.g. Bernoulli distribution with parameter θ: P(x) = 1 − θ for x = 0 and θ for x = 1; compactly, P(x) = θ^x (1 − θ)^(1−x).
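The Bernoulli pmf above can be sketched in a few lines; θ = 0.3 below is an arbitrary illustrative parameter, not a value from the slides.

```python
# Bernoulli pmf: P(x) = theta^x * (1 - theta)^(1 - x) for x in {0, 1}.
def bernoulli_pmf(x, theta):
    return theta**x * (1 - theta)**(1 - x)

theta = 0.3                       # illustrative parameter
p0 = bernoulli_pmf(0, theta)      # P(X = 0) = 1 - theta
p1 = bernoulli_pmf(1, theta)      # P(X = 1) = theta
total = p0 + p1                   # a valid pmf must sum to 1
```

Note that the compact form θ^x (1 − θ)^(1−x) reduces to the two cases of the piecewise definition, which is what makes likelihoods easy to write later.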
16 Discrete Probability Distribution In the discrete case, a probability distribution P on S (and hence on the domain of X) is an assignment of a non-negative real number P(s) to each s ∈ S (or each valid value x) such that 0 ≤ P(X = x) ≤ 1 and Σ_x P(X = x) = 1. E.g. Multinomial distribution with parameters θ_1, ..., θ_K: P(x_1, ..., x_K) = [n! / (x_1! x_2! ... x_K!)] θ_1^{x_1} θ_2^{x_2} ... θ_K^{x_K}, where Σ_j x_j = n.
17 Continuous Prob. Distribution A continuous random variable X can assume any value in an interval on the real line or in a region in a high-dimensional space. X usually corresponds to a real-valued measurement of some property, e.g., length, position, ... It is not possible to talk about the probability of the random variable assuming a particular value: P(X = x) = 0. Instead, we talk about the probability of the random variable assuming a value within a given interval or half-interval: P(X ∈ [x_1, x_2]); P(X < x) = P(X ∈ (−∞, x)).
18 Continuous Prob. Distribution The probability of the random variable assuming a value within some given interval from x_1 to x_2 is defined to be the area under the graph of the probability density function between x_1 and x_2. Probability mass: P(X ∈ [x_1, x_2]) = ∫_{x_1}^{x_2} p(x) dx; note that ∫_{−∞}^{+∞} p(x) dx = 1. Cumulative distribution function (CDF): F(x) = P(X ≤ x) = ∫_{−∞}^{x} p(x') dx'. Probability density function (PDF): p(x) = dF(x)/dx, with ∫_{−∞}^{+∞} p(x) dx = 1 and p(x) ≥ 0 for all x. [Figure: car flow on Liberty Bridge (cooked up!)]
19 What is the intuitive meaning of p(x)? If p(x_1) = a and p(x_2) = b, then when a value X is sampled from the distribution with density p(x), you are a/b times as likely to find X very close to x_1 as very close to x_2. That is: lim_{h→0} [P(x_1 − h < X < x_1 + h) / P(x_2 − h < X < x_2 + h)] = lim_{h→0} [∫_{x_1−h}^{x_1+h} p(x) dx / ∫_{x_2−h}^{x_2+h} p(x) dx] = (p(x_1) · 2h) / (p(x_2) · 2h) = a/b.
20 Continuous Distributions Uniform probability density function: p(x) = 1/(b − a) for a ≤ x ≤ b, 0 elsewhere. Normal (Gaussian) probability density function: p(x) = (1/√(2π)σ) e^{−(x−µ)²/2σ²}. The distribution is symmetric and is often illustrated as a bell-shaped curve. Two parameters, µ (mean) and σ (standard deviation), determine the location and shape of the distribution. Exponential probability distribution: density p(x) = (1/µ) e^{−x/µ} for x ≥ 0; CDF: P(X ≤ x) = 1 − e^{−x/µ}. [Figure: time between successive arrivals (mins.)]
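The CDF/PDF relationship for the exponential distribution can be checked numerically: integrating the density from 0 to x should reproduce 1 − e^{−x/µ}. The parameter µ = 2.0 and the integration scheme below are illustrative choices, not from the slides.

```python
import math

# Exponential density p(x) = (1/mu) * exp(-x/mu) for x >= 0,
# with closed-form CDF P(X <= x) = 1 - exp(-x/mu).
mu = 2.0  # illustrative mean

def exp_pdf(x):
    return math.exp(-x / mu) / mu if x >= 0 else 0.0

def exp_cdf(x):
    return 1 - math.exp(-x / mu) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule numerical integration of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

approx = integrate(exp_pdf, 0.0, 3.0)  # area under the pdf on [0, 3]
exact = exp_cdf(3.0)                   # closed-form CDF value
```

The area under the density over an interval is exactly the probability mass on that interval, which is the defining property from the previous slide.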
21 Statistical Characterizations Expectation - the centre of mass, mean value, first moment: E(X) = Σ_x x P(x) (discrete) or ∫ x p(x) dx (continuous). Variance - the spread: Var(X) = Σ_x [x − E(X)]² P(x) (discrete) or ∫ [x − E(X)]² p(x) dx (continuous).
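The discrete formulas above can be applied directly to a fair six-sided die (an example consistent with the earlier slides, though the computation itself is added here):

```python
# E(X) = sum_x x P(x) and Var(X) = sum_x (x - E(X))^2 P(x)
# for a fair six-sided die: P(x) = 1/6 for x = 1..6.
pmf = {x: 1 / 6 for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())            # first moment
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # spread
```

For the fair die this gives E(X) = 3.5 and Var(X) = 35/12 ≈ 2.92.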
22 Gaussian (Normal) density in 1D If X ~ N(µ, σ²), the probability density function (pdf) of X is defined as p(x) = (1/√(2π)σ) e^{−(x−µ)²/2σ²}, with E(X) = µ and Var(X) = σ². Here is how we plot the pdf in matlab: xs = -3:0.01:3; plot(xs, normpdf(xs, mu, sigma)). [Figures: zero mean, large variance; zero mean, small variance.] Note that a density evaluated at a point can be bigger than 1!
23 Gaussian CDF If Z ~ N(0, 1), the cumulative distribution function is defined as Φ(x) = ∫_{−∞}^{x} p(z) dz = ∫_{−∞}^{x} (1/√(2π)) e^{−z²/2} dz. This has no closed-form expression, but is built in to most software packages (e.g. normcdf in the matlab stats toolbox). [Figures: pdf, cdf.]
24 Central limit theorem If X_1, X_2, ..., X_n are i.i.d. (independent and identically distributed - to be covered next) random variables, then define the sample mean X̄_n = (1/n) Σ_{i=1}^{n} X_i. As n → ∞, the distribution of X̄_n approaches a Gaussian with mean E[X_i] and variance Var[X_i]/n. [Figure: sampling distribution of X̄_n for increasing n.] Somewhat of a justification for assuming a Gaussian distribution.
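A small simulation, not from the slides, makes the CLT claim concrete: averages of n = 100 fair die rolls should concentrate around E[X] = 3.5 with variance Var[X]/n = (35/12)/100. The sample sizes and seed are arbitrary choices.

```python
import random

# CLT sketch: distribution of the mean of n i.i.d. die rolls.
random.seed(0)  # fixed seed for repeatability
n, trials = 100, 2000

sample_means = [
    sum(random.randint(1, 6) for _ in range(n)) / n
    for _ in range(trials)
]

# Empirical mean and variance of the sample means; CLT predicts
# mean near 3.5 and variance near (35/12)/n = 0.0292.
grand_mean = sum(sample_means) / trials
grand_var = sum((m - grand_mean) ** 2 for m in sample_means) / trials
```

Plotting a histogram of `sample_means` would show the familiar bell shape even though each individual roll is uniform.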
25 Independence Training and test samples are typically assumed to be i.i.d. (independent and identically distributed). A and B are independent events if P(A ∩ B) = P(A) · P(B): the outcome of A has no effect on the outcome of B (and vice versa). E.g. roll of two dice: P({1}, {3}) = 1/6 · 1/6 = 1/36.
26 Independence A, B and C are pairwise independent events if P(A ∩ B) = P(A) · P(B), P(A ∩ C) = P(A) · P(C), and P(B ∩ C) = P(B) · P(C). A, B and C are mutually independent events if, in addition to pairwise independence, P(A ∩ B ∩ C) = P(A) · P(B) · P(C).
27 Conditional Probability P(A|B) = probability of event A conditioned on event B having occurred. If P(B) > 0, then P(A|B) = P(A ∩ B) / P(B). E.g. H = "having a headache", F = "coming down with flu": P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2 - the fraction of people with flu that have a headache. Corollary - the chain rule: P(A ∩ B) = P(A|B) P(B). If A and B are independent, P(A|B) = P(A).
28 Conditional Independence A and B are independent if P(A ∩ B) = P(A) · P(B), i.e. P(A|B) = P(A): the outcome of B has no effect on the outcome of A (and vice versa). A and B are conditionally independent given C if P(A ∩ B|C) = P(A|C) · P(B|C), i.e. P(A|B, C) = P(A|C): the outcome of B has no effect on the outcome of A (and vice versa) if C is true.
29 Prior and Posterior Distribution Suppose that our propositions have a "causal flow", e.g., F → H ← B. Prior or unconditional probabilities of propositions, e.g., P(Flu) = 0.05 and P(DrinkBeer) = 0.2, correspond to belief prior to the arrival of any (new) evidence. Posterior or conditional probabilities of propositions, e.g., P(Headache|Flu) = 0.5 and P(Headache|Flu, DrinkBeer) = 0.7, correspond to updated belief after the arrival of new evidence. Conditioning on irrelevant evidence is not useful: P(Headache|Flu, Steelers win) = 0.5.
30 Probabilistic Inference H = "having a headache", F = "coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. One day you wake up with a headache, and you reason as follows: "Since 50% of flus are associated with headaches, I must have a 50-50 chance of coming down with flu." Is this reasoning correct?
31 Probabilistic Inference H = "having a headache", F = "coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. The problem: P(F|H) = ? [Venn diagram: F, H]
32 Probabilistic Inference H = "having a headache", F = "coming down with flu"; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. The problem: P(F|H) = P(F ∩ H) / P(H) = P(H|F) P(F) / P(H) = (1/2 · 1/40) / (1/10) = 1/8.
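The computation above is a one-liner; spelling it out in code makes the structure of Bayes' rule explicit, using exactly the slide's numbers:

```python
# Bayes' rule on the flu/headache example:
# P(F|H) = P(H|F) * P(F) / P(H).
p_h = 1 / 10          # P(H): prior probability of a headache
p_f = 1 / 40          # P(F): prior probability of flu
p_h_given_f = 1 / 2   # P(H|F): headaches given flu

p_f_given_h = p_h_given_f * p_f / p_h  # posterior = 1/8
```

So the correct posterior is 1/8 = 12.5%, not 50%: the low prior P(F) = 1/40 drags the answer well below P(H|F).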
33 The Bayes Rule What we just did leads to the following general expression: P(A|B) = P(B|A) P(A) / P(B). This is Bayes' rule.
34 Quiz P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2, P(F|H) = 1/8. Which of the following statements is true? (a) P(F|H) = P(F|¬H); (b) P(¬F|H) = 1 − P(F|H); (c) P(F|¬H) = P(¬H|F) P(F) / P(¬H) = (1 − P(H|F)) P(F) / (1 − P(H)).
35 More General Forms of Bayes Rule P(A|B) = P(B|A) P(A) / P(B). Law of total probability: P(B) = P(B ∩ A) + P(B ∩ ¬A) = P(B|A) P(A) + P(B|¬A) P(¬A). Hence P(A|B) = P(B|A) P(A) / [P(B|A) P(A) + P(B|¬A) P(¬A)].
36 More General Forms of Bayes Rule For a multi-valued Y: P(Y = y_i|X) = P(X|Y = y_i) P(Y = y_i) / Σ_j P(X|Y = y_j) P(Y = y_j). Conditioned on additional evidence Z: P(Y|X, Z) = P(X|Y, Z) P(Y|Z) / P(X|Z) = P(X|Y, Z) P(Y|Z) / [P(X|Y, Z) P(Y|Z) + P(X|¬Y, Z) P(¬Y|Z)]. E.g. P(Flu|Headache ∩ DrankBeer). [Venn diagrams: F, B, H]
37 Joint and Marginal Probabilities A joint probability distribution for a set of RVs (say X_1, X_2, X_3) gives the probability of every atomic event P(X_1, X_2, X_3). P(Flu, DrinkBeer) is a 2 x 2 matrix of values; what about P(Flu, DrinkBeer, Headache)? Every question about a domain can be answered by the joint distribution, as we will see later. A marginal probability distribution is the probability of every value that a single RV can take: P(X_1); P(Flu)?
38 Inference by enumeration Start with a joint distribution. Building a joint distribution of M = 3 variables: Make a truth table listing all combinations of values of your variables (if there are M Boolean variables then the table will have 2^M rows). For each combination of values, say how probable it is. [Table: F, B, H, Prob.] Normalized, i.e., sums to 1.
39 Inference with the Joint Once you have the joint distribution you can ask for the probability of any atomic event consistent with your query: P(E) = Σ_{rows i ∈ E} P(row i). [Table: the eight rows of the (F, B, H) truth table with their probabilities.] E.g. E = {(¬F, ¬B, H), (¬F, B, H)}.
40 Inference with the Joint Compute marginals: P(Flu ∧ Headache) = P(F ∧ H ∧ B) + P(F ∧ H ∧ ¬B). [Table as before.] Recall: the law of total probability.
41 Inference with the Joint Compute marginals: P(Headache) = P(H ∧ F) + P(H ∧ ¬F) = P(H ∧ F ∧ B) + P(H ∧ F ∧ ¬B) + P(H ∧ ¬F ∧ B) + P(H ∧ ¬F ∧ ¬B). [Table as before.]
42 Inference with the Joint Compute conditionals: P(E_1|E_2) = P(E_1 ∧ E_2) / P(E_2) = Σ_{rows i ∈ E_1 ∧ E_2} P(row i) / Σ_{rows i ∈ E_2} P(row i). [Table as before.]
43 Inference with the Joint Compute conditionals: P(Flu|Headache) = P(Flu ∧ Headache) / P(Headache). General idea: compute the distribution on the query variable by fixing the evidence variables and summing over the hidden variables. [Table as before.]
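Inference by enumeration is easy to implement once the joint table exists. The probabilities below are illustrative placeholders that sum to 1 - the slide's actual table values did not survive transcription - but the enumeration logic matches the slides.

```python
# Joint distribution over (F, B, H) as a dict: tuple of 0/1 -> probability.
# Values are made up for illustration (normalized); NOT the slides' numbers.
joint = {
    (1, 1, 1): 0.02, (1, 1, 0): 0.01, (1, 0, 1): 0.01, (1, 0, 0): 0.01,
    (0, 1, 1): 0.05, (0, 1, 0): 0.20, (0, 0, 1): 0.05, (0, 0, 0): 0.65,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9  # normalized

def prob(event):
    """P(E) = sum of P(row) over rows consistent with the event."""
    return sum(p for row, p in joint.items() if event(row))

# Marginal P(H): sum over all rows with H = 1 (F and B are hidden).
p_h = prob(lambda r: r[2] == 1)

# Conditional P(F|H): fix the evidence H = 1, sum out B.
p_f_and_h = prob(lambda r: r[0] == 1 and r[2] == 1)
p_f_given_h = p_f_and_h / p_h
```

This is exactly the "fix evidence variables, sum over hidden variables" recipe, just expressed with a predicate over table rows.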
44 Where do probability distributions come from? Idea One: human, domain experts. Idea Two: simpler probability facts and some algebra, e.g., P(F), P(B), P(H|F, B), P(H|F, ¬B), P(H|¬F, B), P(H|¬F, ¬B) - use the chain rule and independence assumptions to compute the joint distribution.
45 Where do probability distributions come from? Idea Three: Learn them from data! A good chunk of this course is essentially about various ways of learning various forms of them! 45
46 Density Estimation A density estimator learns a mapping from a set of attributes to a probability. Often known as parameter estimation if the distribution form is specified: Binomial, Gaussian, ... Some important issues: Nature of the data (iid, correlated, ...); Objective function (MLE, MAP, ...); Algorithm (simple algebra, gradient methods, EM, ...); Evaluation scheme (likelihood on test data, predictability, consistency, ...).
47 Parameter Learning from iid Data Goal: estimate distribution parameters θ from a dataset of N independent, identically distributed (iid), fully observed training cases D = {x_1, ..., x_N}. Maximum likelihood estimation (MLE): 1. One of the most common estimators. 2. With the iid and full-observability assumptions, write L(θ) as the likelihood of the data: L(θ) = P(D; θ) = P(x_1, x_2, ..., x_N; θ) = P(x_1; θ) P(x_2; θ) ... P(x_N; θ) = Π_{i=1}^{N} P(x_i; θ). 3. Pick the setting of parameters most likely to have generated the data we saw: θ_MLE = argmax_θ L(θ) = argmax_θ log L(θ).
48 Example 1: Bernoulli model Data: We observed N iid coin tosses: D = {1, 0, 1, ..., 0}. Model: P(x) = 1 − θ for x = 0, θ for x = 1; compactly, P(x) = θ^x (1 − θ)^{1−x}. How do we write the likelihood of a single observation x_i? P(x_i) = θ^{x_i} (1 − θ)^{1−x_i}. The likelihood of the dataset D = {x_1, ..., x_N}: L(θ) = P(x_1, x_2, ..., x_N; θ) = Π_{i=1}^{N} P(x_i; θ) = Π_i θ^{x_i} (1 − θ)^{1−x_i} = θ^{Σ_i x_i} (1 − θ)^{Σ_i (1−x_i)} = θ^{#heads} (1 − θ)^{#tails}.
49 MLE Objective function: l(θ) = log L(θ) = log [θ^{n_h} (1 − θ)^{n_t}] = n_h log θ + (N − n_h) log(1 − θ). We need to maximize this w.r.t. θ. Take the derivative w.r.t. θ: ∂l/∂θ = n_h/θ − (N − n_h)/(1 − θ) = 0 ⇒ θ_MLE = n_h/N, i.e. θ_MLE = (1/N) Σ_i x_i - frequency as sample mean. Sufficient statistics: the counts n_h = Σ_i x_i are sufficient statistics of data D.
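The closed-form result above says the Bernoulli MLE is just the fraction of heads. A minimal sketch (the five-toss dataset is an assumed reconstruction of the slide's toy example):

```python
# Bernoulli MLE: theta_hat = n_h / N, i.e. the sample mean of the data.
D = [1, 0, 1, 1, 0]   # assumed toy coin-toss dataset (3 heads, 2 tails)

n_h = sum(D)          # sufficient statistic: number of heads
N = len(D)
theta_mle = n_h / N   # = 0.6 for this dataset
```

Note that the entire dataset enters the estimate only through the count n_h, which is what "sufficient statistic" means here.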
50 Example 2: univariate normal Data: We observed N iid real samples: D = {-0.1, 0, 1, -5.2, ..., 3}. Model: p(x) = (2πσ²)^{−1/2} exp{−(x − µ)²/2σ²}, θ = (µ, σ²). Log likelihood: l(θ) = log L(θ) = log Π_{i=1}^{N} p(x_i) = −(N/2) log(2πσ²) − (1/2σ²) Σ_{i=1}^{N} (x_i − µ)². MLE: take derivatives and set to zero: ∂l/∂µ = (1/σ²) Σ_n (x_n − µ) = 0; ∂l/∂σ² = −N/(2σ²) + (1/2σ⁴) Σ_n (x_n − µ)² = 0 ⇒ µ_MLE = (1/N) Σ_n x_n, σ²_MLE = (1/N) Σ_n (x_n − µ_MLE)².
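The Gaussian MLE is again a closed form: sample mean and (biased) sample variance. The sketch below also checks the mean against a brute-force grid search over the log-likelihood; the five data points are an illustrative completion of the slide's truncated dataset.

```python
import math

# Gaussian MLE: mu_hat = sample mean, sigma2_hat = (1/N) sum (x - mu_hat)^2.
data = [-0.1, 0.0, 1.0, -5.2, 3.0]   # illustrative data
N = len(data)
mu_mle = sum(data) / N
sigma2_mle = sum((x - mu_mle) ** 2 for x in data) / N

def loglik(mu, s2):
    """Gaussian log-likelihood of the data at parameters (mu, s2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * s2) - (x - mu) ** 2 / (2 * s2)
        for x in data
    )

# The log-likelihood is concave in mu, so a grid search around mu_mle
# should pick mu_mle itself as the maximizer.
best_mu = max((mu_mle + d / 100 for d in range(-200, 201)),
              key=lambda m: loglik(m, sigma2_mle))
```

The grid check is redundant given the calculus, but it is a useful sanity test when no closed form exists.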
51 Overfitting Recall that for the Bernoulli distribution, we have θ_MLE^head = n_head / (n_head + n_tail). What if we tossed too few times, so that we saw zero heads? We would have θ_MLE^head = 0, and we would predict that the probability of seeing a head next is zero!!! The rescue - "smoothing": θ^head = (n_head + n') / (n_head + n' + n_tail + n'), where n' is known as the pseudo- (imaginary) count. But can we make this more formal?
52 Bayesian Learning The Bayes rule: P(θ|D) = P(D|θ) P(θ) / P(D); or equivalently, posterior ∝ likelihood × prior. MAP estimate: θ_MAP = argmax_θ P(θ|D). [Figure: belief about coin-toss probability.] If the prior is uniform, MLE = MAP.
53 Bayesian estimation for Bernoulli Beta(α, β) prior: P(θ) = [Γ(α + β) / (Γ(α) Γ(β))] θ^{α−1} (1 − θ)^{β−1} = [1/B(α, β)] θ^{α−1} (1 − θ)^{β−1}. Posterior distribution of θ: P(θ|x_1, ..., x_N) = p(x_1, ..., x_N|θ) p(θ) / p(x_1, ..., x_N) ∝ θ^{n_h} (1 − θ)^{n_t} · θ^{α−1} (1 − θ)^{β−1} = θ^{n_h+α−1} (1 − θ)^{n_t+β−1} ⇒ Beta(α + n_h, β + n_t). Notice the isomorphism of the posterior to the prior; such a prior is called a conjugate prior. α and β are hyperparameters (parameters of the prior) and correspond to the number of virtual heads/tails (pseudo counts).
54 MAP Posterior distribution of θ: P(θ|x_1, ..., x_N) = p(x_1, ..., x_N|θ) p(θ) / p(x_1, ..., x_N) ∝ θ^{n_h+α−1} (1 − θ)^{n_t+β−1}. Maximum a posteriori (MAP) estimation: θ_MAP = argmax_θ log P(θ|x_1, ..., x_N). Posterior mean estimation: θ_mean = (n_h + α) / (N + α + β). With enough data, the prior is forgotten. Beta parameters can be understood as pseudo-counts.
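The conjugate Beta-Bernoulli update makes MAP and posterior-mean estimates one-liners. The prior Beta(2, 2) and the counts below are illustrative choices; the mode formula (n_h + α − 1)/(N + α + β − 2) is the standard Beta mode, stated here as an added detail since the slide only names the argmax.

```python
# Beta-Bernoulli: prior Beta(alpha, beta) + data (n_h heads, n_t tails)
# gives posterior Beta(alpha + n_h, beta + n_t).
alpha, beta = 2, 2     # illustrative prior: 1 virtual head, 1 virtual tail
n_h, n_t = 7, 3        # illustrative observed counts
N = n_h + n_t

theta_mle = n_h / N                                      # ignores the prior
theta_map = (n_h + alpha - 1) / (N + alpha + beta - 2)   # posterior mode
theta_mean = (n_h + alpha) / (N + alpha + beta)          # posterior mean
```

Both Bayesian estimates are pulled from the MLE (0.7) toward the prior mean (0.5) - exactly the pseudo-count smoothing from the overfitting slide, now derived rather than ad hoc.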
55 Dirichlet distribution [Slide: the Dirichlet distribution, the multi-valued generalization of the Beta prior.]
56 Estimating the parameters of a distribution Maximum likelihood estimation (MLE): choose the value that maximizes the probability of the observed data, θ_MLE = argmax_θ P(D|θ). Maximum a posteriori (MAP) estimation: choose the value that is most probable given the observed data and prior belief, θ_MAP = argmax_θ P(θ|D) = argmax_θ P(D|θ) P(θ).
57 MLE vs MAP (Frequentist vs Bayesian) Frequentist/MLE approach: θ is an unknown constant; estimate it from data. Bayesian/MAP approach: θ is a random variable; assume a probability distribution over it. Drawbacks: MLE overfits if the dataset is too small; with MAP, two people with different priors will end up with different estimates.
58 Bayesian estimation for normal distribution Normal prior: P(µ) = (2πτ²)^{−1/2} exp{−(µ − µ_0)²/2τ²}. Joint probability: P(x, µ) = (2πσ²)^{−N/2} exp{−(1/2σ²) Σ_{n=1}^{N} (x_n − µ)²} × (2πτ²)^{−1/2} exp{−(µ − µ_0)²/2τ²}. Posterior: P(µ|x) = (2πσ̃²)^{−1/2} exp{−(µ − µ̃)²/2σ̃²}, where µ̃ = [(N/σ²) / (N/σ² + 1/τ²)] x̄ + [(1/τ²) / (N/σ² + 1/τ²)] µ_0 and σ̃² = (N/σ² + 1/τ²)^{−1}, with x̄ the sample mean.
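The posterior mean µ̃ above is a precision-weighted average of the sample mean and the prior mean. A numeric sketch (all values illustrative, assuming the likelihood variance σ² is known):

```python
# Conjugate update for a normal mean: prior mu ~ N(mu0, tau^2),
# likelihood x_i ~ N(mu, sigma^2) with sigma^2 known.
mu0, tau2 = 0.0, 1.0          # illustrative prior mean and variance
sigma2 = 4.0                  # illustrative known likelihood variance
data = [1.0, 2.0, 3.0, 2.0]   # illustrative observations
N = len(data)
xbar = sum(data) / N

# Posterior precision is the sum of data and prior precisions.
precision = N / sigma2 + 1 / tau2
mu_tilde = (N / sigma2 * xbar + mu0 / tau2) / precision
sigma2_tilde = 1 / precision
```

Here the data precision N/σ² = 1 equals the prior precision 1/τ² = 1, so µ̃ lands exactly halfway between x̄ = 2 and µ_0 = 0; with more data, the sample mean dominates and the prior is forgotten.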
59 Probability Review What you should know:
More informationProbability Theory Review
Probability Theory Review Brendan O Connor 10-601 Recitation Sept 11 & 12, 2012 1 Mathematical Tools for Machine Learning Probability Theory Linear Algebra Calculus Wikipedia is great reference 2 Probability
More informationLecture 10: Introduction to reasoning under uncertainty. Uncertainty
Lecture 10: Introduction to reasoning under uncertainty Introduction to reasoning under uncertainty Review of probability Axioms and inference Conditional probability Probability distributions COMP-424,
More informationLecture 9: Naive Bayes, SVM, Kernels. Saravanan Thirumuruganathan
Lecture 9: Naive Bayes, SVM, Kernels Instructor: Outline 1 Probability basics 2 Probabilistic Interpretation of Classification 3 Bayesian Classifiers, Naive Bayes 4 Support Vector Machines Probability
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationParametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012
Parametric Models Dr. Shuang LIANG School of Software Engineering TongJi University Fall, 2012 Today s Topics Maximum Likelihood Estimation Bayesian Density Estimation Today s Topics Maximum Likelihood
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationSingle Maths B: Introduction to Probability
Single Maths B: Introduction to Probability Overview Lecturer Email Office Homework Webpage Dr Jonathan Cumming j.a.cumming@durham.ac.uk CM233 None! http://maths.dur.ac.uk/stats/people/jac/singleb/ 1 Introduction
More informationLearning Bayesian network : Given structure and completely observed data
Learning Bayesian network : Given structure and completely observed data Probabilistic Graphical Models Sharif University of Technology Spring 2017 Soleymani Learning problem Target: true distribution
More informationPMR Learning as Inference
Outline PMR Learning as Inference Probabilistic Modelling and Reasoning Amos Storkey Modelling 2 The Exponential Family 3 Bayesian Sets School of Informatics, University of Edinburgh Amos Storkey PMR Learning
More informationMachine Learning. Lecture 4: Regularization and Bayesian Statistics. Feng Li. https://funglee.github.io
Machine Learning Lecture 4: Regularization and Bayesian Statistics Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 207 Overfitting Problem
More informationLecture 1: Basics of Probability
Lecture 1: Basics of Probability (Luise-Vitetta, Chapter 8) Why probability in data science? Data acquisition is noisy Sampling/quantization external factors: If you record your voice saying machine learning
More informationProbability theory basics
Probability theory basics Michael Franke Basics of probability theory: axiomatic definition, interpretation, joint distributions, marginalization, conditional probability & Bayes rule. Random variables:
More informationBayesian Methods: Naïve Bayes
Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior
More informationIntroduction to Machine Learning CMU-10701
Introduction to Machine Learning CMU-10701 2. MLE, MAP, Bayes classification Barnabás Póczos & Aarti Singh 2014 Spring Administration http://www.cs.cmu.edu/~aarti/class/10701_spring14/index.html Blackboard
More informationBayesian RL Seminar. Chris Mansley September 9, 2008
Bayesian RL Seminar Chris Mansley September 9, 2008 Bayes Basic Probability One of the basic principles of probability theory, the chain rule, will allow us to derive most of the background material in
More informationDept. of Linguistics, Indiana University Fall 2015
L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 34 To start out the course, we need to know something about statistics and This is only an introduction; for a fuller understanding, you would
More information3. Review of Probability and Statistics
3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture
More informationIntroduction to Machine Learning. Lecture 2
Introduction to Machine Learning Lecturer: Eran Halperin Lecture 2 Fall Semester Scribe: Yishay Mansour Some of the material was not presented in class (and is marked with a side line) and is given for
More informationUncertain Knowledge and Bayes Rule. George Konidaris
Uncertain Knowledge and Bayes Rule George Konidaris gdk@cs.brown.edu Fall 2018 Knowledge Logic Logical representations are based on: Facts about the world. Either true or false. We may not know which.
More informationLecture 2: Review of Basic Probability Theory
ECE 830 Fall 2010 Statistical Signal Processing instructor: R. Nowak, scribe: R. Nowak Lecture 2: Review of Basic Probability Theory Probabilistic models will be used throughout the course to represent
More informationSome Probability and Statistics
Some Probability and Statistics David M. Blei COS424 Princeton University February 12, 2007 D. Blei ProbStat 01 1 / 42 Who wants to scribe? D. Blei ProbStat 01 2 / 42 Random variable Probability is about
More informationAlgorithms for Uncertainty Quantification
Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example
More informationData Modeling & Analysis Techniques. Probability & Statistics. Manfred Huber
Data Modeling & Analysis Techniques Probability & Statistics Manfred Huber 2017 1 Probability and Statistics Probability and statistics are often used interchangeably but are different, related fields
More informationSome slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2
Logistics CSE 446: Point Estimation Winter 2012 PS2 out shortly Dan Weld Some slides from Carlos Guestrin, Luke Zettlemoyer & K Gajos 2 Last Time Random variables, distributions Marginal, joint & conditional
More informationIntroduction to Machine Learning
What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV
More information6.867 Machine Learning
6.867 Machine Learning Problem set 1 Solutions Thursday, September 19 What and how to turn in? Turn in short written answers to the questions explicitly stated, and when requested to explain or prove.
More informationGaussians Linear Regression Bias-Variance Tradeoff
Readings listed in class website Gaussians Linear Regression Bias-Variance Tradeoff Machine Learning 10701/15781 Carlos Guestrin Carnegie Mellon University January 22 nd, 2007 Maximum Likelihood Estimation
More informationLEARNING WITH BAYESIAN NETWORKS
LEARNING WITH BAYESIAN NETWORKS Author: David Heckerman Presented by: Dilan Kiley Adapted from slides by: Yan Zhang - 2006, Jeremy Gould 2013, Chip Galusha -2014 Jeremy Gould 2013Chip Galus May 6th, 2016
More informationCS839: Probabilistic Graphical Models. Lecture 7: Learning Fully Observed BNs. Theo Rekatsinas
CS839: Probabilistic Graphical Models Lecture 7: Learning Fully Observed BNs Theo Rekatsinas 1 Exponential family: a basic building block For a numeric random variable X p(x ) =h(x)exp T T (x) A( ) = 1
More informationLecture 4: Probabilistic Learning. Estimation Theory. Classification with Probability Distributions
DD2431 Autumn, 2014 1 2 3 Classification with Probability Distributions Estimation Theory Classification in the last lecture we assumed we new: P(y) Prior P(x y) Lielihood x2 x features y {ω 1,..., ω K
More informationMachine Learning
Machine Learning 10-601 Tom M. Mitchell Machine Learning Department Carnegie Mellon University January 26, 2015 Today: Bayes Classifiers Conditional Independence Naïve Bayes Readings: Mitchell: Naïve Bayes
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?
Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what
More informationDensity Estimation. Seungjin Choi
Density Estimation Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro, Nam-gu, Pohang 37673, Korea seungjin@postech.ac.kr http://mlg.postech.ac.kr/
More informationCOMP 328: Machine Learning
COMP 328: Machine Learning Lecture 2: Naive Bayes Classifiers Nevin L. Zhang Department of Computer Science and Engineering The Hong Kong University of Science and Technology Spring 2010 Nevin L. Zhang
More informationMLE/MAP + Naïve Bayes
10-601 Introduction to Machine Learning Machine Learning Department School of Computer Science Carnegie Mellon University MLE/MAP + Naïve Bayes MLE / MAP Readings: Estimating Probabilities (Mitchell, 2016)
More informationProbability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008
Probability theory for Networks (Part 1) CS 249B: Science of Networks Week 02: Monday, 02/04/08 Daniel Bilar Wellesley College Spring 2008 1 Review We saw some basic metrics that helped us characterize
More informationCOMP90051 Statistical Machine Learning
COMP90051 Statistical Machine Learning Semester 2, 2017 Lecturer: Trevor Cohn 2. Statistical Schools Adapted from slides by Ben Rubinstein Statistical Schools of Thought Remainder of lecture is to provide
More informationNaïve Bayes Introduction to Machine Learning. Matt Gormley Lecture 3 September 14, Readings: Mitchell Ch Murphy Ch.
School of Computer Science 10-701 Introduction to Machine Learning aïve Bayes Readings: Mitchell Ch. 6.1 6.10 Murphy Ch. 3 Matt Gormley Lecture 3 September 14, 2016 1 Homewor 1: due 9/26/16 Project Proposal:
More information1 INFO Sep 05
Events A 1,...A n are said to be mutually independent if for all subsets S {1,..., n}, p( i S A i ) = p(a i ). (For example, flip a coin N times, then the events {A i = i th flip is heads} are mutually
More informationClass 26: review for final exam 18.05, Spring 2014
Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event
More informationHierarchical Models & Bayesian Model Selection
Hierarchical Models & Bayesian Model Selection Geoffrey Roeder Departments of Computer Science and Statistics University of British Columbia Jan. 20, 2016 Contact information Please report any typos or
More informationReview of probability
Review of probability Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts definition of probability random variables
More informationMachine Learning. Probability Basics. Marc Toussaint University of Stuttgart Summer 2014
Machine Learning Probability Basics Basic definitions: Random variables, joint, conditional, marginal distribution, Bayes theorem & examples; Probability distributions: Binomial, Beta, Multinomial, Dirichlet,
More information