Introduction to Machine Learning

Size: px
Start display at page:

Download "Introduction to Machine Learning"

Transcription

1 Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA CSE 474/574 1 / 37

2 Outline Introduction to Probability Random Variables Bayes Rule More About Conditional Independence Continuous Random Variables Different Types of Distributions Handling Multivariate Distributions Transformations of Random Variables Information Theory - Introduction Chandola@UB CSE 474/574 2 / 37

3 What is Probability? [3, 1] Probability that a coin will land heads is 50% 1 What does this mean? 1 Dr. Persi Diaconis showed that a coin is 51% likely to land facing the same way up as it is started. Chandola@UB CSE 474/574 3 / 37

4 CSE 474/574 4 / 37

5 Frequentist Interpretation Number of times an event will be observed in n trials Chandola@UB CSE 474/574 5 / 37

6 Frequentist Interpretation Number of times an event will be observed in n trials What if the event can only occur once? My winning the next month s powerball. Polar ice caps melting by year Chandola@UB CSE 474/574 5 / 37

7 Bayesian Interpretation Uncertainty of the event Use for making decisions Should I put in an offer for a sports car? Chandola@UB CSE 474/574 6 / 37

8 What is a Random Variable (X )? Can take any value from X Discrete Random Variable - X is finite/countably finite Categorical?? Continuous Random Variable - X is infinite P(X = x) or P(x) is the probability of X taking value x an event What is a distribution? Examples 1. Coin toss (X = {heads, tails}) 2. Six sided dice (X = {1, 2, 3, 4, 5, 6}) Chandola@UB CSE 474/574 7 / 37

9 Notation, Notation, Notation X - random variable (X if multivariate) x - a specific value taken by the random variable ((x if multivariate)) P(X = x) or P(x) is the probability of the event X = x p(x) is either the probability mass function (discrete) or probability density function (continuous) for the random variable X at x Probability mass (or density) at x Chandola@UB CSE 474/574 8 / 37

10 Basic Rules - Quick Review For two events A and B: P(A B) = P(A) + P(B) P(A B) Joint Probability P(A, B) = P(A B) = P(A B)P(B) Also known as the product rule Conditional Probability P(A B) = P(A,B) P(B) Chandola@UB CSE 474/574 9 / 37

11 Chain Rule of Probability Given D random variables, {X 1, X 2,..., X D } P(X 1, X 2,..., X D ) = P(X 1 )P(X 2 X 1 )P(X 3 X 1, X 2 )... P(X D X 1, X 2,..., X D ) Chandola@UB CSE 474/ / 37

12 Marginal Distribution Given P(A, B) what is P(A)? Sum P(A, B) over all values for B P(A) = b P(A, B) = b P(A B = b)p(b = b) Sum rule Chandola@UB CSE 474/ / 37

13 Bayes Rule or Bayes Theorem Computing P(X = x Y = y): Bayes Theorem P(X = x Y = y) = = P(X = x, Y = y) P(Y = y) P(X = x)p(y = y X = x) x P(X = x )P(Y = y X = x ) Chandola@UB CSE 474/ / 37

14 Example Medical Diagnosis Random event 1: A test is positive or negative (X ) Random event 2: A person has cancer (Y ) yes or no What we know: 1. Test has accuracy of 80% 2. Number of times the test is positive when the person has cancer P(X = 1 Y = 1) = Prior probability of having cancer is 0.4% P(Y = 1) = Question? If I test positive, does it mean that I have 80% rate of cancer? Chandola@UB CSE 474/ / 37

15 Base Rate Fallacy Ignored the prior information What we need is: P(Y = 1 X = 1) =? More information: False positive (alarm) rate for the test P(X = 1 Y = 0) = 0.1 P(Y = 1 X = 1) = P(X = 1 Y = 1)P(Y = 1) P(X = 1 Y = 1)P(Y = 1) + P(X = 1 Y = 0)P(Y = 0) Chandola@UB CSE 474/ / 37

16 Classification Using Bayes Rules Given input example X, find the true class P(Y = c X) Y is the random variable denoting the true class Assuming the class-conditional probability is known Applying Bayes Rule P(Y = c X) = P(X Y = c) P(Y = c)p(x Y = c) c P(Y = c ))P(X Y = c ) Chandola@UB CSE 474/ / 37

17 Independence One random variable does not depend on another A B P(A, B) = P(A)P(B) Joint written as a product of marginals Conditional Independence A B C P(A, B C) = P(A C)P(B C) Chandola@UB CSE 474/ / 37

18 More About Conditional Independence Alice and Bob live in the same town but far away from each other Alice drives to work and Bob takes the bus Event A - Alice comes late to work Event B - Bob comes late to work Event C - A snow storm has hit the town P(A C) - Probability that Alice comes late to work given there is a snowstorm Chandola@UB CSE 474/ / 37

19 More About Conditional Independence Alice and Bob live in the same town but far away from each other Alice drives to work and Bob takes the bus Event A - Alice comes late to work Event B - Bob comes late to work Event C - A snow storm has hit the town P(A C) - Probability that Alice comes late to work given there is a snowstorm Now if I know that Bob has also come late to work, will it change the probability that Alice comes late to work? Chandola@UB CSE 474/ / 37

20 More About Conditional Independence Alice and Bob live in the same town but far away from each other Alice drives to work and Bob takes the bus Event A - Alice comes late to work Event B - Bob comes late to work Event C - A snow storm has hit the town P(A C) - Probability that Alice comes late to work given there is a snowstorm Now if I know that Bob has also come late to work, will it change the probability that Alice comes late to work? What if I do not observe C? Will B have any impact on probability of A happening? Chandola@UB CSE 474/ / 37

21 Continuous Random Variables X is continuous Can take any value How does one define probability? x P(X = x) = 1 Chandola@UB CSE 474/ / 37

22 Continuous Random Variables X is continuous Can take any value How does one define probability? x P(X = x) = 1 Probability that X lies in an interval [a, b]? P(a < X b) = P(x b) P(x a) F (q) = P(x q) is the cumulative distribution function P(a < X b) = F (b) F (a) Chandola@UB CSE 474/ / 37

23 Probability Density Probability Density Function p(x) = x F (x) P(a < X b) = Can p(x) be greater than 1? b a p(x)dx Chandola@UB CSE 474/ / 37

24 Expectation Expected value of a random variable E[X ] What is most likely to happen in terms of X? For discrete variables E[X ] xp(x = x) x X For continuous variables E[X ] xp(x)dx Mean of X (µ) X Chandola@UB CSE 474/ / 37

25 Expectation of Functions of Random Variable Let g(x ) be a function of X If X is discrete: E[g(X )] g(x)p(x = x) x X If X is continuous: E[g(X )] g(x)p(x)dx X Properties E[c] = c, c - constant If X Y, then E[X ] E[Y ] E[X + Y ] = E[X ] + E[Y ] E[aX ] = ae[x ] var[x ] = E[(X µ) 2 ] = E[X 2 ] µ 2 Cov[X, Y ] = E[XY ] E[X ]E[Y ] Jensen s inequality: If ϕ(x ) is convex, ϕ(e[x ]) E[ϕ(X )] Chandola@UB CSE 474/ / 37

26 What is a Probability Distribution? Discrete Binomial,Bernoulli Multinomial, Multinolli Poisson Empirical Continuous Gaussian (Normal) Degenerate pdf Laplace Gamma Beta Pareto Chandola@UB CSE 474/ / 37

27 Binomial Distribution X = Number of heads observed in n coin tosses Parameters: n, θ X Bin(n, θ) Probability mass function (pmf) Bin(k n, θ) ( ) n θ k (1 θ) n k k Bernoulli Distribution Binomial distribution with n = 1 Only one parameter (θ) Chandola@UB CSE 474/ / 37

28 Multinomial Distribution Simulates a K sided die Random variable x = (x 1, x 2,..., x K ) Parameters: n, θ θ R K θ j - probability that j th side shows up ( ) n K Mu(x n, θ) x 1, x 2,..., x K Multinoulli Distribution Multinomial distribution with n = 1 j=1 x is a vector of 0s and 1s with only one bit set to 1 Only one parameter (θ) θ x j j Chandola@UB CSE 474/ / 37

29 Gaussian (Normal) Distribution N (x µ, σ 2 ) 1 2πσ 2 e 1 2σ 2 (x µ) Parameters: 1. µ = E[X ] 2. σ 2 = var[x ] = E[(X µ) 2 ] X N (µ, σ 2 ) p(x = x) = N (µ, σ 2 ) X N (0, 1) X is a standard normal random variable Cumulative distribution function: Φ(x; µ, σ 2 ) x N (z µ, σ 2 )dz Chandola@UB CSE 474/ / 37

30 Joint Probability Distributions Multiple related random variables p(x 1, x 2,..., x D ) for D > 1 variables (X 1, X 2,..., X D ) Discrete random variables? Continuous random variables? What do we measure? Covariance How does X vary with respect to Y For linear relationship: cov[x, Y ] E[(X E[X ])(Y E[Y ])] = E[XY ] E[X ]E[Y ] Chandola@UB CSE 474/ / 37

31 Covariance and Correlation x is a d-dimensional random vector cov[x] E[(X E[X])(X E[X]) ] var[x 1 ] cov[x 1, X 2 ] cov[x 1, X d ] cov[x 2, X 1 ] var[x 2 ] cov[x 2, X d ] = cov[x d, X 1 ] cov[x d, X 2 ] var[x d ] Covariances can be between 0 and Normalized covariance Correlation Chandola@UB CSE 474/ / 37

32 Correlation Pearson Correlation Coefficient cov[x, Y ] corr[x, Y ] var[x ]var[y ] What is corr[x, X ]? 1 corr[x, Y ] 1 When is corr[x, Y ] = 1? Chandola@UB CSE 474/ / 37

33 Correlation Pearson Correlation Coefficient cov[x, Y ] corr[x, Y ] var[x ]var[y ] What is corr[x, X ]? 1 corr[x, Y ] 1 When is corr[x, Y ] = 1? Y = ax + b Chandola@UB CSE 474/ / 37

34 Multivariate Gaussian Distribution Most widely used joint probability distribution N (X µ, Σ) 1 exp (2π) D/2 Σ D/2 [ 1 2 (x µ) Σ 1 (x µ) ] Chandola@UB CSE 474/ / 37

35 Linear Transformations of Random Variables What is the distribution of f (X) (X p())? Linear transformation: Y = a X + b E[Y ]? var[y ]? Y = AX + b E[Y]? cov[y]? The Matrix Cookbook [2] Available on Piazza Chandola@UB CSE 474/ / 37

36 General Transformations f () is not linear Example: X - discrete Y = f (X ) = { 1 if X is even 0 if X is odd Chandola@UB CSE 474/ / 37

37 General Transformations for Continuous Variables For continuous variables, work with cdf F Y (y) P(Y y) = P(f (X ) y) = P(X f 1 (y)) = F X (f 1 (y)) For pdf p Y (y) d dy F Y (y) = d dy F X (f 1 (y)) = dx d dy dx F X (x) = dx dy p X (x) x = f 1 (y) Example Let X be Uniform( 1, 1) Let Y = X 2 p Y (y) = 1 2 y 1 2 Chandola@UB CSE 474/ / 37

38 Monte Carlo Approximation Generate N samples from distribution for X For each sample, x i, i [1, N], compute f (x i ) Use empirical distribution as approximate true distribution Approximate Expectation E[f (X )] = f (x)p(x)dx 1 N N f (x i ) i=1 Chandola@UB CSE 474/ / 37

39 Introduction to Information Theory Quantifying uncertainty of a random variable Entropy H(X ) or H(p) H(X ) K p(x = k) log 2 p(x = k) k= Variable with maximum entropy? Lowest entropy? Chandola@UB CSE 474/ / 37

40 Comparing Two Distributions Kullback-Leibler Divergence (or KL Divergence or relative entropy) KL(p q) K k=1 p(k) log p k q k = p(k) log p(k) k k = H(p) + H(p, q) p(k) log q(k) H(p, q) is the cross-entropy Is KL-divergence symmetric? Important fact: H(p, q) H(p) Chandola@UB CSE 474/ / 37

41 Mutual Information What does learning about one variable X tell us about another, Y? Correlation? Mutual Information I(X ; Y ) = x p(x, y) p(x, y) log p(x)p(y) y I(X ; Y ) = I(Y ; X ) I(X ; Y ) 0, equality iff X Y Chandola@UB CSE 474/ / 37

42 References E. Jaynes and G. Bretthorst. Probability Theory: The Logic of Science. Cambridge University Press Cambridge:, K. B. Petersen and M. S. Pedersen. The matrix cookbook, nov Version L. Wasserman. All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics). Springer, Oct Chandola@UB CSE 474/ / 37

Introduction to Machine Learning

Introduction to Machine Learning What does this mean? Outline Contents Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola December 26, 2017 1 Introduction to Probability 1 2 Random Variables 3 3 Bayes

More information

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University

CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University CS 591, Lecture 2 Data Analytics: Theory and Applications Boston University Charalampos E. Tsourakakis January 25rd, 2017 Probability Theory The theory of probability is a system for making better guesses.

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Review of Basic Probability The fundamentals, random variables, probability distributions Probability mass/density functions

More information

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014

Probability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014 Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions

More information

Machine learning - HT Maximum Likelihood

Machine learning - HT Maximum Likelihood Machine learning - HT 2016 3. Maximum Likelihood Varun Kanade University of Oxford January 27, 2016 Outline Probabilistic Framework Formulate linear regression in the language of probability Introduce

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Definitions. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Definitions Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Probability Theory. Patrick Lam

Probability Theory. Patrick Lam Probability Theory Patrick Lam Outline Probability Random Variables Simulation Important Distributions Discrete Distributions Continuous Distributions Most Basic Definition of Probability: number of successes

More information

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS

PROBABILITY AND INFORMATION THEORY. Dr. Gjergji Kasneci Introduction to Information Retrieval WS PROBABILITY AND INFORMATION THEORY Dr. Gjergji Kasneci Introduction to Information Retrieval WS 2012-13 1 Outline Intro Basics of probability and information theory Probability space Rules of probability

More information

Algorithms for Uncertainty Quantification

Algorithms for Uncertainty Quantification Algorithms for Uncertainty Quantification Tobias Neckel, Ionuț-Gabriel Farcaș Lehrstuhl Informatik V Summer Semester 2017 Lecture 2: Repetition of probability theory and statistics Example: coin flip Example

More information

Recitation 2: Probability

Recitation 2: Probability Recitation 2: Probability Colin White, Kenny Marino January 23, 2018 Outline Facts about sets Definitions and facts about probability Random Variables and Joint Distributions Characteristics of distributions

More information

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner

Fundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization

More information

Lecture 3. Probability - Part 2. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. October 19, 2016

Lecture 3. Probability - Part 2. Luigi Freda. ALCOR Lab DIAG University of Rome La Sapienza. October 19, 2016 Lecture 3 Probability - Part 2 Luigi Freda ALCOR Lab DIAG University of Rome La Sapienza October 19, 2016 Luigi Freda ( La Sapienza University) Lecture 3 October 19, 2016 1 / 46 Outline 1 Common Continuous

More information

Refresher on Discrete Probability

Refresher on Discrete Probability Refresher on Discrete Probability STAT 27725/CMSC 25400: Machine Learning Shubhendu Trivedi University of Chicago October 2015 Background Things you should have seen before Events, Event Spaces Probability

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Quick Tour of Basic Probability Theory and Linear Algebra

Quick Tour of Basic Probability Theory and Linear Algebra Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra CS224w: Social and Information Network Analysis Fall 2011 Quick Tour of and Linear Algebra Quick Tour of and Linear Algebra Outline Definitions

More information

Lecture 1: Probability Fundamentals

Lecture 1: Probability Fundamentals Lecture 1: Probability Fundamentals IB Paper 7: Probability and Statistics Carl Edward Rasmussen Department of Engineering, University of Cambridge January 22nd, 2008 Rasmussen (CUED) Lecture 1: Probability

More information

Random Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline.

Random Variables. Cumulative Distribution Function (CDF) Amappingthattransformstheeventstotherealline. Random Variables Amappingthattransformstheeventstotherealline. Example 1. Toss a fair coin. Define a random variable X where X is 1 if head appears and X is if tail appears. P (X =)=1/2 P (X =1)=1/2 Example

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

Probability Theory for Machine Learning. Chris Cremer September 2015

Probability Theory for Machine Learning. Chris Cremer September 2015 Probability Theory for Machine Learning Chris Cremer September 2015 Outline Motivation Probability Definitions and Rules Probability Distributions MLE for Gaussian Parameter Estimation MLE and Least Squares

More information

Grundlagen der Künstlichen Intelligenz

Grundlagen der Künstlichen Intelligenz Grundlagen der Künstlichen Intelligenz Uncertainty & Probabilities & Bandits Daniel Hennes 16.11.2017 (WS 2017/18) University Stuttgart - IPVS - Machine Learning & Robotics 1 Today Uncertainty Probability

More information

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27

Probability Review. Yutian Li. January 18, Stanford University. Yutian Li (Stanford University) Probability Review January 18, / 27 Probability Review Yutian Li Stanford University January 18, 2018 Yutian Li (Stanford University) Probability Review January 18, 2018 1 / 27 Outline 1 Elements of probability 2 Random variables 3 Multiple

More information

Lecture 1: August 28

Lecture 1: August 28 36-705: Intermediate Statistics Fall 2017 Lecturer: Siva Balakrishnan Lecture 1: August 28 Our broad goal for the first few lectures is to try to understand the behaviour of sums of independent random

More information

Review of probability

Review of probability Review of probability Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Goals for the lecture you should understand the following concepts definition of probability random variables

More information

STAT 430/510 Probability Lecture 7: Random Variable and Expectation

STAT 430/510 Probability Lecture 7: Random Variable and Expectation STAT 430/510 Probability Lecture 7: Random Variable and Expectation Pengyuan (Penelope) Wang June 2, 2011 Review Properties of Probability Conditional Probability The Law of Total Probability Bayes Formula

More information

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3

Probability. Paul Schrimpf. January 23, Definitions 2. 2 Properties 3 Probability Paul Schrimpf January 23, 2018 Contents 1 Definitions 2 2 Properties 3 3 Random variables 4 3.1 Discrete........................................... 4 3.2 Continuous.........................................

More information

Probability Theory Review Reading Assignments

Probability Theory Review Reading Assignments Probability Theory Review Reading Assignments R. Duda, P. Hart, and D. Stork, Pattern Classification, John-Wiley, 2nd edition, 2001 (appendix A.4, hard-copy). "Everything I need to know about Probability"

More information

Basic Statistics and Probability Theory

Basic Statistics and Probability Theory 0. Basic Statistics and Probability Theory Based on Foundations of Statistical NLP C. Manning & H. Schütze, ch. 2, MIT Press, 2002 Probability theory is nothing but common sense reduced to calculation.

More information

Machine Learning using Bayesian Approaches

Machine Learning using Bayesian Approaches Machine Learning using Bayesian Approaches Sargur N. Srihari University at Buffalo, State University of New York 1 Outline 1. Progress in ML and PR 2. Fully Bayesian Approach 1. Probability theory Bayes

More information

S n = x + X 1 + X X n.

S n = x + X 1 + X X n. 0 Lecture 0 0. Gambler Ruin Problem Let X be a payoff if a coin toss game such that P(X = ) = P(X = ) = /2. Suppose you start with x dollars and play the game n times. Let X,X 2,...,X n be payoffs in each

More information

CS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro

CS37300 Class Notes. Jennifer Neville, Sebastian Moreno, Bruno Ribeiro CS37300 Class Notes Jennifer Neville, Sebastian Moreno, Bruno Ribeiro 2 Background on Probability and Statistics These are basic definitions, concepts, and equations that should have been covered in your

More information

SDS 321: Introduction to Probability and Statistics

SDS 321: Introduction to Probability and Statistics SDS 321: Introduction to Probability and Statistics Lecture 10: Expectation and Variance Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin www.cs.cmu.edu/ psarkar/teaching

More information

SDS 321: Introduction to Probability and Statistics

SDS 321: Introduction to Probability and Statistics SDS 321: Introduction to Probability and Statistics Lecture 13: Expectation and Variance and joint distributions Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin

More information

CME 106: Review Probability theory

CME 106: Review Probability theory : Probability theory Sven Schmit April 3, 2015 1 Overview In the first half of the course, we covered topics from probability theory. The difference between statistics and probability theory is the following:

More information

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics)

Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming

More information

Bayesian Models in Machine Learning

Bayesian Models in Machine Learning Bayesian Models in Machine Learning Lukáš Burget Escuela de Ciencias Informáticas 2017 Buenos Aires, July 24-29 2017 Frequentist vs. Bayesian Frequentist point of view: Probability is the frequency of

More information

Probability and Estimation. Alan Moses

Probability and Estimation. Alan Moses Probability and Estimation Alan Moses Random variables and probability A random variable is like a variable in algebra (e.g., y=e x ), but where at least part of the variability is taken to be stochastic.

More information

Multivariate probability distributions and linear regression

Multivariate probability distributions and linear regression Multivariate probability distributions and linear regression Patrik Hoyer 1 Contents: Random variable, probability distribution Joint distribution Marginal distribution Conditional distribution Independence,

More information

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY

DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY DEEP LEARNING CHAPTER 3 PROBABILITY & INFORMATION THEORY OUTLINE 3.1 Why Probability? 3.2 Random Variables 3.3 Probability Distributions 3.4 Marginal Probability 3.5 Conditional Probability 3.6 The Chain

More information

Naïve Bayes classification

Naïve Bayes classification Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss

More information

Introduction to Probability and Statistics (Continued)

Introduction to Probability and Statistics (Continued) Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:

More information

1 Presessional Probability

1 Presessional Probability 1 Presessional Probability Probability theory is essential for the development of mathematical models in finance, because of the randomness nature of price fluctuations in the markets. This presessional

More information

01 Probability Theory and Statistics Review

01 Probability Theory and Statistics Review NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement

More information

PATTERN RECOGNITION AND MACHINE LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING PATTERN RECOGNITION AND MACHINE LEARNING Chapter 1. Introduction Shuai Huang April 21, 2014 Outline 1 What is Machine Learning? 2 Curve Fitting 3 Probability Theory 4 Model Selection 5 The curse of dimensionality

More information

CS 630 Basic Probability and Information Theory. Tim Campbell

CS 630 Basic Probability and Information Theory. Tim Campbell CS 630 Basic Probability and Information Theory Tim Campbell 21 January 2003 Probability Theory Probability Theory is the study of how best to predict outcomes of events. An experiment (or trial or event)

More information

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015

Part IA Probability. Theorems. Based on lectures by R. Weber Notes taken by Dexter Chua. Lent 2015 Part IA Probability Theorems Based on lectures by R. Weber Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables.

Probability. Paul Schrimpf. January 23, UBC Economics 326. Probability. Paul Schrimpf. Definitions. Properties. Random variables. Probability UBC Economics 326 January 23, 2018 1 2 3 Wooldridge (2013) appendix B Stock and Watson (2009) chapter 2 Linton (2017) chapters 1-5 Abbring (2001) sections 2.1-2.3 Diez, Barr, and Cetinkaya-Rundel

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV

More information

3. Probability and Statistics

3. Probability and Statistics FE661 - Statistical Methods for Financial Engineering 3. Probability and Statistics Jitkomut Songsiri definitions, probability measures conditional expectations correlation and covariance some important

More information

Probability Review. Chao Lan

Probability Review. Chao Lan Probability Review Chao Lan Let s start with a single random variable Random Experiment A random experiment has three elements 1. sample space Ω: set of all possible outcomes e.g.,ω={1,2,3,4,5,6} 2. event

More information

EE514A Information Theory I Fall 2013

EE514A Information Theory I Fall 2013 EE514A Information Theory I Fall 2013 K. Mohan, Prof. J. Bilmes University of Washington, Seattle Department of Electrical Engineering Fall Quarter, 2013 http://j.ee.washington.edu/~bilmes/classes/ee514a_fall_2013/

More information

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework

Bayesian Learning. HT2015: SC4 Statistical Data Mining and Machine Learning. Maximum Likelihood Principle. The Bayesian Learning Framework HT5: SC4 Statistical Data Mining and Machine Learning Dino Sejdinovic Department of Statistics Oxford http://www.stats.ox.ac.uk/~sejdinov/sdmml.html Maximum Likelihood Principle A generative model for

More information

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.

Perhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows. Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage

More information

Notes 12 Autumn 2005

Notes 12 Autumn 2005 MAS 08 Probability I Notes Autumn 005 Conditional random variables Remember that the conditional probability of event A given event B is P(A B) P(A B)/P(B). Suppose that X is a discrete random variable.

More information

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University

Chapter 3, 4 Random Variables ENCS Probability and Stochastic Processes. Concordia University Chapter 3, 4 Random Variables ENCS6161 - Probability and Stochastic Processes Concordia University ENCS6161 p.1/47 The Notion of a Random Variable A random variable X is a function that assigns a real

More information

Dept. of Linguistics, Indiana University Fall 2015

Dept. of Linguistics, Indiana University Fall 2015 L645 Dept. of Linguistics, Indiana University Fall 2015 1 / 34 To start out the course, we need to know something about statistics and This is only an introduction; for a fuller understanding, you would

More information

ECE 4400:693 - Information Theory

ECE 4400:693 - Information Theory ECE 4400:693 - Information Theory Dr. Nghi Tran Lecture 8: Differential Entropy Dr. Nghi Tran (ECE-University of Akron) ECE 4400:693 Lecture 1 / 43 Outline 1 Review: Entropy of discrete RVs 2 Differential

More information

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability

Naïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish

More information

Review (Probability & Linear Algebra)

Review (Probability & Linear Algebra) Review (Probability & Linear Algebra) CE-725 : Statistical Pattern Recognition Sharif University of Technology Spring 2013 M. Soleymani Outline Axioms of probability theory Conditional probability, Joint

More information

Lecture 11. Probability Theory: an Overveiw

Lecture 11. Probability Theory: an Overveiw Math 408 - Mathematical Statistics Lecture 11. Probability Theory: an Overveiw February 11, 2013 Konstantin Zuev (USC) Math 408, Lecture 11 February 11, 2013 1 / 24 The starting point in developing the

More information

Lecture 1. ABC of Probability

Lecture 1. ABC of Probability Math 408 - Mathematical Statistics Lecture 1. ABC of Probability January 16, 2013 Konstantin Zuev (USC) Math 408, Lecture 1 January 16, 2013 1 / 9 Agenda Sample Spaces Realizations, Events Axioms of Probability

More information

STAT2201. Analysis of Engineering & Scientific Data. Unit 3

STAT2201. Analysis of Engineering & Scientific Data. Unit 3 STAT2201 Analysis of Engineering & Scientific Data Unit 3 Slava Vaisman The University of Queensland School of Mathematics and Physics What we learned in Unit 2 (1) We defined a sample space of a random

More information

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell

Aarti Singh. Lecture 2, January 13, Reading: Bishop: Chap 1,2. Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Machine Learning 0-70/5 70/5-78, 78, Spring 00 Probability 0 Aarti Singh Lecture, January 3, 00 f(x) µ x Reading: Bishop: Chap, Slides courtesy: Eric Xing, Andrew Moore, Tom Mitchell Announcements Homework

More information

Joint Probability Distributions, Correlations

Joint Probability Distributions, Correlations Joint Probability Distributions, Correlations What we learned so far Events: Working with events as sets: union, intersection, etc. Some events are simple: Head vs Tails, Cancer vs Healthy Some are more

More information

Review of Probability. CS1538: Introduction to Simulations

Review of Probability. CS1538: Introduction to Simulations Review of Probability CS1538: Introduction to Simulations Probability and Statistics in Simulation Why do we need probability and statistics in simulation? Needed to validate the simulation model Needed

More information

6.041/6.431 Fall 2010 Quiz 2 Solutions

6.041/6.431 Fall 2010 Quiz 2 Solutions 6.04/6.43: Probabilistic Systems Analysis (Fall 200) 6.04/6.43 Fall 200 Quiz 2 Solutions Problem. (80 points) In this problem: (i) X is a (continuous) uniform random variable on [0, 4]. (ii) Y is an exponential

More information

Introduction to Statistical Inference Self-study

Introduction to Statistical Inference Self-study Introduction to Statistical Inference Self-study Contents Definition, sample space The fundamental object in probability is a nonempty sample space Ω. An event is a subset A Ω. Definition, σ-algebra A

More information

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com

Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com 1 School of Oriental and African Studies September 2015 Department of Economics Preliminary Statistics Lecture 2: Probability Theory (Outline) prelimsoas.webs.com Gujarati D. Basic Econometrics, Appendix

More information

Gaussian Mixture Models

Gaussian Mixture Models Gaussian Mixture Models David Rosenberg, Brett Bernstein New York University April 26, 2017 David Rosenberg, Brett Bernstein (New York University) DS-GA 1003 April 26, 2017 1 / 42 Intro Question Intro

More information

1 Review of Probability

1 Review of Probability 1 Review of Probability Random variables are denoted by X, Y, Z, etc. The cumulative distribution function (c.d.f.) of a random variable X is denoted by F (x) = P (X x), < x

More information

Random Variables and Expectations

Random Variables and Expectations Inside ECOOMICS Random Variables Introduction to Econometrics Random Variables and Expectations A random variable has an outcome that is determined by an experiment and takes on a numerical value. A procedure

More information

Probability Theory and Applications

Probability Theory and Applications Probability Theory and Applications Videos of the topics covered in this manual are available at the following links: Lesson 4 Probability I http://faculty.citadel.edu/silver/ba205/online course/lesson

More information

STAT Chapter 5 Continuous Distributions

STAT Chapter 5 Continuous Distributions STAT 270 - Chapter 5 Continuous Distributions June 27, 2012 Shirin Golchi () STAT270 June 27, 2012 1 / 59 Continuous rv s Definition: X is a continuous rv if it takes values in an interval, i.e., range

More information

Joint Distribution of Two or More Random Variables

Joint Distribution of Two or More Random Variables Joint Distribution of Two or More Random Variables Sometimes more than one measurement in the form of random variable is taken on each member of the sample space. In cases like this there will be a few

More information

Lecture 16. Lectures 1-15 Review

Lecture 16. Lectures 1-15 Review 18.440: Lecture 16 Lectures 1-15 Review Scott Sheffield MIT 1 Outline Counting tricks and basic principles of probability Discrete random variables 2 Outline Counting tricks and basic principles of probability

More information

A Brief Review of Probability, Bayesian Statistics, and Information Theory

A Brief Review of Probability, Bayesian Statistics, and Information Theory A Brief Review of Probability, Bayesian Statistics, and Information Theory Brendan Frey Electrical and Computer Engineering University of Toronto frey@psi.toronto.edu http://www.psi.toronto.edu A system

More information

Statistical Machine Learning Lectures 4: Variational Bayes

Statistical Machine Learning Lectures 4: Variational Bayes 1 / 29 Statistical Machine Learning Lectures 4: Variational Bayes Melih Kandemir Özyeğin University, İstanbul, Turkey 2 / 29 Synonyms Variational Bayes Variational Inference Variational Bayesian Inference

More information

Introduction to Machine Learning

Introduction to Machine Learning Introduction to Machine Learning Logistic Regression Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB CSE 474/574

More information

Continuous Random Variables

Continuous Random Variables 1 / 24 Continuous Random Variables Saravanan Vijayakumaran sarva@ee.iitb.ac.in Department of Electrical Engineering Indian Institute of Technology Bombay February 27, 2013 2 / 24 Continuous Random Variables

More information

STAT/MATH 395 PROBABILITY II

STAT/MATH 395 PROBABILITY II STAT/MATH 395 PROBABILITY II Bivariate Distributions Néhémy Lim University of Washington Winter 2017 Outline Distributions of Two Random Variables Distributions of Two Discrete Random Variables Distributions

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 18-27 Review Scott Sheffield MIT Outline Outline It s the coins, stupid Much of what we have done in this course can be motivated by the i.i.d. sequence X i where each X i is

More information

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2

COMPSCI 650 Applied Information Theory Jan 21, Lecture 2 COMPSCI 650 Applied Information Theory Jan 21, 2016 Lecture 2 Instructor: Arya Mazumdar Scribe: Gayane Vardoyan, Jong-Chyi Su 1 Entropy Definition: Entropy is a measure of uncertainty of a random variable.

More information

Bivariate Distributions

Bivariate Distributions STAT/MATH 395 A - PROBABILITY II UW Winter Quarter 17 Néhémy Lim Bivariate Distributions 1 Distributions of Two Random Variables Definition 1.1. Let X and Y be two rrvs on probability space (Ω, A, P).

More information

Information Theory Primer:

Information Theory Primer: Information Theory Primer: Entropy, KL Divergence, Mutual Information, Jensen s inequality Seungjin Choi Department of Computer Science and Engineering Pohang University of Science and Technology 77 Cheongam-ro,

More information

3. Review of Probability and Statistics

3. Review of Probability and Statistics 3. Review of Probability and Statistics ECE 830, Spring 2014 Probabilistic models will be used throughout the course to represent noise, errors, and uncertainty in signal processing problems. This lecture

More information

Lecture 1: Review on Probability and Statistics

Lecture 1: Review on Probability and Statistics STAT 516: Stochastic Modeling of Scientific Data Autumn 2018 Instructor: Yen-Chi Chen Lecture 1: Review on Probability and Statistics These notes are partially based on those of Mathias Drton. 1.1 Motivating

More information

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1

Machine Learning for Large-Scale Data Analysis and Decision Making A. Week #1 Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future

More information

Brief Review of Probability

Brief Review of Probability Maura Department of Economics and Finance Università Tor Vergata Outline 1 Distribution Functions Quantiles and Modes of a Distribution 2 Example 3 Example 4 Distributions Outline Distribution Functions

More information

Chapter 2. Continuous random variables

Chapter 2. Continuous random variables Chapter 2 Continuous random variables Outline Review of probability: events and probability Random variable Probability and Cumulative distribution function Review of discrete random variable Introduction

More information

Appendix A : Introduction to Probability and stochastic processes

Appendix A : Introduction to Probability and stochastic processes A-1 Mathematical methods in communication July 5th, 2009 Appendix A : Introduction to Probability and stochastic processes Lecturer: Haim Permuter Scribe: Shai Shapira and Uri Livnat The probability of

More information

Data Mining Techniques. Lecture 3: Probability

Data Mining Techniques. Lecture 3: Probability Data Mining Techniques CS 6220 - Section 3 - Fall 2016 Lecture 3: Probability Jan-Willem van de Meent (credit: Zhao, CS 229, Bishop) Project Vote 1. Freeform: Develop your own project proposals 30% of

More information

MATH Notebook 5 Fall 2018/2019

MATH Notebook 5 Fall 2018/2019 MATH442601 2 Notebook 5 Fall 2018/2019 prepared by Professor Jenny Baglivo c Copyright 2004-2019 by Jenny A. Baglivo. All Rights Reserved. 5 MATH442601 2 Notebook 5 3 5.1 Sequences of IID Random Variables.............................

More information

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018

15-388/688 - Practical Data Science: Basic probability. J. Zico Kolter Carnegie Mellon University Spring 2018 15-388/688 - Practical Data Science: Basic probability J. Zico Kolter Carnegie Mellon University Spring 2018 1 Announcements Logistics of next few lectures Final project released, proposals/groups due

More information

Class 26: review for final exam 18.05, Spring 2014

Class 26: review for final exam 18.05, Spring 2014 Probability Class 26: review for final eam 8.05, Spring 204 Counting Sets Inclusion-eclusion principle Rule of product (multiplication rule) Permutation and combinations Basics Outcome, sample space, event

More information

Probability and Distributions

Probability and Distributions Probability and Distributions What is a statistical model? A statistical model is a set of assumptions by which the hypothetical population distribution of data is inferred. It is typically postulated

More information

Review of Probability Theory

Review of Probability Theory Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving

More information

Machine Learning Srihari. Probability Theory. Sargur N. Srihari

Machine Learning Srihari. Probability Theory. Sargur N. Srihari Probability Theory Sargur N. Srihari srihari@cedar.buffalo.edu 1 Probability Theory with Several Variables Key concept is dealing with uncertainty Due to noise and finite data sets Framework for quantification

More information

Week 9 The Central Limit Theorem and Estimation Concepts

Week 9 The Central Limit Theorem and Estimation Concepts Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population

More information

Foundations of Statistical Inference

Foundations of Statistical Inference Foundations of Statistical Inference Jonathan Marchini Department of Statistics University of Oxford MT 2013 Jonathan Marchini (University of Oxford) BS2a MT 2013 1 / 27 Course arrangements Lectures M.2

More information

Homework 4 Solution, due July 23

Homework 4 Solution, due July 23 Homework 4 Solution, due July 23 Random Variables Problem 1. Let X be the random number on a die: from 1 to. (i) What is the distribution of X? (ii) Calculate EX. (iii) Calculate EX 2. (iv) Calculate Var

More information