Today: Probability and Statistics, Linear Algebra, Calculus (Naïve Bayes Classification, Matrix Multiplication, Matrix Inversion)


1 Today: Probability and Statistics (Naïve Bayes Classification); Linear Algebra (Matrix Multiplication, Matrix Inversion); Calculus (Vector Calculus, Optimization, Lagrange Multipliers).

2 Classical Artificial Intelligence: Expert Systems, Theorem Provers, Shakey, Chess. Largely characterized by determinism.

3 Modern Artificial Intelligence: Fingerprint ID, Internet Search, Vision (facial ID, object recognition), Speech Recognition, Asimo, Jeopardy! Statistical modeling to generalize from data.

4 Two Caveats about Statistical Modeling: Black Swans and The Long Tail.

5 Black Swans. In the 17th century, all known swans were white; based on the evidence, it seemed impossible for a swan to be anything other than white. In the 18th century, black swans were discovered in Western Australia. Black Swans are rare, sometimes unpredictable events that have extreme impact. Almost all statistical models underestimate the likelihood of unseen events.

6 The Long Tail. Many events follow an exponential distribution. These distributions have a very long tail, i.e., a large region with significant probability mass but low likelihood at any particular point. Often, interesting events occur in the long tail, but it is difficult to accurately model behavior in this region.

7 Boxes and Balls. 2 boxes, one red and one blue. Each contains colored balls.

8 Boxes and Balls. Suppose we randomly select a box, then randomly draw a ball from that box. The identity of the box is a random variable, B. The identity of the ball is a random variable, L. B can take 2 values: r or b. L can take 2 values: g or o.

9 Boxes and Balls. Given some information about B and L, we want to ask questions about the likelihood of different events. What is the probability of selecting a green ball? If I chose an orange ball, what is the probability that I chose from the blue box?

10 Some basics. The probability (or likelihood) of an event is the fraction of times that the event occurs out of n trials, as n approaches infinity. Probabilities lie in the range [0,1]. Mutually exclusive events are events that cannot simultaneously occur; the sum of the likelihoods of all mutually exclusive events must equal 1. If two events are independent, then $p(X,Y) = p(X)\,p(Y)$ and $p(X \mid Y) = p(X)$.

11 Joint Probability. A joint probability function $P(X,Y)$ defines the likelihood of two (or more) events occurring simultaneously. (The slide shows a count table: rows = Blue box, Red box; columns = Orange, Green.) Let $n_{ij}$ be the number of times event $i$ and event $j$ simultaneously occur; then
$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N}$$

12 Generalizing the Joint Probability. Row totals, column totals, and the grand total of the count table:
$$r_i = \sum_j n_{ij}, \qquad c_j = \sum_i n_{ij}, \qquad \sum_i \sum_j n_{ij} = N$$

13 Marginalization. Consider the probability of X irrespective of Y: $p(X = x_j) = \frac{c_j}{N}$. The number of instances in column $j$ is the sum of the instances in each cell: $c_j = \sum_{i=1}^{L} n_{ij}$. Therefore, we can marginalize, or sum over, Y:
$$p(X = x_j) = \sum_{i=1}^{L} p(X = x_j, Y = y_i)$$

14 Conditional Probability. Consider only instances where $X = x_j$. The fraction of these instances where $Y = y_i$ is the conditional probability, the probability of y given x:
$$p(Y = y_i \mid X = x_j) = \frac{n_{ij}}{c_j}$$

15 Relating the Joint, Conditional and Marginal.
$$p(X = x_i, Y = y_j) = \frac{n_{ij}}{N} = \frac{n_{ij}}{c_i} \cdot \frac{c_i}{N} = p(Y = y_j \mid X = x_i)\, p(X = x_i)$$
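
These count-based definitions are easy to check numerically. A minimal sketch (the count table below is invented for illustration) that recovers the joint, marginal, and conditional distributions from co-occurrence counts $n_{ij}$, with rows indexing X and columns indexing Y:

```python
import numpy as np

# Hypothetical count table n_ij: rows = values of X, columns = values of Y.
n = np.array([[3.0, 1.0],
              [2.0, 4.0]])
N = n.sum()

joint = n / N                                    # p(X = x_i, Y = y_j) = n_ij / N
p_x = joint.sum(axis=1)                          # marginal p(X): sum over Y
p_y_given_x = n / n.sum(axis=1, keepdims=True)   # p(Y | X) = n_ij / c_i

# Product rule check: p(X, Y) = p(Y | X) p(X)
assert np.allclose(joint, p_y_given_x * p_x[:, None])
```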

16 Sum and Product Rules. In general, we'll refer to a distribution over a random variable as $p(X)$ and a distribution evaluated at a particular value as $p(x)$. Sum rule: $p(X) = \sum_Y p(X, Y)$. Product rule: $p(X, Y) = p(Y \mid X)\,p(X)$.

17 Bayes Rule.
$$p(Y \mid X) = \frac{p(X \mid Y)\,p(Y)}{p(X)}$$

18 Interpretation of Bayes Rule.
$$\underbrace{p(Y \mid X)}_{\text{posterior}} = \frac{\overbrace{p(X \mid Y)}^{\text{likelihood}}\;\overbrace{p(Y)}^{\text{prior}}}{p(X)}$$
Prior: information we have before observation. Posterior: the distribution of Y after observing X. Likelihood: the likelihood of observing X given Y.

19 Boxes and Balls with Bayes Rule. Assume I'm inherently more likely to select the red box (66.6%) than the blue box (33.3%). If I selected an orange ball, what is the likelihood that I selected the red box? The blue box?

20 Boxes and Balls.
$$p(B = r \mid L = o) = \frac{p(L = o \mid B = r)\,p(B = r)}{p(L = o)} = \frac{6}{7}$$
$$p(B = b \mid L = o) = \frac{p(L = o \mid B = b)\,p(B = b)}{p(L = o)} = \frac{1}{7}$$
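
A quick numeric check of this calculation. The box contents below (3 of 4 balls orange in the red box, 1 of 4 in the blue box) are an assumption chosen to be consistent with the 6/7 posterior, since the slide's ball counts aren't shown:

```python
p_box = {'r': 2/3, 'b': 1/3}                 # prior over boxes, from slide 19
p_orange_given_box = {'r': 3/4, 'b': 1/4}    # assumed likelihoods

# Marginal: p(L = o) = sum over B of p(L = o | B) p(B)
p_orange = sum(p_orange_given_box[b] * p_box[b] for b in p_box)

# Bayes rule: p(B | L = o)
posterior = {b: p_orange_given_box[b] * p_box[b] / p_orange for b in p_box}
print(posterior)   # {'r': 0.857..., 'b': 0.142...}, i.e. 6/7 and 1/7
```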

21 Naïve Bayes Classification. This is a simple case of a general classification approach. Here the box is the class, and the colored ball is a feature, or the observation. We can extend this Bayesian classification approach to incorporate more independent features.

22 Naïve Bayes Classification. Some theory first.
$$c = \operatorname*{argmax}_c\ p(c \mid x_1, x_2, \ldots, x_n) = \operatorname*{argmax}_c\ \frac{p(x_1, x_2, \ldots, x_n \mid c)\,p(c)}{p(x_1, x_2, \ldots, x_n)}$$
$$p(x_1, x_2, \ldots, x_n \mid c) = p(x_1 \mid c)\,p(x_2 \mid c)\cdots p(x_n \mid c)$$

23 Naïve Bayes Classification. Assuming independent features simplifies the math, and the denominator is constant in $c$, so it can be dropped:
$$c = \operatorname*{argmax}_c\ \frac{p(x_1 \mid c)\,p(x_2 \mid c)\cdots p(x_n \mid c)\,p(c)}{p(x_1, x_2, \ldots, x_n)} = \operatorname*{argmax}_c\ p(x_1 \mid c)\,p(x_2 \mid c)\cdots p(x_n \mid c)\,p(c)$$

24 Naïve Bayes Example Data.
HOT LIGHT SOFT RED
COLD HEAVY SOFT RED
HOT HEAVY FIRM RED
HOT LIGHT FIRM RED
COLD LIGHT SOFT BLUE
COLD HEAVY SOFT BLUE
HOT HEAVY FIRM BLUE
HOT LIGHT FIRM BLUE
HOT HEAVY FIRM ?????
$$c = \operatorname*{argmax}_c\ p(x_1 \mid c)\,p(x_2 \mid c)\cdots p(x_n \mid c)\,p(c)$$

25 Naïve Bayes Example Data (same table as slide 24). Prior: $p(c = \text{red}) = 0.5$, $p(c = \text{blue}) = 0.5$.

26 Naïve Bayes Example Data (same table as slide 24). Likelihoods: $p(\text{hot} \mid c = \text{red}) = 0.75$, $p(\text{hot} \mid c = \text{blue}) = 0.5$, $p(\text{heavy} \mid c = \text{red}) = 0.5$, $p(\text{firm} \mid c = \text{red}) = 0.5$, $p(\text{heavy} \mid c = \text{blue}) = 0.5$, $p(\text{firm} \mid c = \text{blue}) = 0.5$.

27 Naïve Bayes Example Data (same table as slide 24).
$$p(\text{hot} \mid \text{red})\,p(\text{heavy} \mid \text{red})\,p(\text{firm} \mid \text{red})\,p(c = \text{red}) = 0.75 \times 0.5 \times 0.5 \times 0.5 = 0.09375$$
$$p(\text{hot} \mid \text{blue})\,p(\text{heavy} \mid \text{blue})\,p(\text{firm} \mid \text{blue})\,p(c = \text{blue}) = 0.5 \times 0.5 \times 0.5 \times 0.5 = 0.0625$$
The query HOT HEAVY FIRM is therefore classified as RED.
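
A minimal sketch of the same counting computation in plain Python (no ML library); the data literals mirror the eight training rows above, so the scores can be compared directly with the slide:

```python
from collections import Counter

data = [("HOT","LIGHT","SOFT","RED"),  ("COLD","HEAVY","SOFT","RED"),
        ("HOT","HEAVY","FIRM","RED"),  ("HOT","LIGHT","FIRM","RED"),
        ("COLD","LIGHT","SOFT","BLUE"), ("COLD","HEAVY","SOFT","BLUE"),
        ("HOT","HEAVY","FIRM","BLUE"),  ("HOT","LIGHT","FIRM","BLUE")]
query = ("HOT", "HEAVY", "FIRM")

classes = Counter(row[-1] for row in data)
scores = {}
for c, n_c in classes.items():
    rows = [r for r in data if r[-1] == c]
    score = n_c / len(data)                                # prior p(c)
    for i, value in enumerate(query):                      # independent features
        score *= sum(r[i] == value for r in rows) / n_c    # p(x_i | c)
    scores[c] = score

print(scores)                        # {'RED': 0.09375, 'BLUE': 0.0625}
print(max(scores, key=scores.get))   # RED
```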

28 Continuous Probabilities. So far, X has been discrete, taking one of M values. What if X is continuous? Now $p(x)$ is a continuous probability density function. The probability that x will lie in an interval (a,b) is:
$$p(x \in (a, b)) = \int_a^b p(x)\,dx$$
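
As a numeric illustration (assuming, for concreteness, a standard Gaussian density, which is not specified on the slide), the interval probability can be approximated by discretizing the integral:

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # density of N(x; mu, sigma^2)
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / np.sqrt(2 * np.pi * sigma ** 2)

a, b = -1.0, 1.0
xs = np.linspace(a, b, 10001)
prob = np.trapz(gaussian_pdf(xs), xs)   # p(x in (a, b)) ~ trapezoid rule
print(prob)                             # ~0.6827 for one standard deviation
```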

29 Continuous probability example (figure).

30 Properties of probability density functions.
$$p(x) \geq 0, \qquad \int_{-\infty}^{\infty} p(x)\,dx = 1$$
Sum rule: $p(x) = \int p(x, y)\,dy$. Product rule: $p(x, y) = p(y \mid x)\,p(x)$.

31 Expected Values. Given a random variable with a distribution $p(X)$, what is the expected value of X? Discrete: $E[x] = \sum_x p(x)\,x$. Continuous: $E[x] = \int p(x)\,x\,dx$.

32 Multinomial Distribution. If a variable x can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution. The probability of x being in state k is $\mu_k$, with $\sum_{k=1}^{K} \mu_k = 1$ and
$$p(x; \mu) = \prod_{k=1}^{K} \mu_k^{x_k}$$

33 Expected Value of a Multinomial. The expected value is the vector of state probabilities:
$$E[x; \mu] = \sum_x p(x; \mu)\,x = (\mu_0, \mu_1, \ldots, \mu_{K-1})^T$$

34 Gaussian Distribution. One dimension:
$$N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
D dimensions:
$$N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{D/2}\,|\Sigma|^{1/2}}\,\exp\!\left(-\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu)\right)$$
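
A sketch evaluating both densities with NumPy; the two functions simply transcribe the formulas above:

```python
import numpy as np

def gauss_1d(x, mu, var):
    # N(x; mu, sigma^2)
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def gauss_nd(x, mu, Sigma):
    # N(x; mu, Sigma) in D dimensions
    D = len(mu)
    diff = x - mu
    norm = (2 * np.pi) ** (D / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff)) / norm

print(gauss_1d(0.0, 0.0, 1.0))             # 0.3989... = 1 / sqrt(2*pi)
mu = np.zeros(2)
Sigma = np.eye(2)
print(gauss_nd(np.zeros(2), mu, Sigma))    # 1 / (2*pi) = 0.1591...
```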

35 Gaussians (figure).

36 How machine learning uses statistical modeling. Expectation: the expected value of a function is the hypothesis. Variance: the variance is the confidence in that hypothesis.

37 Variance. The variance of a random variable describes how much variability there is around the expected value. It is calculated as the expected squared error:
$$\mathrm{var}[f] = E\!\left[(f(x) - E[f(x)])^2\right] = E[f(x)^2] - E[f(x)]^2$$

38 Covariance. The covariance of two random variables expresses how they vary together:
$$\mathrm{cov}[x, y] = E_{x,y}\!\left[(x - E[x])(y - E[y])\right] = E_{x,y}[xy] - E[x]\,E[y]$$
If two variables are independent, their covariance equals zero.
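
The identity $\mathrm{cov}[x,y] = E[xy] - E[x]E[y]$ is easy to verify on sampled data; a minimal sketch with synthetic samples (the 0.5 coupling is an arbitrary choice for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # correlated with x by construction

cov_def = np.mean((x - x.mean()) * (y - y.mean()))   # E[(x-E[x])(y-E[y])]
cov_alt = np.mean(x * y) - x.mean() * y.mean()       # E[xy] - E[x]E[y]
print(cov_def, cov_alt)                              # both ~0.5
```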

39 Linear Algebra. Vectors: a one-dimensional array; if not specified, assume x is a column vector. Matrices: a two-dimensional array, typically denoted with capital letters, with n rows and m columns:
$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}, \qquad A = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1} \end{bmatrix}$$

40 Transposition. Transposing swaps columns and rows:
$$x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}, \qquad x^T = \begin{bmatrix} x_0 & x_1 & \cdots & x_{n-1} \end{bmatrix}$$

41 Transposition. Transposing a matrix swaps columns and rows, $(A^T)_{ij} = A_{ji}$:
$$A = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1} \end{bmatrix}, \qquad A^T = \begin{bmatrix} a_{0,0} & a_{1,0} & \cdots & a_{n-1,0} \\ a_{0,1} & a_{1,1} & \cdots & a_{n-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{0,m-1} & a_{1,m-1} & \cdots & a_{n-1,m-1} \end{bmatrix}$$

42 Addition. Matrices can be added together iff they have the same dimensions. With A and B both n-by-m matrices:
$$A + B = \begin{bmatrix} a_{0,0} + b_{0,0} & \cdots & a_{0,m-1} + b_{0,m-1} \\ \vdots & \ddots & \vdots \\ a_{n-1,0} + b_{n-1,0} & \cdots & a_{n-1,m-1} + b_{n-1,m-1} \end{bmatrix}$$

43 Multiplication. To multiply two matrices, the inner dimensions must be the same: an n-by-m matrix can be multiplied by an m-by-k matrix.
$$AB = C, \qquad c_{ij} = \sum_{k=0}^{m-1} a_{ik}\,b_{kj}$$
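
A sketch of this definition as explicit loops, checked against NumPy's built-in matrix product:

```python
import numpy as np

def matmul(A, B):
    n, m = A.shape
    m2, k = B.shape
    assert m == m2, "inner dimensions must match"
    C = np.zeros((n, k))
    for i in range(n):
        for j in range(k):
            # c_ij = sum over p of a_ip * b_pj
            C[i, j] = sum(A[i, p] * B[p, j] for p in range(m))
    return C

A = np.arange(6.0).reshape(2, 3)    # 2-by-3
B = np.arange(12.0).reshape(3, 4)   # 3-by-4
assert np.allclose(matmul(A, B), A @ B)
```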

44 Inversion. The inverse of an n-by-n, or square, matrix A is denoted $A^{-1}$ and has the property $AA^{-1} = I$, where I, the identity matrix, is the n-by-n matrix with ones along the diagonal: $I_{ij} = 1$ if $i = j$, 0 otherwise.

45 Identity Matrix. Matrices are invariant under multiplication by the identity matrix: $AI = A$, $IA = A$.

46 Helpful matrix inversion properties.
$$(A^{-1})^{-1} = A, \qquad (kA)^{-1} = k^{-1}A^{-1}, \qquad (A^T)^{-1} = (A^{-1})^T, \qquad (AB)^{-1} = B^{-1}A^{-1}$$
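
These identities can be spot-checked numerically; a minimal sketch on random matrices (shifted by a multiple of the identity so they are comfortably invertible):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 4 * np.eye(4)   # well-conditioned, invertible
B = rng.normal(size=(4, 4)) + 4 * np.eye(4)
inv = np.linalg.inv

assert np.allclose(A @ inv(A), np.eye(4))        # A A^{-1} = I
assert np.allclose(inv(inv(A)), A)               # (A^{-1})^{-1} = A
assert np.allclose(inv(A.T), inv(A).T)           # (A^T)^{-1} = (A^{-1})^T
assert np.allclose(inv(A @ B), inv(B) @ inv(A))  # (AB)^{-1} = B^{-1} A^{-1}
```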

47 Norm. The norm of a vector x represents its Euclidean length:
$$\|x\| = \sqrt{\sum_{i=0}^{n-1} x_i^2} = \sqrt{x_0^2 + x_1^2 + \cdots + x_{n-1}^2}$$

48 Positive Definiteness. Quadratic form: scalar, $c_0 + c_1 x + c_2 x^2$; vector, $x^T A x$. A positive definite matrix M satisfies $x^T M x > 0$ for all nonzero x. Positive semi-definite: $x^T M x \geq 0$ for all x.

49 Calculus: Derivatives and Integrals; Optimization.

50 Derivatives. The derivative of a function gives its slope at a point x, written $\frac{d}{dx}f(x)$ or $f'(x)$.

51 Derivative Example (figure).

52 Integrals. Integration is the inverse operation of differentiation (up to a constant):
$$\int f(x)\,dx = F(x) + c, \qquad F'(x) = f(x)$$
Graphically, an integral can be considered the area under the curve defined by f(x).

53 Integration Example (figure).

54 Vector Calculus: differentiation with respect to a matrix or vector; the gradient; change of variables with a vector.

55 Derivative w.r.t. a vector. Given a vector x and a function $f(x): \mathbb{R}^n \to \mathbb{R}$, how can we find $f'(x)$?

56 Derivative w.r.t. a vector. Given a vector x and a function $f(x): \mathbb{R}^n \to \mathbb{R}$:
$$\frac{\partial f(x)}{\partial x} = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_0} \\ \dfrac{\partial f(x)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x)}{\partial x_{n-1}} \end{bmatrix}$$

57 Example Derivation. $f(x) = x_0 + 4x_1 x_2$:
$$\frac{\partial f(x)}{\partial x_0} = 1, \qquad \frac{\partial f(x)}{\partial x_1} = 4x_2, \qquad \frac{\partial f(x)}{\partial x_2} = 4x_1$$

58 Example Derivation. $f(x) = x_0 + 4x_1 x_2$:
$$\frac{\partial f(x)}{\partial x} = \begin{bmatrix} \dfrac{\partial f(x)}{\partial x_0} \\ \dfrac{\partial f(x)}{\partial x_1} \\ \dfrac{\partial f(x)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 1 \\ 4x_2 \\ 4x_1 \end{bmatrix}$$
Also referred to as the gradient of a function: $\nabla f(x)$ or $\nabla f$.
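
The gradient above can be checked with central finite differences; a minimal sketch for $f(x) = x_0 + 4x_1x_2$ at an arbitrary test point:

```python
import numpy as np

f = lambda x: x[0] + 4 * x[1] * x[2]
grad = lambda x: np.array([1.0, 4 * x[2], 4 * x[1]])   # analytic gradient

x = np.array([1.0, 2.0, 3.0])
eps = 1e-6
# central difference along each coordinate direction e
numeric = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps)
                    for e in np.eye(3)])
print(grad(x), numeric)   # both ~ [1, 12, 8]
```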

59 Useful Vector Calculus identities. Scalar multiplication:
$$\frac{\partial}{\partial x}(x^T a) = \frac{\partial}{\partial x}(a^T x) = a, \qquad \frac{\partial}{\partial x}(x^T A) = A, \qquad \frac{\partial}{\partial x}(Ax) = A^T$$
Product rule:
$$\frac{\partial}{\partial x}(AB) = \frac{\partial A}{\partial x}\,B + A\,\frac{\partial B}{\partial x}$$

60 Useful Vector Calculus identities. Derivative of an inverse:
$$\frac{\partial}{\partial x}(A^{-1}) = -A^{-1}\,\frac{\partial A}{\partial x}\,A^{-1}$$
Change of variable:
$$\int f(x)\,dx = \int f(u)\left|\frac{\partial x}{\partial u}\right| du$$

61 Optimization. We have an objective function that we'd like to maximize or minimize, f(x). Set the first derivative to zero and solve.

62 Optimization with constraints. What if we want to constrain the parameters of the model, e.g., require the mean to be less than 10? Find the best likelihood subject to a constraint. Two functions: an objective function to maximize, and a constraint that must be satisfied.

63 Lagrange Multipliers. Find the maxima of f(x, y) subject to a constraint:
$$f(x, y) = x + 2y, \qquad x^2 + y^2 = 1$$

64 General form. Maximizing $f(x, y)$ subject to $g(x, y) = c$: introduce a new variable $\lambda$ and find the maxima of
$$\Lambda(x, y, \lambda) = f(x, y) + \lambda\,(g(x, y) - c)$$

65 Example. Maximizing $f(x, y) = x + 2y$ subject to $x^2 + y^2 = 1$: introduce a new variable $\lambda$ and find the maxima of
$$\Lambda(x, y, \lambda) = x + 2y + \lambda\,(x^2 + y^2 - 1)$$

66 Example.
$$\frac{\partial \Lambda(x, y, \lambda)}{\partial x} = 1 + 2\lambda x = 0, \qquad \frac{\partial \Lambda(x, y, \lambda)}{\partial y} = 2 + 2\lambda y = 0, \qquad \frac{\partial \Lambda(x, y, \lambda)}{\partial \lambda} = x^2 + y^2 - 1 = 0$$
We now have 3 equations in 3 unknowns.

67 Example. Eliminate $\lambda$:
$$1 = -2\lambda x, \quad 2 = -2\lambda y \;\Rightarrow\; \frac{1}{x} = -2\lambda = \frac{2}{y} \;\Rightarrow\; y = 2x$$
Substitute and solve:
$$x^2 + y^2 = 1 \;\Rightarrow\; x^2 + (2x)^2 = 1 \;\Rightarrow\; 5x^2 = 1 \;\Rightarrow\; x = \pm\frac{1}{\sqrt{5}},\; y = \pm\frac{2}{\sqrt{5}}$$
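
A quick check of the stationary points with SymPy (assuming it is available); this solves the three Lagrangian equations directly:

```python
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
L = x + 2 * y + lam * (x**2 + y**2 - 1)

# Set all three partial derivatives to zero and solve the system.
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
for s in sols:
    print(s, '  f =', s[x] + 2 * s[y])
# x = +-1/sqrt(5), y = +-2/sqrt(5); the maximum f = sqrt(5) at (1/sqrt(5), 2/sqrt(5))
```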

68 Why does Machine Learning need these tools? Calculus: optimization lets us identify the maximum likelihood or minimum risk, and integration allows the marginalization of continuous probability density functions. Linear Algebra: many features lead to high-dimensional spaces, and vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.

69 Why does Machine Learning need these tools? Vector Calculus: all of the optimization needs to be performed in high-dimensional spaces, optimizing multiple variables simultaneously (gradient descent), and we want to take marginals over high-dimensional distributions like Gaussians.

70 Next Time. Linear Regression and Regularization. Read Chapters 1.1, 3.1.
