Today. Probability and Statistics. Linear Algebra. Calculus. Naïve Bayes Classification. Matrix Multiplication Matrix Inversion
- Phillip Floyd
- 6 years ago
Transcription
1 Today: Probability and Statistics (Naïve Bayes Classification); Linear Algebra (Matrix Multiplication, Matrix Inversion); Calculus (Vector Calculus, Optimization, Lagrange Multipliers).
2 Classical Artificial Intelligence: Expert Systems, Theorem Provers, Shakey, Chess. Largely characterized by determinism.
3 Modern Artificial Intelligence: Fingerprint ID, Internet Search, Vision (facial ID, object recognition), Speech Recognition, Asimo, Jeopardy! Statistical modeling to generalize from data.
4 Two Caveats about Statistical Modeling: Black Swans and The Long Tail.
5 Black Swans: Until the late 17th century, every swan known to Europeans was white, so on the available evidence it seemed impossible for a swan to be any other color. Black swans were then found in Western Australia (first recorded by Europeans in 1697). Black swans are rare, sometimes unpredictable events that have extreme impact. Almost all statistical models underestimate the likelihood of unseen events.
6 The Long Tail: Many events follow an exponential distribution. These distributions have a very long tail, i.e., a large region that holds significant probability mass in total but has low likelihood at any particular point. Interesting events often occur in the long tail, but it is difficult to model behavior accurately in this region.
7 Boxes and Balls: Two boxes, one red and one blue. Each contains colored balls.
8 Boxes and Balls: Suppose we randomly select a box, then randomly draw a ball from that box. The identity of the box is a random variable, B. The identity of the ball is a random variable, L. B can take two values, r or b. L can take two values, g or o.
9 Boxes and Balls: Given some information about B and L, we want to ask questions about the likelihood of different events. What is the probability of selecting a green ball? If I chose an orange ball, what is the probability that I chose from the blue box?
10 Some basics: The probability (or likelihood) of an event is the fraction of times that the event occurs out of n trials, as n approaches infinity. Probabilities lie in the range [0, 1]. Mutually exclusive events are events that cannot occur simultaneously. The probabilities of a set of mutually exclusive events that covers every possible outcome must sum to 1. If two events are independent, then $p(X, Y) = p(X)\,p(Y)$ and $p(X \mid Y) = p(X)$.
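11 Joint Probability P(X, Y): A joint probability function defines the likelihood of two (or more) events occurring together. (Table on the slide: rows Blue box and Red box, columns Orange and Green; the cell counts did not survive transcription.) Let $n_{ij}$ be the number of times that event $i$ (row, $Y = y_i$) and event $j$ (column, $X = x_j$) occur simultaneously in $N$ trials. Then $p(X = x_j, Y = y_i) = \frac{n_{ij}}{N}$.
12 Generalizing the Joint Probability: For a table of counts $n_{ij}$, the row totals are $r_i = \sum_j n_{ij}$, the column totals are $c_j = \sum_i n_{ij}$, and $\sum_i \sum_j n_{ij} = N$.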
13 Marginalization: Consider the probability of X irrespective of Y: $p(X = x_j) = \frac{c_j}{N}$. The number of instances in column $j$ is the sum of the instances in each cell of that column, $c_j = \sum_{i=1}^{L} n_{ij}$. Therefore, we can marginalize, or sum over, Y: $p(X = x_j) = \sum_{i=1}^{L} p(X = x_j, Y = y_i)$.
14 Conditional Probability: Consider only the instances where $X = x_j$. The fraction of these instances where $Y = y_i$ is the conditional probability, the probability of y given x: $p(Y = y_i \mid X = x_j) = \frac{n_{ij}}{c_j}$.
15 Relating the Joint, Conditional and Marginal: $p(X = x_j, Y = y_i) = \frac{n_{ij}}{N} = \frac{n_{ij}}{c_j} \cdot \frac{c_j}{N} = p(Y = y_i \mid X = x_j)\, p(X = x_j)$.
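The counting definitions of the joint, marginal and conditional probabilities can be checked numerically. The sketch below is only an illustration: the count table `n` is made up (it is not the table from the slides), with rows indexing values of Y and columns indexing values of X as on the marginalization and conditional probability slides.

```python
from fractions import Fraction

# Hypothetical count table n[i][j]: row i is a value of Y, column j a value of X.
n = [[3, 1],
     [2, 6]]
N = sum(sum(row) for row in n)  # total number of trials

# Joint probability p(X = x_j, Y = y_i) = n_ij / N
joint = [[Fraction(n_ij, N) for n_ij in row] for row in n]

# Column sums c_j give the marginal p(X = x_j) = c_j / N (sum rule over Y)
c = [sum(n[i][j] for i in range(len(n))) for j in range(len(n[0]))]
p_x = [Fraction(c_j, N) for c_j in c]

# Conditional probability p(Y = y_i | X = x_j) = n_ij / c_j
cond = [[Fraction(n[i][j], c[j]) for j in range(len(c))] for i in range(len(n))]

# Product rule: joint = conditional * marginal, checked cell by cell
product_rule_holds = all(
    joint[i][j] == cond[i][j] * p_x[j]
    for i in range(len(n)) for j in range(len(c))
)
print(p_x, product_rule_holds)
```

Exact fractions are used so that the equality check is not affected by floating-point rounding.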
16 Sum and Product Rules: In general, we'll refer to a distribution over a random variable as $p(X)$ and to the distribution evaluated at a particular value as $p(x)$. Sum rule: $p(X) = \sum_Y p(X, Y)$. Product rule: $p(X, Y) = p(Y \mid X)\, p(X)$.
17 Bayes Rule: $p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$.
18 Interpretation of Bayes Rule: In $p(Y \mid X) = \frac{p(X \mid Y)\, p(Y)}{p(X)}$, the left-hand side $p(Y \mid X)$ is the posterior, $p(X \mid Y)$ is the likelihood, and $p(Y)$ is the prior. Prior: the information we have before the observation. Posterior: the distribution of Y after observing X. Likelihood: the likelihood of observing X given Y.
19 Boxes and Balls with Bayes Rule: Assume I'm inherently more likely to select the red box (66.6%) than the blue box (33.3%). If I selected an orange ball, what is the likelihood that I selected the red box? The blue box?
20 Boxes and Balls: $p(B = r \mid L = o) = \frac{p(L = o \mid B = r)\, p(B = r)}{p(L = o)} = \frac{6}{7}$. $p(B = b \mid L = o) = \frac{p(L = o \mid B = b)\, p(B = b)}{p(L = o)} = 1 - \frac{6}{7} = \frac{1}{7}$, since the two posteriors must sum to 1.
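The Bayes rule calculation for the boxes can be sketched in a few lines. The ball counts per box did not survive in the transcription, so the `counts` below are an assumption (red box: 2 green and 6 orange; blue box: 3 green and 1 orange), chosen because they reproduce the 6/7 stated on the slide. The priors 2/3 and 1/3 are the exact values behind the slide's 66.6% and 33.3%.

```python
from fractions import Fraction

# Priors from the slide: red box about 66.6%, blue box about 33.3%.
prior = {"r": Fraction(2, 3), "b": Fraction(1, 3)}

# ASSUMED box contents (not preserved in the transcription).
counts = {"r": {"g": 2, "o": 6}, "b": {"g": 3, "o": 1}}

def likelihood(ball, box):
    """p(L = ball | B = box): fraction of the balls in that box with this color."""
    return Fraction(counts[box][ball], sum(counts[box].values()))

def posterior(box, ball):
    """Bayes rule: p(B = box | L = ball)."""
    evidence = sum(likelihood(ball, b) * prior[b] for b in prior)  # p(L = ball)
    return likelihood(ball, box) * prior[box] / evidence

print(posterior("r", "o"), posterior("b", "o"))  # prints: 6/7 1/7
```

The evidence term $p(L = o)$ is obtained with the sum and product rules, so no extra information beyond the prior and the per-box likelihoods is needed.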
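21 Naïve Bayes Classification: This is a simple case of a simple classification approach. Here the box is the class, and the colored ball is a feature, or the observation. We can extend this Bayesian classification approach to incorporate more independent features.
22 Naïve Bayes Classification: Some theory first. $c^* = \operatorname{argmax}_c\, p(c \mid x_1, x_2, \ldots, x_n)$, which by Bayes rule is $c^* = \operatorname{argmax}_c \frac{p(x_1, x_2, \ldots, x_n \mid c)\, p(c)}{p(x_1, x_2, \ldots, x_n)}$. The naïve assumption is that the features are independent given the class: $p(x_1, x_2, \ldots, x_n \mid c) = p(x_1 \mid c)\, p(x_2 \mid c) \cdots p(x_n \mid c)$.
23 Naïve Bayes Classification: Assuming independent features simplifies the math. $c^* = \operatorname{argmax}_c \frac{p(x_1 \mid c)\, p(x_2 \mid c) \cdots p(x_n \mid c)\, p(c)}{p(x_1, x_2, \ldots, x_n)}$. The denominator does not depend on $c$, so $c^* = \operatorname{argmax}_c\, p(x_1 \mid c)\, p(x_2 \mid c) \cdots p(x_n \mid c)\, p(c)$.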
24 Naïve Bayes Example Data (three features, then the class): HOT LIGHT SOFT → RED; COLD HEAVY SOFT → RED; HOT HEAVY FIRM → RED; HOT LIGHT FIRM → RED; COLD LIGHT SOFT → BLUE; COLD HEAVY SOFT → BLUE; HOT HEAVY FIRM → BLUE; HOT LIGHT FIRM → BLUE. To classify: HOT HEAVY FIRM → ? Decision rule: $c^* = \operatorname{argmax}_c\, p(x_1 \mid c)\, p(x_2 \mid c) \cdots p(x_n \mid c)\, p(c)$.
25 Naïve Bayes Example Data (same data as slide 24): Prior: $p(c = \text{red}) = 0.5$ and $p(c = \text{blue}) = 0.5$.
26 Naïve Bayes Example Data (same data as slide 24): $p(\text{hot} \mid c = \text{red}) = 0.75$, $p(\text{heavy} \mid c = \text{red}) = 0.5$, $p(\text{firm} \mid c = \text{red}) = 0.5$; $p(\text{hot} \mid c = \text{blue}) = 0.5$, $p(\text{heavy} \mid c = \text{blue}) = 0.5$, $p(\text{firm} \mid c = \text{blue}) = 0.5$.
27 Naïve Bayes Example Data (same data as slide 24): $p(\text{hot} \mid c = \text{red})\, p(\text{heavy} \mid c = \text{red})\, p(\text{firm} \mid c = \text{red})\, p(c = \text{red}) = 0.75 \times 0.5 \times 0.5 \times 0.5 = 0.09375$. $p(\text{hot} \mid c = \text{blue})\, p(\text{heavy} \mid c = \text{blue})\, p(\text{firm} \mid c = \text{blue})\, p(c = \text{blue}) = 0.5 \times 0.5 \times 0.5 \times 0.5 = 0.0625$. The red score is larger, so HOT HEAVY FIRM is classified as red.
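The worked example can be reproduced with a short frequency-counting sketch. The function name `naive_bayes_scores` is a hypothetical helper introduced here for illustration; it estimates each probability as a raw relative frequency with no smoothing, which is what the slide's numbers imply. The data rows use COLD HEAVY SOFT for the second blue example, the version that matches $p(\text{firm} \mid c = \text{blue}) = 0.5$.

```python
from collections import Counter

# Training rows from the slides: three categorical features, then the class.
data = [
    ("HOT", "LIGHT", "SOFT", "RED"),
    ("COLD", "HEAVY", "SOFT", "RED"),
    ("HOT", "HEAVY", "FIRM", "RED"),
    ("HOT", "LIGHT", "FIRM", "RED"),
    ("COLD", "LIGHT", "SOFT", "BLUE"),
    ("COLD", "HEAVY", "SOFT", "BLUE"),
    ("HOT", "HEAVY", "FIRM", "BLUE"),
    ("HOT", "LIGHT", "FIRM", "BLUE"),
]

def naive_bayes_scores(rows, query):
    """Return {class: p(x_1|c) * ... * p(x_n|c) * p(c)} using raw frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(rows)  # prior p(c)
        for k, value in enumerate(query):
            matches = sum(1 for row in rows if row[-1] == c and row[k] == value)
            score *= matches / n_c  # likelihood p(x_k | c)
        scores[c] = score
    return scores

scores = naive_bayes_scores(data, ("HOT", "HEAVY", "FIRM"))
prediction = max(scores, key=scores.get)
print(scores, prediction)
```

The scores are unnormalized, because the shared denominator $p(x_1, \ldots, x_n)$ is dropped; only their ordering matters for the argmax.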
28 Continuous Probabilities: So far, X has been discrete, taking one of M values. What if X is continuous? Now $p(x)$ is a continuous probability density function. The probability that x lies in an interval $(a, b)$ is $p(x \in (a, b)) = \int_a^b p(x)\, dx$.
29 Continuous probability example (figure)
30 Properties of probability density functions: $p(x) \ge 0$ and $\int p(x)\, dx = 1$. Sum rule: $p(x) = \int p(x, y)\, dy$. Product rule: $p(x, y) = p(y \mid x)\, p(x)$.
31 Expected Values: Given a random variable with distribution $p(X)$, what is the expected value of X? Discrete case: $E[x] = \sum_x p(x)\, x$. Continuous case: $E[x] = \int p(x)\, x\, dx$.
32 Multinomial Distribution: If a variable, x, can take 1-of-K states, we represent the distribution of this variable as a multinomial distribution. The probability of x being in state k is $\mu_k$, with $\sum_{k=1}^{K} \mu_k = 1$, and $p(x; \mu) = \prod_{k=1}^{K} \mu_k^{x_k}$, where $x_k = 1$ if x is in state k and 0 otherwise.
33 Expected Value of a Multinomial: The expected value is the vector of mean values: $E[x; \mu] = \sum_x p(x; \mu)\, x = (\mu_1, \mu_2, \ldots, \mu_K)^T$.
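As a small check of the expected-value formula for a 1-of-K variable, the sketch below enumerates the K possible one-hot vectors and accumulates $p(x; \mu)\, x$. The parameter vector `mu` is a made-up example, and the helper names `one_hot` and `prob` are introduced here only for illustration.

```python
# Hypothetical parameters for a 1-of-K variable with K = 3 states.
mu = [0.2, 0.5, 0.3]
K = len(mu)

def one_hot(k, size):
    """1-of-K encoding: a vector with a single 1 at position k."""
    return [1 if i == k else 0 for i in range(size)]

def prob(x, mu):
    """p(x; mu) = prod_k mu_k ** x_k for a one-hot vector x."""
    p = 1.0
    for mu_k, x_k in zip(mu, x):
        p *= mu_k ** x_k
    return p

# E[x] = sum over the K possible one-hot vectors of p(x; mu) * x
expected = [0.0] * K
for k in range(K):
    x = one_hot(k, K)
    p = prob(x, mu)
    expected = [e + p * x_i for e, x_i in zip(expected, x)]

print(expected)
```

Each one-hot vector contributes its probability to exactly one coordinate, which is why the expectation recovers the parameter vector itself.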
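34 Gaussian Distribution: One dimension: $N(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. D dimensions: $N(x; \mu, \Sigma) = \frac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu)\right)$.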
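To connect the one-dimensional Gaussian density with the interval probability $\int_a^b p(x)\, dx$ from the continuous probabilities slide, here is a minimal sketch. The function names and the midpoint-rule integrator are illustrative choices, not something the slides prescribe; the closed-form comparison uses the standard library's `math.erf`.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """One-dimensional Gaussian density N(x; mu, sigma^2)."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def interval_probability(a, b, mu, sigma2, steps=10_000):
    """Approximate p(x in (a, b)) by integrating the density with the midpoint rule."""
    width = (b - a) / steps
    total = sum(gaussian_pdf(a + (i + 0.5) * width, mu, sigma2) for i in range(steps))
    return total * width

# Probability mass within one standard deviation of the mean.
p_one_sigma = interval_probability(-1.0, 1.0, mu=0.0, sigma2=1.0)
# Closed form for the same quantity via the error function.
p_exact = math.erf(1.0 / math.sqrt(2.0))
print(p_one_sigma, p_exact)
```

Both values come out at roughly 0.68, the familiar share of probability mass within one standard deviation of the mean.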
35 Gaussians (figure)
36 How machine learning uses statistical modeling: Expectation: the expected value of a function is the hypothesis. Variance: the variance is the confidence in that hypothesis.
37 Variance: The variance of a random variable describes how much variability there is around the expected value. It is calculated as the expected squared error: $\operatorname{var}[f] = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2$.
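The two expressions for the variance can be compared on a small discrete distribution. Everything in this sketch (the distribution `p`, the function `f`, the helper `expectation`) is a made-up illustration rather than anything from the slides.

```python
# Hypothetical discrete distribution over x (illustrative values only).
p = {1: 0.2, 2: 0.5, 4: 0.3}

def f(x):
    """Example function of the random variable."""
    return x * x

def expectation(g, p):
    """E[g(x)] = sum_x p(x) g(x) for a discrete distribution."""
    return sum(prob * g(x) for x, prob in p.items())

mean_f = expectation(f, p)
# Definition: expected squared deviation from the mean.
var_definition = expectation(lambda x: (f(x) - mean_f) ** 2, p)
# Shortcut: E[f^2] - E[f]^2.
var_shortcut = expectation(lambda x: f(x) ** 2, p) - mean_f ** 2
print(mean_f, var_definition, var_shortcut)
```

Here $E[f] = 7$ and both variance formulas give 36, up to floating-point rounding.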
38 Covariance: The covariance of two random variables expresses how they vary together: $\operatorname{cov}[x, y] = E_{x,y}[(x - E[x])(y - E[y])] = E_{x,y}[xy] - E[x]\,E[y]$. If two variables are independent, their covariance equals zero.
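The claim that independence implies zero covariance can be illustrated with exact arithmetic. The marginals below are hypothetical; the joint distribution is built as the product of the marginals, which is what independence means, and a fully dependent pair is included for contrast.

```python
from fractions import Fraction
from itertools import product

# Hypothetical marginals for two independent discrete variables.
p_x = {0: Fraction(1, 4), 1: Fraction(3, 4)}
p_y = {-1: Fraction(1, 2), 2: Fraction(1, 2)}

# Independence means the joint factorizes: p(x, y) = p(x) p(y).
joint = {(x, y): p_x[x] * p_y[y] for x, y in product(p_x, p_y)}

def covariance(joint):
    """cov[x, y] = E[xy] - E[x] E[y] for a discrete joint distribution."""
    e_x = sum(p * x for (x, _), p in joint.items())
    e_y = sum(p * y for (_, y), p in joint.items())
    e_xy = sum(p * x * y for (x, y), p in joint.items())
    return e_xy - e_x * e_y

# A dependent pair for contrast: y always equals x.
dependent = {(0, 0): Fraction(1, 4), (1, 1): Fraction(3, 4)}
print(covariance(joint), covariance(dependent))  # prints: 0 3/16
```

The converse does not hold in general: a covariance of zero does not by itself imply that two variables are independent.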
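39 Linear Algebra: Vectors: a one-dimensional array. If not specified, assume x is a column vector, $x = (x_0, x_1, \ldots, x_{n-1})^T$. Matrices: a two-dimensional array, typically denoted with capital letters, with n rows and m columns: $A = \begin{bmatrix} a_{0,0} & a_{0,1} & \cdots & a_{0,m-1} \\ a_{1,0} & a_{1,1} & \cdots & a_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} & a_{n-1,1} & \cdots & a_{n-1,m-1} \end{bmatrix}$.
40 Transposition: Transposing a matrix swaps columns and rows. For the column vector $x = \begin{bmatrix} x_0 \\ x_1 \\ \vdots \\ x_{n-1} \end{bmatrix}$, the transpose is the row vector $x^T = \begin{bmatrix} x_0 & x_1 & \cdots & x_{n-1} \end{bmatrix}$.
41 Transposition: Transposing a matrix swaps columns and rows. For the n-by-m matrix A with entries $a_{i,j}$, the transpose is the m-by-n matrix $A^T = \begin{bmatrix} a_{0,0} & a_{1,0} & \cdots & a_{n-1,0} \\ a_{0,1} & a_{1,1} & \cdots & a_{n-1,1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{0,m-1} & a_{1,m-1} & \cdots & a_{n-1,m-1} \end{bmatrix}$.
42 Addition: Two matrices can be added iff they have the same dimensions. If A and B are both n-by-m matrices, then $A + B = \begin{bmatrix} a_{0,0} + b_{0,0} & a_{0,1} + b_{0,1} & \cdots & a_{0,m-1} + b_{0,m-1} \\ a_{1,0} + b_{1,0} & a_{1,1} + b_{1,1} & \cdots & a_{1,m-1} + b_{1,m-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n-1,0} + b_{n-1,0} & a_{n-1,1} + b_{n-1,1} & \cdots & a_{n-1,m-1} + b_{n-1,m-1} \end{bmatrix}$.
43 Multiplication: To multiply two matrices, the inner dimensions must be the same. An n-by-m matrix can be multiplied by an m-by-k matrix, giving an n-by-k matrix: $AB = C$ with $c_{ij} = \sum_{l=0}^{m-1} a_{il}\, b_{lj}$.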
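The entry-wise definition of the matrix product translates directly into code. This is a plain-Python sketch for illustration (the function name `matmul` and the example matrices are made up); in practice a numerical library would normally be used.

```python
def matmul(A, B):
    """Multiply an n-by-m matrix A by an m-by-k matrix B (both lists of rows)."""
    n, m, k = len(A), len(B), len(B[0])
    if any(len(row) != m for row in A):
        raise ValueError("inner dimensions must match")
    # c_ij is the dot product of row i of A with column j of B.
    return [[sum(A[i][l] * B[l][j] for l in range(m)) for j in range(k)]
            for i in range(n)]

A = [[1, 2, 3],
     [4, 5, 6]]   # 2-by-3
B = [[1, 0],
     [0, 1],
     [2, 2]]      # 3-by-2
C = matmul(A, B)  # 2-by-2
print(C)          # prints: [[7, 8], [16, 17]]
```

Swapping the operands gives a 3-by-3 result instead, a reminder that matrix multiplication is not commutative.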
44 Inversion: The inverse of an n-by-n (square) matrix A, when it exists, is denoted $A^{-1}$ and has the property $A A^{-1} = I$, where I is the identity matrix, an n-by-n matrix with ones along the diagonal: $I_{ij} = 1$ if $i = j$, and 0 otherwise.
45 Identity Matrix: Matrices are invariant under multiplication by the identity matrix: $AI = A$ and $IA = A$.
46 Helpful matrix inversion properties: $(A^{-1})^{-1} = A$; $(kA)^{-1} = k^{-1} A^{-1}$; $(A^T)^{-1} = (A^{-1})^T$; $(AB)^{-1} = B^{-1} A^{-1}$.
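The inversion properties can be spot-checked on small matrices. The sketch below is restricted to 2-by-2 matrices so that the inverse can be written with the adjugate formula; the helper names and the example matrices `A`, `B` and scalar `k` are arbitrary choices for illustration.

```python
from fractions import Fraction

def matmul2(A, B):
    """Product of two 2-by-2 matrices."""
    return [[sum(A[i][l] * B[l][j] for l in range(2)) for j in range(2)] for i in range(2)]

def inverse2(A):
    """Inverse of a 2-by-2 matrix via the adjugate formula."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def transpose2(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

A = [[2, 1], [5, 3]]
B = [[1, 2], [3, 4]]
I = [[1, 0], [0, 1]]
k = 3

checks = {
    "A A^-1 = I": matmul2(A, inverse2(A)) == I,
    "(A^-1)^-1 = A": inverse2(inverse2(A)) == A,
    "(kA)^-1 = k^-1 A^-1": inverse2([[k * a for a in row] for row in A])
        == [[Fraction(1, k) * a for a in row] for row in inverse2(A)],
    "(A^T)^-1 = (A^-1)^T": inverse2(transpose2(A)) == transpose2(inverse2(A)),
    "(AB)^-1 = B^-1 A^-1": inverse2(matmul2(A, B)) == matmul2(inverse2(B), inverse2(A)),
}
print(checks)
```

Exact fractions keep the comparisons free of rounding error; a check on one pair of matrices illustrates the identities but does not prove them.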
47 Norm: The norm of a vector x represents its Euclidean length: $\|x\| = \sqrt{\sum_{i=0}^{n-1} x_i^2} = \sqrt{x_0^2 + x_1^2 + \cdots + x_{n-1}^2}$.
48 Positive Definiteness: Quadratic form: for a scalar, $c_0 + c_1 x + c_2 x^2$; for a vector, $x^T A x$. A matrix M is positive definite if $x^T M x > 0$ for every nonzero vector x, and positive semi-definite if $x^T M x \ge 0$ for every x.
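A short worked example may help make the definition concrete. The matrix below is a hypothetical choice, not one from the slides; expanding its quadratic form and completing squares shows that it is positive definite.

```latex
M = \begin{bmatrix} 2 & -1 \\ -1 & 2 \end{bmatrix}, \qquad
x = \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}
\\[6pt]
x^T M x = 2x_0^2 - 2x_0 x_1 + 2x_1^2
        = x_0^2 + x_1^2 + (x_0 - x_1)^2
\\[6pt]
x^T M x > 0 \quad \text{for every } x \neq 0
```

Each term on the right is a square, and all of them vanish only when $x_0 = x_1 = 0$, so the strict inequality holds for every nonzero x.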
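49 Calculus: Derivatives and Integrals; Optimization.
50 Derivatives: The derivative of a function gives its slope at a point x, written $\frac{d}{dx} f(x)$ or $f'(x)$.
51 Derivative Example (figure)
52 Integrals: Integration is the inverse operation of differentiation (up to an additive constant): $\int f(x)\, dx = F(x) + c$, where $F'(x) = f(x)$. Graphically, an integral can be considered the area under the curve defined by $f(x)$.
53 Integration Example (figure)
54 Vector Calculus: Differentiation with respect to a matrix or vector; Gradient; Change of Variables with a Vector.
55 Derivative w.r.t. a vector: Given a vector x and a function $f(x): \mathbb{R}^n \to \mathbb{R}$, how can we find $f'(x)$?
56 Derivative w.r.t. a vector: Given a vector x and a function $f(x): \mathbb{R}^n \to \mathbb{R}$, the derivative is the vector of partial derivatives: $\frac{\partial f(x)}{\partial x} = \begin{bmatrix} \frac{\partial f(x)}{\partial x_0} \\ \frac{\partial f(x)}{\partial x_1} \\ \vdots \\ \frac{\partial f(x)}{\partial x_{n-1}} \end{bmatrix}$.
57 Example Derivation: $f(x) = x_0 + 4 x_1 x_2$, so $\frac{\partial f(x)}{\partial x_0} = 1$, $\frac{\partial f(x)}{\partial x_1} = 4 x_2$, and $\frac{\partial f(x)}{\partial x_2} = 4 x_1$.
58 Example Derivation: $f(x) = x_0 + 4 x_1 x_2$, so $\frac{\partial f(x)}{\partial x} = \begin{bmatrix} \frac{\partial f(x)}{\partial x_0} \\ \frac{\partial f(x)}{\partial x_1} \\ \frac{\partial f(x)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 1 \\ 4 x_2 \\ 4 x_1 \end{bmatrix}$. This is also referred to as the gradient of the function, written $\nabla f(x)$ or $\nabla f$.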
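The gradient of the example function can be sanity-checked against a finite-difference approximation. The central-difference helper `numerical_gradient`, the step size `h`, and the test point are illustrative assumptions; only `f` and its analytic gradient come from the slides.

```python
def f(x):
    """f(x) = x_0 + 4 x_1 x_2 from the slide."""
    return x[0] + 4.0 * x[1] * x[2]

def analytic_gradient(x):
    """Gradient derived on the slide: (1, 4 x_2, 4 x_1)."""
    return [1.0, 4.0 * x[2], 4.0 * x[1]]

def numerical_gradient(func, x, h=1e-6):
    """Central-difference approximation of each partial derivative."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((func(forward) - func(backward)) / (2.0 * h))
    return grad

point = [0.5, -2.0, 3.0]  # arbitrary test point
print(analytic_gradient(point), numerical_gradient(f, point))
```

At this point the analytic gradient is (1, 12, -8), and the numerical estimate agrees to within rounding error.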
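59 Useful Vector Calculus identities: Scalar multiplication: $\frac{\partial}{\partial x}(x^T a) = \frac{\partial}{\partial x}(a^T x) = a$. Product rule: $\frac{\partial}{\partial x}(AB) = \frac{\partial A}{\partial x} B + A \frac{\partial B}{\partial x}$. Also $\frac{\partial}{\partial x}(x^T A) = A$ and $\frac{\partial}{\partial x}(Ax) = A^T$.
60 Useful Vector Calculus identities: Derivative of an inverse: $\frac{\partial}{\partial x}(A^{-1}) = -A^{-1} \frac{\partial A}{\partial x} A^{-1}$. Change of variable: $\int f(x)\, dx = \int f(u) \left|\frac{\partial x}{\partial u}\right| du$, where $f(u)$ is shorthand for f evaluated at $x(u)$.
61 Optimization: We have an objective function, $f(x)$, that we'd like to maximize or minimize. Set the first derivative to zero and solve for x.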
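As a minimal worked example of setting the first derivative to zero, consider a hypothetical quadratic objective (chosen here for illustration, not taken from the slides):

```latex
f(x) = x^2 - 4x + 7
\\[4pt]
f'(x) = 2x - 4 = 0 \;\Longrightarrow\; x = 2
\\[4pt]
f''(x) = 2 > 0 \;\Longrightarrow\; x = 2 \text{ is a minimum, with } f(2) = 3
```

The second-derivative check matters because a zero first derivative only identifies a stationary point, which could be a minimum, a maximum, or neither.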
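62 Optimization with constraints: What if I want to constrain the parameters of the model, for example so that the mean is less than 10? Find the best likelihood, subject to a constraint. There are two functions: an objective function to maximize, and an inequality that must be satisfied.
63 Lagrange Multipliers: Find the maxima of $f(x, y)$ subject to a constraint. Example: $f(x, y) = x + 2y$ subject to $x^2 + y^2 = 1$.
64 General form: Maximize $f(x, y)$ subject to $g(x, y) = c$. Introduce a new variable, $\lambda$, and find a stationary point of $\Lambda(x, y, \lambda) = f(x, y) + \lambda\,(g(x, y) - c)$.
65 Example: Maximize $f(x, y) = x + 2y$ subject to $x^2 + y^2 = 1$. Introduce a new variable, $\lambda$, and find a stationary point of $\Lambda(x, y, \lambda) = x + 2y + \lambda\,(x^2 + y^2 - 1)$.
66 Example: $\frac{\partial \Lambda(x, y, \lambda)}{\partial x} = 1 + 2\lambda x = 0$, $\frac{\partial \Lambda(x, y, \lambda)}{\partial y} = 2 + 2\lambda y = 0$, and $\frac{\partial \Lambda(x, y, \lambda)}{\partial \lambda} = x^2 + y^2 - 1 = 0$. We now have 3 equations with 3 unknowns.
67 Example: Eliminate $\lambda$: $1 = -2\lambda x$ and $2 = -2\lambda y$, so $\frac{1}{x} = -2\lambda = \frac{2}{y}$ and $y = 2x$. Substitute and solve: $x^2 + y^2 = 1$, $x^2 + (2x)^2 = 1$, $5x^2 = 1$, so $x = \pm\frac{1}{\sqrt{5}}$ and $y = \pm\frac{2}{\sqrt{5}}$. The positive pair gives the maximum, $f = \sqrt{5}$; the negative pair gives the minimum.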
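A quick numerical check of the Lagrange multiplier example, taking the positive roots $x = 1/\sqrt{5}$ and $y = 2/\sqrt{5}$: the sketch verifies that the three stationarity equations hold and then compares against a brute-force scan of the constraint circle. The parametrization by angle and the number of scan steps are illustrative choices.

```python
import math

def f(x, y):
    """Objective from the example: f(x, y) = x + 2y."""
    return x + 2.0 * y

# Candidate from the derivation: x = 1/sqrt(5), y = 2/sqrt(5).
x_star, y_star = 1.0 / math.sqrt(5.0), 2.0 / math.sqrt(5.0)
lam = -1.0 / (2.0 * x_star)  # from 1 + 2*lambda*x = 0

# Residuals of the three stationarity equations (all should be near zero).
residuals = [
    1.0 + 2.0 * lam * x_star,
    2.0 + 2.0 * lam * y_star,
    x_star ** 2 + y_star ** 2 - 1.0,
]

# Brute-force check: parametrize the constraint as (cos t, sin t) and scan t.
steps = 100_000
best_t = max((2.0 * math.pi * i / steps for i in range(steps)),
             key=lambda t: f(math.cos(t), math.sin(t)))
best = (math.cos(best_t), math.sin(best_t))
print(residuals, best, f(x_star, y_star))
```

The scan lands next to the analytic solution, and the objective value there is $\sqrt{5} \approx 2.236$.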
68 Why does Machine Learning need these tools? Calculus: we need to identify the maximum likelihood, or minimum risk (optimization); integration allows the marginalization of continuous probability density functions. Linear Algebra: many features lead to high-dimensional spaces; vectors and matrices allow us to compactly describe and manipulate high-dimensional feature spaces.
69 Why does Machine Learning need these tools? Vector Calculus: all of the optimization needs to be performed in high-dimensional spaces; optimization of multiple variables simultaneously (gradient descent); we want to take marginals over high-dimensional distributions like Gaussians.
70 Next Time: Linear Regression and Regularization. Read Chapter 1.1, 3.1,
[POLS 8500] Review of Linear Algebra, Probability and Information Theory
[POLS 8500] Review of Linear Algebra, Probability and Information Theory Professor Jason Anastasopoulos ljanastas@uga.edu January 12, 2017 For today... Basic linear algebra. Basic probability. Programming
More informationToday. Calculus. Linear Regression. Lagrange Multipliers
Today Calculus Lagrange Multipliers Linear Regression 1 Optimization with constraints What if I want to constrain the parameters of the model. The mean is less than 10 Find the best likelihood, subject
More information01 Probability Theory and Statistics Review
NAVARCH/EECS 568, ROB 530 - Winter 2018 01 Probability Theory and Statistics Review Maani Ghaffari January 08, 2018 Last Time: Bayes Filters Given: Stream of observations z 1:t and action data u 1:t Sensor/measurement
More informationMachine Learning for Large-Scale Data Analysis and Decision Making A. Week #1
Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Week #1 Today Introduction to machine learning The course (syllabus) Math review (probability + linear algebra) The future
More informationThe Multivariate Gaussian Distribution [DRAFT]
The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,
More informationEcon 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines
Econ 2148, fall 2017 Gaussian process priors, reproducing kernel Hilbert spaces, and Splines Maximilian Kasy Department of Economics, Harvard University 1 / 37 Agenda 6 equivalent representations of the
More informationGaussian Processes for Machine Learning
Gaussian Processes for Machine Learning Carl Edward Rasmussen Max Planck Institute for Biological Cybernetics Tübingen, Germany carl@tuebingen.mpg.de Carlos III, Madrid, May 2006 The actual science of
More informationGaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008
Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:
More informationSome Probability and Statistics
Some Probability and Statistics David M. Blei COS424 Princeton University February 13, 2012 Card problem There are three cards Red/Red Red/Black Black/Black I go through the following process. Close my
More informationBrandon C. Kelly (Harvard Smithsonian Center for Astrophysics)
Brandon C. Kelly (Harvard Smithsonian Center for Astrophysics) Probability quantifies randomness and uncertainty How do I estimate the normalization and logarithmic slope of a X ray continuum, assuming
More informationBasic Concepts in Matrix Algebra
Basic Concepts in Matrix Algebra An column array of p elements is called a vector of dimension p and is written as x p 1 = x 1 x 2. x p. The transpose of the column vector x p 1 is row vector x = [x 1
More informationFinal Exam # 3. Sta 230: Probability. December 16, 2012
Final Exam # 3 Sta 230: Probability December 16, 2012 This is a closed-book exam so do not refer to your notes, the text, or any other books (please put them on the floor). You may use the extra sheets
More informationLecture Note 1: Probability Theory and Statistics
Univ. of Michigan - NAME 568/EECS 568/ROB 530 Winter 2018 Lecture Note 1: Probability Theory and Statistics Lecturer: Maani Ghaffari Jadidi Date: April 6, 2018 For this and all future notes, if you would
More informationMark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation.
CS 189 Spring 2015 Introduction to Machine Learning Midterm You have 80 minutes for the exam. The exam is closed book, closed notes except your one-page crib sheet. No calculators or electronic items.
More informationMachine Learning for Signal Processing Bayes Classification and Regression
Machine Learning for Signal Processing Bayes Classification and Regression Instructor: Bhiksha Raj 11755/18797 1 Recap: KNN A very effective and simple way of performing classification Simple model: For
More informationMachine Learning Srihari. Probability Theory. Sargur N. Srihari
Probability Theory Sargur N. Srihari srihari@cedar.buffalo.edu 1 Probability Theory with Several Variables Key concept is dealing with uncertainty Due to noise and finite data sets Framework for quantification
More informationPractice Examination # 3
Practice Examination # 3 Sta 23: Probability December 13, 212 This is a closed-book exam so do not refer to your notes, the text, or any other books (please put them on the floor). You may use a single
More informationB4 Estimation and Inference
B4 Estimation and Inference 6 Lectures Hilary Term 27 2 Tutorial Sheets A. Zisserman Overview Lectures 1 & 2: Introduction sensors, and basics of probability density functions for representing sensor error
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More informationIntro to Probability. Andrei Barbu
Intro to Probability Andrei Barbu Some problems Some problems A means to capture uncertainty Some problems A means to capture uncertainty You have data from two sources, are they different? Some problems
More informationStatistical Data Mining and Machine Learning Hilary Term 2016
Statistical Data Mining and Machine Learning Hilary Term 2016 Dino Sejdinovic Department of Statistics Oxford Slides and other materials available at: http://www.stats.ox.ac.uk/~sejdinov/sdmml Naïve Bayes
More informationBayesian Linear Regression [DRAFT - In Progress]
Bayesian Linear Regression [DRAFT - In Progress] David S. Rosenberg Abstract Here we develop some basics of Bayesian linear regression. Most of the calculations for this document come from the basic theory
More informationLecture 1 October 9, 2013
Probabilistic Graphical Models Fall 2013 Lecture 1 October 9, 2013 Lecturer: Guillaume Obozinski Scribe: Huu Dien Khue Le, Robin Bénesse The web page of the course: http://www.di.ens.fr/~fbach/courses/fall2013/
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2
MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is
More informationIntroduction to Machine Learning
Introduction to Machine Learning Introduction to Probabilistic Methods Varun Chandola Computer Science & Engineering State University of New York at Buffalo Buffalo, NY, USA chandola@buffalo.edu Chandola@UB
More informationIntroduction to Bayesian Learning. Machine Learning Fall 2018
Introduction to Bayesian Learning Machine Learning Fall 2018 1 What we have seen so far What does it mean to learn? Mistake-driven learning Learning by counting (and bounding) number of mistakes PAC learnability
More informationFundamentals. CS 281A: Statistical Learning Theory. Yangqing Jia. August, Based on tutorial slides by Lester Mackey and Ariel Kleiner
Fundamentals CS 281A: Statistical Learning Theory Yangqing Jia Based on tutorial slides by Lester Mackey and Ariel Kleiner August, 2011 Outline 1 Probability 2 Statistics 3 Linear Algebra 4 Optimization
More informationCS 195-5: Machine Learning Problem Set 1
CS 95-5: Machine Learning Problem Set Douglas Lanman dlanman@brown.edu 7 September Regression Problem Show that the prediction errors y f(x; ŵ) are necessarily uncorrelated with any linear function of
More informationChris Bishop s PRML Ch. 8: Graphical Models
Chris Bishop s PRML Ch. 8: Graphical Models January 24, 2008 Introduction Visualize the structure of a probabilistic model Design and motivate new models Insights into the model s properties, in particular
More informationComputer Vision Group Prof. Daniel Cremers. 2. Regression (cont.)
Prof. Daniel Cremers 2. Regression (cont.) Regression with MLE (Rep.) Assume that y is affected by Gaussian noise : t = f(x, w)+ where Thus, we have p(t x, w, )=N (t; f(x, w), 2 ) 2 Maximum A-Posteriori
More informationVectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1. x 2. x =
Linear Algebra Review Vectors To begin, let us describe an element of the state space as a point with numerical coordinates, that is x 1 x x = 2. x n Vectors of up to three dimensions are easy to diagram.
More informationIf we want to analyze experimental or simulated data we might encounter the following tasks:
Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction
More informationMachine Learning Srihari. Information Theory. Sargur N. Srihari
Information Theory Sargur N. Srihari 1 Topics 1. Entropy as an Information Measure 1. Discrete variable definition Relationship to Code Length 2. Continuous Variable Differential Entropy 2. Maximum Entropy
More informationPerhaps the simplest way of modeling two (discrete) random variables is by means of a joint PMF, defined as follows.
Chapter 5 Two Random Variables In a practical engineering problem, there is almost always causal relationship between different events. Some relationships are determined by physical laws, e.g., voltage
More informationProbability and Information Theory. Sargur N. Srihari
Probability and Information Theory Sargur N. srihari@cedar.buffalo.edu 1 Topics in Probability and Information Theory Overview 1. Why Probability? 2. Random Variables 3. Probability Distributions 4. Marginal
More informationLecture 1: Bayesian Framework Basics
Lecture 1: Bayesian Framework Basics Melih Kandemir melih.kandemir@iwr.uni-heidelberg.de April 21, 2014 What is this course about? Building Bayesian machine learning models Performing the inference of
More informationNaïve Bayes classification. p ij 11/15/16. Probability theory. Probability theory. Probability theory. X P (X = x i )=1 i. Marginal Probability
Probability theory Naïve Bayes classification Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. s: A person s height, the outcome of a coin toss Distinguish
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Yishay Mansour, Lior Wolf 2013-14 We know that X ~ B(n,p), but we do not know p. We get a random sample
More informationMachine Learning (CS 567) Lecture 5
Machine Learning (CS 567) Lecture 5 Time: T-Th 5:00pm - 6:20pm Location: GFS 118 Instructor: Sofus A. Macskassy (macskass@usc.edu) Office: SAL 216 Office hours: by appointment Teaching assistant: Cheol
More informationa b = a T b = a i b i (1) i=1 (Geometric definition) The dot product of two Euclidean vectors a and b is defined by a b = a b cos(θ a,b ) (2)
This is my preperation notes for teaching in sections during the winter 2018 quarter for course CSE 446. Useful for myself to review the concepts as well. More Linear Algebra Definition 1.1 (Dot Product).
More informationStatistical Learning Theory
Statistical Learning Theory Part I : Mathematical Learning Theory (1-8) By Sumio Watanabe, Evaluation : Report Part II : Information Statistical Mechanics (9-15) By Yoshiyuki Kabashima, Evaluation : Report
More informationSome Concepts of Probability (Review) Volker Tresp Summer 2018
Some Concepts of Probability (Review) Volker Tresp Summer 2018 1 Definition There are different way to define what a probability stands for Mathematically, the most rigorous definition is based on Kolmogorov
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 (Many figures from C. M. Bishop, "Pattern Recognition and ") 1of 143 Part IV
More informationProbabilistic classification CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2016
Probabilistic classification CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2016 Topics Probabilistic approach Bayes decision theory Generative models Gaussian Bayes classifier
More informationProbability. Machine Learning and Pattern Recognition. Chris Williams. School of Informatics, University of Edinburgh. August 2014
Probability Machine Learning and Pattern Recognition Chris Williams School of Informatics, University of Edinburgh August 2014 (All of the slides in this course have been adapted from previous versions
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationSYDE 372 Introduction to Pattern Recognition. Probability Measures for Classification: Part I
SYDE 372 Introduction to Pattern Recognition Probability Measures for Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo Outline 1 2 3 4 Why use probability
More informationIntroduction to Computational Finance and Financial Econometrics Matrix Algebra Review
You can t see this text! Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Matrix Algebra Review 1 / 54 Outline 1
More informationLecture 25: Review. Statistics 104. April 23, Colin Rundel
Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April
More informationLecture 2: Simple Classifiers
CSC 412/2506 Winter 2018 Probabilistic Learning and Reasoning Lecture 2: Simple Classifiers Slides based on Rich Zemel s All lecture slides will be available on the course website: www.cs.toronto.edu/~jessebett/csc412
More informationOR MSc Maths Revision Course
OR MSc Maths Revision Course Tom Byrne School of Mathematics University of Edinburgh t.m.byrne@sms.ed.ac.uk 15 September 2017 General Information Today JCMB Lecture Theatre A, 09:30-12:30 Mathematics revision
More information2. Matrix Algebra and Random Vectors
2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns
More informationIntroduction to Probability and Statistics (Continued)
Introduction to Probability and Statistics (Continued) Prof. icholas Zabaras Center for Informatics and Computational Science https://cics.nd.edu/ University of otre Dame otre Dame, Indiana, USA Email:
More information{ p if x = 1 1 p if x = 0
Discrete random variables Probability mass function Given a discrete random variable X taking values in X = {v 1,..., v m }, its probability mass function P : X [0, 1] is defined as: P (v i ) = Pr[X =
More informationp(z)
Chapter Statistics. Introduction This lecture is a quick review of basic statistical concepts; probabilities, mean, variance, covariance, correlation, linear regression, probability density functions and
More informationIntroduction to Machine Learning
Outline Introduction to Machine Learning Bayesian Classification Varun Chandola March 8, 017 1. {circular,large,light,smooth,thick}, malignant. {circular,large,light,irregular,thick}, malignant 3. {oval,large,dark,smooth,thin},
More informationMachine Learning. Bayes Basics. Marc Toussaint U Stuttgart. Bayes, probabilities, Bayes theorem & examples
Machine Learning Bayes Basics Bayes, probabilities, Bayes theorem & examples Marc Toussaint U Stuttgart So far: Basic regression & classification methods: Features + Loss + Regularization & CV All kinds
More informationReview of Probability Theory
Review of Probability Theory Arian Maleki and Tom Do Stanford University Probability theory is the study of uncertainty Through this class, we will be relying on concepts from probability theory for deriving
More informationRobots Autónomos. Depto. CCIA. 2. Bayesian Estimation and sensor models. Domingo Gallardo
Robots Autónomos 2. Bayesian Estimation and sensor models Domingo Gallardo Depto. CCIA http://www.rvg.ua.es/master/robots References Recursive State Estimation: Thrun, chapter 2 Sensor models and robot
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 12 Dynamical Models CS/CNS/EE 155 Andreas Krause Homework 3 out tonight Start early!! Announcements Project milestones due today Please email to TAs 2 Parameter learning
More informationIntroduction to Machine Learning. Maximum Likelihood and Bayesian Inference. Lecturers: Eran Halperin, Lior Wolf
1 Introduction to Machine Learning Maximum Likelihood and Bayesian Inference Lecturers: Eran Halperin, Lior Wolf 2014-15 We know that X ~ B(n,p), but we do not know p. We get a random sample from X, a
More informationLet X and Y denote two random variables. The joint distribution of these random
EE385 Class Notes 9/7/0 John Stensby Chapter 3: Multiple Random Variables Let X and Y denote two random variables. The joint distribution of these random variables is defined as F XY(x,y) = [X x,y y] P.