A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring

1 A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics, Spring. Lecture 23: Nonlinear least squares. Readings: notes Modeling2015.pdf on the course web site (to be posted tonight); Chapter 11 in Gregory (nonlinear model fitting); Chapter 29 of MacKay (Monte Carlo methods); Chapter 12 in Gregory (MCMC); An Introduction to MCMC for Machine Learning (Andrieu et al. 2003, Machine Learning, 50, 5); Genetic Algorithms: Principles of Natural Selection Applied to Computation (Stephanie Forrest, Science 1993, 261, 872). Assignment 4 is posted.

2 Clustering Algorithms. A very general problem is to find groupings or clusters of objects in some space. Clusters are useful for classification, description, discovery, and for learning algorithms. New objects may be flagged because they are outliers from clusters defined by prior data. Simple clusters are islands of points; hierarchical clustering finds clusters within clusters. K-means Algorithm: called K-means because membership in one of K clusters is determined by the proximity of individual data points to the cluster means. The treatment here is an amalgam of Chapter 8 of Introduction to Data Mining, Tan, Steinbach, and Kumar (available on the web at kumar/dmbook/index.php) and Chapters 20 and 21 of Information Theory, Inference, and Learning Algorithms, MacKay (also available on the web). Let there be N objects at locations $\{x_j,\ j = 1,\dots,N\}$ in an L-dimensional space. 1. Specify the number of clusters K and initialize their mean locations $\{m_k,\ k = 1,\dots,K\}$ as random vectors in the space. 2. Assign each object to a cluster by computing the Euclidean distance $d_{jk} = \|x_j - m_k\|$ and finding the nearest cluster mean.

3 3. Count the number of objects associated with each cluster, n(k). 4. Recalculate the mean location of each cluster, $m_k = \frac{1}{n(k)} \sum_{j:\, x_j \in k} x_j$. If a cluster has no associated objects, its mean does not change. 5. Iterate the previous three steps until there is no change in cluster locations and membership. The K-means algorithm always converges, but it need not converge to the correct grouping. Issues: 1. All points are treated equally in calculating the mean cluster location, even points on the periphery of the cluster. 2. The algorithm does not incorporate any prior shape information and can be fooled by filamentary shapes. The first issue can be addressed by calculating a weighted mean $m_k$ that weights more strongly the points nearest the previous mean; doing so requires imposing a length scale on the cluster, such as a Gaussian function with some size in the L-space. The second issue, elongated clusters, can be addressed by using elliptical Gaussian functions to generate the weighted cluster means, with the variance on each axis updated along with the cluster membership. Examples are given in Chapter 21 of MacKay.
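A minimal sketch of the basic K-means steps above, in Python with numpy. The data set, number of clusters, and iteration limit are illustrative assumptions; for simplicity the means are initialized at randomly chosen data points rather than at arbitrary random vectors.

import numpy as np

def kmeans(X, K, n_iter=100, rng=np.random.default_rng(0)):
    """Basic K-means: X is (N, L); returns (means, labels)."""
    N, L = X.shape
    # Step 1: initialize the K means (here at randomly chosen data points).
    means = X[rng.choice(N, size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Step 2: assign each point to the nearest mean (Euclidean distance).
        d = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)  # (N, K)
        labels = np.argmin(d, axis=1)
        new_means = means.copy()
        for k in range(K):
            members = X[labels == k]
            # Steps 3-4: recompute the mean of each non-empty cluster.
            if len(members) > 0:
                new_means[k] = members.mean(axis=0)
        # Step 5: stop when the means no longer change.
        if np.allclose(new_means, means):
            break
        means = new_means
    return means, labels

# Example: two well-separated 2-D clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(6, 1, (40, 2))])
means, labels = kmeans(X, K=2)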

4 (Excerpt from MacKay, Section 20.1, K-means clustering.) Figure: K-means algorithm applied to a data set of 40 points; K = 2 means evolve to stable locations after three iterations of alternating assignment and update steps. Figure: K-means algorithm applied to a data set of 40 points; two separate runs, both with K = 4 means, reach different solutions (each frame shows a successive assignment step). Exercise [4, p.291]: See if you can prove that K-means always converges. [Hint: find a physical analogy and an associated Lyapunov function.]

5 (Excerpt from MacKay, An Example Inference Task: Clustering.) Figure: K-means algorithm for a case with two dissimilar clusters: (a) the "little 'n' large" data; (b) a stable set of assignments and means. Note that four points belonging to the broad cluster have been incorrectly assigned to the narrower cluster (points assigned to the right-hand cluster are shown by plus signs). Figure: two elongated clusters, and the stable solution found by the K-means algorithm. [... function is provided as part of the problem definition; but I'm assuming we are interested in data-modelling rather than vector quantization.] How do we choose K? Having found multiple alternative clusterings for a given K, how can we choose among them? Cases where K-means might be viewed as failing: further questions arise when we look for cases where the algorithm behaves badly (compared with what the man in the street would call clustering). Figure 20.5a shows a set of 75 data points generated from a mixture of two Gaussians; the right-hand Gaussian has less weight (only one fifth of the data).

6 Nonlinear Least Squares. Outline: summary of linear least squares; features of nonlinear least squares; tackling the cost-function landscape.

7 Summary of Linear Least Squares. Unweighted least squares (equal, uncorrelated errors): Cost function: $Q(\theta) = \epsilon^\top \epsilon = \sum_j \epsilon_j^2$, with $\epsilon = y - X\theta$. Parameter vector that minimizes Q: $\hat\theta = (X^\top X)^{-1} X^\top y$. Covariance matrix for the parameters: $P \equiv \langle(\hat\theta - \theta)(\hat\theta - \theta)^\top\rangle = \sigma^2 (X^\top X)^{-1}$. Cost-function dependence on $\theta = \hat\theta + \delta\theta$: $Q(\theta) = (y - X\theta)^\top (y - X\theta) = (y - X\hat\theta)^\top (y - X\hat\theta) + \delta\theta^\top X^\top X\, \delta\theta$, where the last term is a quadratic form $\geq 0$. The cost-function hypersurface is quadratic and has only one minimum.

8 Weighted least squares, with an arbitrary covariance matrix V for the errors $\epsilon$: Cost function: $Q(\theta) = \epsilon^\top V^{-1} \epsilon$, with $\epsilon = y - X\theta$. Parameter vector that minimizes Q: $\hat\theta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y$. Covariance matrix for the parameters: $P \equiv \langle(\hat\theta - \theta)(\hat\theta - \theta)^\top\rangle = (X^\top V^{-1} X)^{-1}$. The cost-function dependence on $\theta = \hat\theta + \delta\theta$ has the same quadratic form as before. (For models that are nonlinear in the parameters, by contrast, the cost-function hypersurface generally has many local minima, whereas we want the global minimum.)

9 Error Ellipses and Confidence Intervals. Confidence intervals for weighted least squares, with an arbitrary covariance matrix V for $\epsilon$: Cost function: $Q(\theta) = \epsilon^\top V^{-1} \epsilon$. Parameter vector that minimizes Q: $\hat\theta = (X^\top V^{-1} X)^{-1} X^\top V^{-1} y$. Covariance matrix for the parameters: $P \equiv \langle(\hat\theta - \theta)(\hat\theta - \theta)^\top\rangle = (X^\top V^{-1} X)^{-1}$. What are our goals? 1. Given a fit to data, what are the errors on the parameters? 2. Do we know the data errors a priori or not? If not, we need an estimate of the errors. 3. Model comparison: we want to compare models to identify the best one.

10 Nonlinear Least Squares. So far we have considered linear models of the form $y = X\theta + \epsilon$. But often we want to fit models $f(x, \theta)$ that are nonlinear in the parameters, $y = f(x, \theta) + \epsilon$, where this vector equation has n elements and $\theta$ is a k-vector. We cannot solve for the best fit to the data in the same way as for the linear model, but the underlying principle is the same: minimize the sum of squares. Thus we minimize the quadratic form $Q(\theta) = \epsilon^\top V^{-1} \epsilon$ with $\epsilon = y - f(x, \theta)$.

11 The problem is to find the minimum of Q in a k-dimensional space where Q is a nonmonotonic function of the parameters. Recall that Q is parabolic for the linear model, so in that case finding any minimum of Q is the same as finding the global minimum. With nonlinear functions there may be an arbitrary number of local minima, which can confuse algorithms that simply converge to the nearest minimum. This multiplicity of minima is the bane of the nonlinear LS problem.

12 Cost Function Example: Nonlinear Function. Consider a standard signal + noise data set, $y(x) = f(x; \theta) + n(x)$, where n is IID with variance $\sigma_n^2$ and the signal is a Gaussian function of x with an additive baseline, $f(x; \theta) = a + b\, e^{-(x-c)^2/2d^2}$, which has two linear parameters (a and b) and two nonlinear parameters (c and d). We can generate a realization of y(x) using a set of parameters $\theta_{\rm true} = (a, b, c, d)$ and a realization of noise n(x). Then we can evaluate the cost function versus $\theta$ over ranges of values of a, b, c, d: $C(\theta) = \frac{[y(x) - \hat y(x)]^\top [y(x) - \hat y(x)]}{(N - 4)\, \sigma_n^2}$, where for this example we have divided by the presumed-known noise variance and by the number of degrees of freedom when we specify four parameter values.
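A minimal sketch (Python with numpy) of generating one realization and mapping the cost function over the two nonlinear parameters c and d, with a and b held at their true values. The true parameters match the figures that follow; the noise rms, x range, and grid limits are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
a, b, c, d = 0.10, 1.00, 2.30, 0.73     # true parameters (as in the figures below)
sigma_n = 0.2                            # assumed noise rms
x = np.linspace(0.0, 5.0, 200)
N = x.size

def model(x, a, b, c, d):
    return a + b * np.exp(-(x - c)**2 / (2.0 * d**2))

y = model(x, a, b, c, d) + rng.normal(0.0, sigma_n, N)   # one realization

# Map C(theta) over a (c, d) grid, holding a and b at their true values.
c_grid = np.linspace(0.0, 5.0, 201)
d_grid = np.linspace(0.1, 2.0, 201)
C = np.empty((d_grid.size, c_grid.size))
for i, dd in enumerate(d_grid):
    for j, cc in enumerate(c_grid):
        r = y - model(x, a, b, cc, dd)
        C[i, j] = r @ r / ((N - 4) * sigma_n**2)

imin = np.unravel_index(np.argmin(C), C.shape)
print("grid minimum: c =", c_grid[imin[1]], " d =", d_grid[imin[0]], " C_min =", C[imin])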

13 [Figure: maps of the cost function over the (c, d) plane for a Gaussian shape with true parameters a, b, c, d = 0.10, 1.00, 2.30, 0.73, shown for signal-to-noise ratios of order 5. Axes: c = Gaussian location parameter, d = Gaussian scale parameter; panels are labeled with their S/N and minimum cost values, e.g. Q_min = 0.90 and 1.23.]

14 [Figure: the same cost-function maps over the (c, d) plane for lower signal-to-noise ratios (S/N of order 0.05), with minimum cost values Q_min = 1.13 and 0.96.]

15 The various strategies for minimizing Q include: 1. A grid search in $\theta$-space (a brute-force approach). There is a dynamic-range problem: the search must cover enough hyper-volume that the global minimum is contained, but with sufficiently fine resolution that the global minimum is not missed. The total number of operations grows rapidly with k, the number of parameters. Let $\Delta\theta_i$ be the total range searched for parameter $\theta_i$ (one element of the vector $\theta$) and $\delta\theta_i$ the grid sample interval; the total number of grid points is $N_{\rm grid} = \prod_{i=1}^{k} \Delta\theta_i / \delta\theta_i$. For example, with k = 4 parameters and 100 grid samples per axis, this is already $10^8$ cost-function evaluations.

16 2. A ravine search: use the gradient of Q, $dQ/d\theta$, to find the bottom of a particular valley in $\theta$-space. Choose a step length in the direction of the negative gradient, move to a new position, evaluate Q, and see if a minimum has been found; if not, iterate. This method clearly finds only the minimum that is nearest to the starting point of the search, which may not be the global minimum unless the starting point has been chosen wisely (or luckily). A hybrid approach would combine the ravine search with another, pilot search that has identified the rough location of the global minimum.
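A minimal ravine-search (gradient descent) sketch in Python with numpy, using a finite-difference gradient. The cost function, starting point, step size, and stopping tolerance are illustrative assumptions.

import numpy as np

def num_grad(Q, theta, h=1e-6):
    """Finite-difference gradient of Q at theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        dt = np.zeros_like(theta); dt[i] = h
        g[i] = (Q(theta + dt) - Q(theta - dt)) / (2 * h)
    return g

def ravine_search(Q, theta0, step=0.01, tol=1e-8, max_iter=10000):
    """Follow the negative gradient until Q stops decreasing."""
    theta = np.asarray(theta0, dtype=float)
    q = Q(theta)
    for _ in range(max_iter):
        trial = theta - step * num_grad(Q, theta)
        q_trial = Q(trial)
        if q_trial < q:                 # accept downhill moves
            theta, q = trial, q_trial
        else:
            step *= 0.5                 # shrink the step near the valley floor
            if step < tol:
                break
    return theta, q

# Example: a simple quadratic bowl with minimum at (1, -2).
Q = lambda th: (th[0] - 1.0)**2 + 4.0 * (th[1] + 2.0)**2
print(ravine_search(Q, [5.0, 5.0]))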

17 3. Parabolic extrapolation of Q: near a minimum, Q may be approximated as a parabolic (quadratic) surface, and writing it as such leads to a determination of the minimum. In vector form, $Q(\theta) \approx Q_{\min} + (\nabla Q)^\top \delta\theta + \frac{1}{2}\, \delta\theta^\top (\nabla\nabla Q)\, \delta\theta$. This can also be written in component form as $Q = Q_{\min} + \sum_k \left.\frac{\partial Q}{\partial \theta_k}\right|_{\min} \delta\theta_k + \frac{1}{2} \sum_{j,k} \left.\frac{\partial^2 Q}{\partial \theta_j \partial \theta_k}\right|_{\min} \delta\theta_j\, \delta\theta_k$. Minimizing with respect to the increments $\delta\theta$, we obtain $0 = \nabla Q + (\nabla\nabla Q)\, \delta\theta$. This is a k-vector equation that yields corrections $\delta\theta$ to an initial guess $\theta_0$, for which $Q(\theta_0) = (y - \hat y(\theta_0))^\top V^{-1} (y - \hat y(\theta_0))$.

18 The Fisher information matrix is related to the quadratic term above, $F_{jk} = \frac{1}{2} \left.\frac{\partial^2 Q}{\partial \theta_j \partial \theta_k}\right|_{\min}$, and is the inverse of the parameter covariance matrix (in the quadratic approximation).
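A minimal sketch (Python with numpy) of estimating the Fisher matrix, and hence the parameter covariance, from the numerical curvature of Q at the best-fit point. The cost function, best-fit values, and step size are illustrative assumptions.

import numpy as np

def fisher_matrix(Q, theta_min, h=1e-4):
    """F_jk = (1/2) d^2Q/dtheta_j dtheta_k at theta_min, via central differences."""
    theta_min = np.asarray(theta_min, dtype=float)
    k = theta_min.size
    F = np.empty((k, k))
    for j in range(k):
        for l in range(k):
            ej = np.zeros(k); ej[j] = h
            el = np.zeros(k); el[l] = h
            d2 = (Q(theta_min + ej + el) - Q(theta_min + ej - el)
                  - Q(theta_min - ej + el) + Q(theta_min - ej - el)) / (4 * h**2)
            F[j, l] = 0.5 * d2
    return F

# Example: for a quadratic Q the result is exact, and the covariance is F^{-1}.
Q = lambda th: (th[0] - 1.0)**2 / 0.04 + (th[1] + 2.0)**2 / 0.25
F = fisher_matrix(Q, theta_min=[1.0, -2.0])
print("parameter covariance:\n", np.linalg.inv(F))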

19 4. Linearization of the fitting function $f(\theta)$: linearize according to $f(\theta) \approx f(\theta_0) + \nabla_\theta f(\theta_0)\,(\theta - \theta_0)$. Then the model for the data becomes $y \approx f(\theta_0) + \nabla_\theta f(\theta_0)\,(\theta - \theta_0) + \epsilon$, where $\theta_0$ is an initial guess for the parameters. Note that $f(\theta)$ is implicitly a function of some independent variable(s), as with linear LS. Since the model is linear near the initial guess, one can solve for $\delta\theta = \theta - \theta_0$ using the linear LS formalism. Specifically, $\delta\theta = (X^\top V^{-1} X)^{-1} X^\top V^{-1}\, \delta y$, where $\delta y = y - f(\theta_0)$ and X is now the $n \times k$ matrix of values of the k-dimensional gradient $\nabla_\theta f(\theta_0)$, evaluated at the n values of some independent variable (e.g. time, spatial coordinate, frequency, etc.).
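A minimal sketch (Python with numpy) of iterating this linearization, Gauss-Newton style, for the Gaussian-plus-baseline model used earlier. The simulated data, noise level, starting guess, and convergence test are illustrative assumptions.

import numpy as np

def model(x, th):
    a, b, c, d = th
    return a + b * np.exp(-(x - c)**2 / (2.0 * d**2))

def jacobian(x, th):
    """n x k matrix of df/dtheta evaluated at theta (analytic derivatives)."""
    a, b, c, d = th
    g = np.exp(-(x - c)**2 / (2.0 * d**2))
    return np.column_stack([np.ones_like(x),               # df/da
                            g,                             # df/db
                            b * g * (x - c) / d**2,        # df/dc
                            b * g * (x - c)**2 / d**3])    # df/dd

def linearized_fit(x, y, Vinv, th0, n_iter=20):
    th = np.asarray(th0, dtype=float)
    for _ in range(n_iter):
        X = jacobian(x, th)
        dy = y - model(x, th)
        # Linear LS step: dtheta = (X^T V^-1 X)^-1 X^T V^-1 dy
        dth = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ dy)
        th = th + dth
        if np.max(np.abs(dth)) < 1e-10:
            break
    return th

# Example with white noise (V = sigma^2 I) and a starting guess near the truth.
rng = np.random.default_rng(0)
x = np.linspace(0, 5, 200)
y = model(x, [0.1, 1.0, 2.3, 0.73]) + rng.normal(0, 0.05, x.size)
Vinv = np.eye(x.size) / 0.05**2
print(linearized_fit(x, y, Vinv, th0=[0.0, 0.8, 2.0, 1.0]))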

20 Note that, like methods 2 and 3, linearization of the fitting function will also find only the minimum that is closest to the initial guess for the parameters.

21 Optimization Methods. We have seen a number of instances where we have wanted to maximize or minimize a function. For least-squares problems, the cases of interest are: 1. Linear models: Q is convex (a quadratic surface), so a single minimum is found through a single iteration of the standard LS solution. 2. Nonlinear models: Q is generally complicated, with many local minima. (a) Ravine searches, parabolic extrapolation, and linearization of the fitting function are all iterative methods for finding the local minimum near a starting point; there is no guarantee that the global minimum will be found with these methods. (b) Grid search: can find the global minimum but at the great

22 cost of evaluating the function at a large number of locations in $\theta$-space. Also, with too-coarse sampling the global minimum can be missed with this method as well. (c) Hill-climbing method: essentially the same as (a). (d) Downhill simplex: this method searches the parameter space, or domain, using a geometrical construct called a simplex, a non-coplanar object with k + 1 vertices in the k-dimensional space. No computations of derivatives are needed; the method simply changes the shape of the simplex and moves it through the k-space according to the values of Q encountered at the vertices. It can get stuck in false minima, however, so multiple trials with different starting points should be used.
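As one concrete illustration, the downhill simplex method is available in SciPy as the 'Nelder-Mead' option of scipy.optimize.minimize. The cost function and starting points below are illustrative assumptions; several starts guard against false minima, as recommended above.

import numpy as np
from scipy.optimize import minimize

# A cost function with several local minima (illustrative).
def Q(theta):
    c, d = theta
    return np.sin(3 * c) * np.cos(2 * d) + 0.1 * ((c - 2.3)**2 + (d - 0.73)**2)

# Multiple starting points guard against getting stuck in a false minimum.
starts = [(0.0, 0.0), (2.0, 1.0), (4.0, -1.0)]
results = [minimize(Q, x0, method="Nelder-Mead") for x0 in starts]
best = min(results, key=lambda r: r.fun)
print(best.x, best.fun)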

23 (e) Simulated annealing: allow trial values of the parameters to jump around the domain (i.e. $\theta$-space) according to a temperature-like parameter and application of the Metropolis algorithm. This provides the opportunity to explore the entire domain without getting stuck in a local minimum. The temperature is lowered slowly, as in the annealing of metals, where the lattice finds a minimum-energy configuration for itself. This method has a high probability of at least finding the neighborhood of the global minimum; finding the exact minimum through the annealing process is slow. Hybridizing annealing with one of the local iterative methods in (a) can find the minimum more quickly.
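A minimal simulated-annealing sketch (Python with numpy) in the spirit described above. The proposal scale, cooling schedule, number of steps, and cost function are illustrative assumptions.

import numpy as np

def simulated_annealing(Q, theta0, T0=1.0, cooling=0.995, step=0.5,
                        n_steps=20000, rng=np.random.default_rng(0)):
    """Metropolis moves with a slowly decreasing temperature."""
    theta = np.asarray(theta0, dtype=float)
    q = Q(theta)
    best_theta, best_q = theta.copy(), q
    T = T0
    for _ in range(n_steps):
        trial = theta + rng.normal(0.0, step, size=theta.size)
        q_trial = Q(trial)
        # Metropolis acceptance: always accept downhill, sometimes uphill.
        if q_trial < q or rng.random() < np.exp(-(q_trial - q) / T):
            theta, q = trial, q_trial
            if q < best_q:
                best_theta, best_q = theta.copy(), q
        T *= cooling   # cool slowly
    return best_theta, best_q

# Example: a multimodal cost function in two dimensions.
Q = lambda th: np.sin(3 * th[0]) * np.cos(2 * th[1]) + 0.1 * np.sum((th - 1.0)**2)
print(simulated_annealing(Q, [4.0, -3.0]))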

24 (f) Genetic algorithms (GAs): search the domain through genetic-like operations. Let the parameter vector be associated with a chromosome made up of genes that each represent a specific parameter. The chromosomes are subject to genetic manipulation between generations (iterations). The main genetic processes are: i. selection according to fitness (defined in terms of a better value of the quantity being optimized, i.e. Q in least squares, the likelihood function in ML); ii. recombination or crossover, where selected pairs of chromosomes (parameter vectors) interchange genes (bits); iii. mutation, where genes (bits) are randomly flipped according to some probability, which helps keep the population from getting stuck in local minima. GAs can search the entire domain efficiently because suc-

25 cessful substrings (bit sequences) in the chromosomes ("schema") grow exponentially according to their fitness relative to the mean fitness. Thus the genetic approach explores the domain more efficiently than a purely random search (e.g. Monte Carlo selection of parameter values) or a deterministic grid search, because the genetic approach includes memory.
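A minimal genetic-algorithm sketch (Python with numpy) illustrating selection, crossover, and mutation. The population size, mutation rate, and cost function are illustrative assumptions; the notes describe bit-string chromosomes, whereas this sketch perturbs real-valued genes directly.

import numpy as np

def genetic_minimize(Q, bounds, pop_size=60, n_gen=200, p_mut=0.1,
                     rng=np.random.default_rng(0)):
    """Minimize Q over the box `bounds` = [(lo, hi), ...] with a simple GA."""
    lo, hi = np.array(bounds, dtype=float).T
    k = lo.size
    pop = rng.uniform(lo, hi, size=(pop_size, k))           # initial chromosomes
    for _ in range(n_gen):
        fitness = -np.array([Q(th) for th in pop])           # higher is better
        # i. selection: tournaments between random pairs, keep the fitter one.
        a, b = rng.integers(pop_size, size=(2, pop_size))
        parents = np.where((fitness[a] > fitness[b])[:, None], pop[a], pop[b])
        # ii. crossover: each child takes each gene from one of two parents.
        mates = parents[rng.permutation(pop_size)]
        mask = rng.random((pop_size, k)) < 0.5
        children = np.where(mask, parents, mates)
        # iii. mutation: randomly perturb a small fraction of the genes.
        mutate = rng.random((pop_size, k)) < p_mut
        children = np.clip(children + mutate * rng.normal(0, 0.1, (pop_size, k)),
                           lo, hi)
        pop = children
    best = min(pop, key=Q)
    return best, Q(best)

# Example: minimize a multimodal function over a 2-D box.
Q = lambda th: np.sin(3 * th[0]) * np.cos(2 * th[1]) + 0.1 * np.sum((th - 1.0)**2)
print(genetic_minimize(Q, bounds=[(-5, 5), (-5, 5)]))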

26 Markov Processes. Markov processes are used for modeling as well as in statistical inference problems. Markov processes are generally n-th order: the current state of a system may depend on the n previous states, though most applications consider 1st-order processes. Hidden Markov processes: a physical system may involve transitions between discrete states, but the observables may reflect those states only indirectly (e.g. measurement noise, other physics, etc.).

27 Markov Chains and Markov Processes. Definitions: a Markov process has future samples determined only by the present state and by a transition probability from the present state to a future state. A Markov chain is a Markov process with a countable number of states. Transitions between states are described by an $n \times n$ stochastic matrix Q with elements $q_{ij}$ giving the probabilities for changing in a single time step from state $s_i$ to state $s_j$, with $i, j = 1, \dots, n$. The state probability vector P has elements giving the ensemble probability of finding the system in each state. E.g. for a three-state system, States $= \{s_1, s_2, s_3\}$ and
$$Q = \begin{pmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{pmatrix}.$$
Normalization across a row is $\sum_j q_{ij} = 1$, since the system must be in some state at any time. In a single time step the probability of staying in the i-th state is the metastability $q_{ii}$, and the probability of residing in that state for a time T is proportional to $q_{ii}^{\,T}$.
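A minimal sketch (Python with numpy) of simulating a finite Markov chain from a stochastic matrix Q. The three-state example matrix and chain length are illustrative assumptions.

import numpy as np

def simulate_chain(Q, n_steps, s0=0, rng=np.random.default_rng(0)):
    """Draw a state sequence: row i of Q gives P(next state | current state i)."""
    Q = np.asarray(Q)
    states = np.empty(n_steps, dtype=int)
    s = s0
    for t in range(n_steps):
        states[t] = s
        s = rng.choice(Q.shape[0], p=Q[s])   # transition using row s
    return states

# Example: a three-state chain with fairly "sticky" states (large q_ii).
Q = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])
chain = simulate_chain(Q, n_steps=100000)
print("occupation fractions:", np.bincount(chain) / chain.size)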

28 Two-state Markov Processes

29 The probability density function (PDF) for the duration T of a given state is therefore geometric,
$$f_T(T) = \bar T_i^{-1} \left(1 - \bar T_i^{-1}\right)^{T-1}, \qquad T = 1, 2, \dots, \tag{1}$$
with mean and fractional rms values
$$\bar T_i = (1 - q_{ii})^{-1}, \qquad \sigma_{T_i}/\bar T_i = \sqrt{q_{ii}}. \tag{2}$$
Asymptotic behavior as the number of steps $t \to \infty$: the transition matrix after t steps is $Q^t$. Under the reasonable assumptions that all elements of Q are non-negative and that all states are accessible in a finite number of steps, $Q^t$ converges to a steady-state form $Q_\infty$ as $t \to \infty$ that has identical rows. Each row of $Q_\infty$ is equal to the state probability vector P, the elements of which are the probabilities that a given time sample is in a particular state. P also equals the normalized left eigenvector of Q that has unity eigenvalue, i.e. $PQ = P$ (e.g. Papoulis). For P to exist, the determinant $\det(Q - I) = 0$ (where I is the identity matrix), but this is automatically satisfied for a stochastic matrix corresponding to a stationary process. Convergence of $Q^t$ to a matrix with identical rows implies that the transition probabilities trend to those appropriate for an i.i.d. process when the time step t is much larger than the mean lifetimes $\bar T_i$ of any of the states. For a two-state system, P has elements $p_1 = (1 - q_{22})/(2 - q_{11} - q_{22})$ and $p_2 = 1 - p_1$.
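A minimal numerical check (Python with numpy) of these statements: compute the left eigenvector of Q with unit eigenvalue and compare it with the rows of Q^t for large t. The example matrix is the illustrative one used above.

import numpy as np

Q = np.array([[0.90, 0.05, 0.05],
              [0.10, 0.80, 0.10],
              [0.05, 0.15, 0.80]])

# Left eigenvector with unit eigenvalue: P Q = P  <=>  Q^T P^T = P^T.
w, v = np.linalg.eig(Q.T)
P = np.real(v[:, np.argmin(np.abs(w - 1.0))])
P = P / P.sum()                       # normalize so the probabilities sum to 1

# Q^t for large t should have every row equal to P.
Qt = np.linalg.matrix_power(Q, 200)
print("P         =", P)
print("row of Q^t =", Qt[0])
print("mean state lifetimes 1/(1-q_ii) =", 1.0 / (1.0 - np.diag(Q)))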

30 Utility of Markov processes: 1. Modeling: many processes in the lab and in nature are consistent with being Markov chains. The key elements are a set of discrete states and transitions that are random but occur according to a transition matrix. 2. Sampling: a Markov chain can define a trajectory in the relevant space that can be used to randomly but efficiently sample that space. The key aspect of Markov Chain Monte Carlo is that the trajectory conforms statistically to the asymptotic form of the transition matrix.
