A6523 Signal Modeling, Statistical Inference and Data Mining in Astrophysics Spring
- Brian Clarke
Lecture 23: Nonlinear least squares
- Notes: Modeling2015.pdf on course web site (will be posted tonight)
- Chapter 11 in Gregory (Nonlinear model fitting)
- Chapter 29 of MacKay (Monte Carlo Methods)
- Chapter 12 in Gregory (MCMC)
- An Introduction to MCMC for Machine Learning (Andrieu et al. 2003, Machine Learning, 50, 5)
- Genetic Algorithms: Principles of Natural Selection Applied to Computation (Stephanie Forrest, Science 1993, 261, 872)
Assignment 4 posted!
Clustering Algorithms
A very general problem is to find groupings or clusters of objects in some space. Clusters are useful for classification, description, discovery, and for learning algorithms. New objects may be identified because they are outliers from clusters defined by prior data.
Simple clusters = islands of points.
Hierarchical clustering: clusters within clusters.
K-means algorithm: so called because membership in one of K clusters is determined by the proximity of individual data points to the cluster mean. The treatment here is an amalgam of Chapter 8 of Introduction to Data Mining, Tan, Steinbach, and Kumar (available on the web at kumar/dmbook/index.php) and Chapters 20 and 21 of Information Theory, Inference, and Learning Algorithms, MacKay (also available on the web).
Let there be N objects at locations $\{x_j, j = 1, \dots, N\}$ in an L-dimensional space.
1. Specify the number of clusters K and initialize their mean locations, $\{m_k, k = 1, \dots, K\}$, as random vectors in the space.
2. Identify each object with a cluster by calculating the Euclidean distance to each cluster mean and finding the nearest cluster: $d_{jk} = |x_j - m_k|$.
3. The number of objects associated with cluster k is n(k).
4. Recalculate the mean location of each cluster, $m_k = \frac{1}{n(k)} \sum_{j:\, x_j \in k} x_j$. If a cluster is found to have no associated objects, its mean does not change.
5. Iterate the previous three steps until there is no change in cluster locations and membership.
The K-means algorithm always converges, but it need not converge to the correct grouping. Issues:
1. All points are treated equally in calculating the mean cluster location, even points on the periphery of the cluster.
2. The algorithm does not incorporate any prior shape information. It can be fooled by filamentary shapes.
The first issue can be dealt with by calculating a weighted mean $m_k$ that weights more strongly the points nearest the previous mean. Doing so requires imposing a length scale on the cluster, such as a Gaussian function with some size in L-space. The second issue, elongated clusters, can be dealt with by using elliptical Gaussian functions to generate the weighted cluster means, with the variance on each axis updated along with cluster membership. Examples are given in Chapter 21 of MacKay.
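The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's code: the function name `kmeans`, the bounding-box initialization of the means, and the synthetic two-blob data are assumptions. The sketch also records the summed squared distance of each object from its assigned mean after every assignment step; neither the assignment step nor the mean update can increase this quantity, which is the kind of Lyapunov function the convergence exercise from MacKay asks for.

```python
import numpy as np


def kmeans(x, K, n_iter=100, seed=0):
    """Basic K-means on an (N, L) array x; returns (means, labels, cost_history).

    cost_history holds sum_j |x_j - m_k(j)|^2 after each assignment step.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    # Step 1: K random initial means inside the bounding box of the data (assumed choice).
    m = rng.uniform(x.min(axis=0), x.max(axis=0), size=(K, x.shape[1]))
    labels, costs = None, []
    for _ in range(n_iter):
        # Step 2: Euclidean distances d_jk = |x_j - m_k|; the nearest mean wins.
        d = np.linalg.norm(x[:, None, :] - m[None, :, :], axis=2)
        new_labels = d.argmin(axis=1)
        costs.append(float(np.sum(d[np.arange(len(x)), new_labels] ** 2)))
        # Step 5: stop once membership (and hence the means) no longer changes.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Steps 3-4: each mean becomes the average of its n(k) members;
        # a cluster with no members keeps its previous mean.
        for k in range(K):
            members = x[labels == k]
            if len(members) > 0:
                m[k] = members.mean(axis=0)
    return m, labels, costs


# Usage: two synthetic blobs; keep the lowest-cost run out of several random starts.
rng = np.random.default_rng(1)
x = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(5.0, 0.3, (50, 2))])
runs = [kmeans(x, 2, seed=s) for s in range(20)]
means, labels, costs = min(runs, key=lambda run: run[2][-1])
```

A single run can settle on a poor grouping (for instance, with one cluster left empty because its random initial mean is far from every object), so the usage lines keep the run with the lowest final cost; this restart rule is an assumed heuristic, not part of the algorithm as stated.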
Copyright Cambridge University Press. On-screen viewing permitted. Printing not permitted. You can buy this book for 30 pounds or $50. See for links.
[MacKay, 20.1: K-means clustering, p. 287]
[Figure: K-means algorithm applied to a data set of 40 points. K = 2 means evolve to stable locations after three iterations. Panels: Data, then alternating Assignment and Update steps.]
[Figure: K-means algorithm applied to a data set of 40 points. Two separate runs (Run 1, Run 2), both with K = 4 means, reach different solutions. Each frame shows a successive assignment step.]
Exercise [4, p.291]: See if you can prove that K-means always converges. [Hint: find a physical analogy and an associated Lyapunov function.]
[MacKay, An Example Inference Task: Clustering]
[Figure: K-means algorithm for a case with two dissimilar clusters. (a) The "little 'n' large" data. (b) A stable set of assignments and means. Note that four points belonging to the broad cluster have been incorrectly assigned to the narrower cluster. (Points assigned to the right-hand cluster are shown by plus signs.)]
[Figure: Two elongated clusters, and the stable solution found by the K-means algorithm, panels (a) and (b).]
function is provided as part of the problem definition; but I'm assuming we are interested in data-modelling rather than vector quantization.] How do we choose K? Having found multiple alternative clusterings for a given K, how can we choose among them?
Cases where K-means might be viewed as failing. Further questions arise when we look for cases where the algorithm behaves badly (compared with what the man in the street would call "clustering"). Figure 20.5a shows a set of 75 data points generated from a mixture of two Gaussians. The right-hand Gaussian has less weight (only one fifth of the data
Nonlinear Least Squares
- Summary of linear least squares
- Features of nonlinear least squares
- Tackling the cost-function landscape
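Summary of Linear Least Squares
Unweighted least squares, equal uncorrelated errors:
Cost function: $Q(\theta) = \epsilon^T I \epsilon = \sum_j \epsilon_j^2$.
Parameter vector that minimizes Q: $\hat\theta = (X^T X)^{-1} X^T y$.
Covariance matrix for the parameters: $P_\theta = \langle (\hat\theta - \theta)(\hat\theta - \theta)^T \rangle = \sigma^2 (X^T X)^{-1}$.
Cost function dependence on $\theta = \hat\theta + \delta\theta$:
$Q(\theta) = (y - X\theta)^T (y - X\theta) = (y - X\hat\theta)^T (y - X\hat\theta) + \underbrace{\delta\theta^T X^T X\, \delta\theta}_{\text{quadratic form}\ \ge\ 0}$.
The cost function hypersurface is quadratic and has only one minimum.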
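Weighted least squares: arbitrary covariance matrix V for $\epsilon$:
Cost function: $Q(\theta) = \epsilon^T V^{-1} \epsilon$.
Parameter vector that minimizes Q: $\hat\theta = (X^T V^{-1} X)^{-1} X^T V^{-1} y$.
Covariance matrix for the parameters: $P_\theta = \langle (\hat\theta - \theta)(\hat\theta - \theta)^T \rangle = (X^T V^{-1} X)^{-1}$.
Cost function dependence on $\theta = \hat\theta + \delta\theta$: $Q(\theta) = (y - X\hat\theta)^T V^{-1} (y - X\hat\theta) + \delta\theta^T X^T V^{-1} X\, \delta\theta$.
As in the unweighted case, the hypersurface is quadratic with a single minimum. For models that are nonlinear in $\theta$, by contrast, the cost function hypersurface generally has many local minima, whereas we want the global minimum.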
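As a numerical check of these formulas, here is a minimal NumPy sketch, assuming a straight-line model with made-up heteroscedastic errors (the names `weighted_least_squares` and `cost` are illustrative). It computes $\hat\theta$ and $P_\theta$ and confirms that Q rises above its minimum by the quadratic form $\delta\theta^T X^T V^{-1} X\, \delta\theta$, to rounding error.

```python
import numpy as np


def weighted_least_squares(X, y, V):
    """theta_hat = (X^T V^-1 X)^-1 X^T V^-1 y and P_theta = (X^T V^-1 X)^-1."""
    Vinv = np.linalg.inv(V)
    F = X.T @ Vinv @ X
    theta_hat = np.linalg.solve(F, X.T @ Vinv @ y)
    return theta_hat, np.linalg.inv(F)


def cost(theta, X, y, V):
    """Q(theta) = (y - X theta)^T V^-1 (y - X theta)."""
    r = y - X @ theta
    return float(r @ np.linalg.solve(V, r))


# Usage: straight line y = 1 + 2x with errors that grow with x (assumed test data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])
sig = 0.05 + 0.2 * x
V = np.diag(sig ** 2)
y = X @ np.array([1.0, 2.0]) + rng.normal(0.0, sig)
theta_hat, P_theta = weighted_least_squares(X, y, V)

# The cross term vanishes at the minimum, so Q(theta_hat + dtheta) - Q(theta_hat)
# equals dtheta^T X^T V^-1 X dtheta.
dtheta = np.array([0.03, -0.05])
excess = cost(theta_hat + dtheta, X, y, V) - cost(theta_hat, X, y, V)
```

With $V = \sigma^2 I$ the same function reproduces the unweighted formulas, including $P_\theta = \sigma^2 (X^T X)^{-1}$.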
Error Ellipses and Confidence Intervals
Confidence intervals for weighted least squares. For an arbitrary covariance matrix V for $\epsilon$:
Cost function: $Q(\theta) = \epsilon^T V^{-1} \epsilon$.
Parameter vector that minimizes Q: $\hat\theta = (X^T V^{-1} X)^{-1} X^T V^{-1} y$.
Covariance matrix for the parameters: $P_\theta = \langle (\hat\theta - \theta)(\hat\theta - \theta)^T \rangle = (X^T V^{-1} X)^{-1}$.
What are our goals?
1. Given a fit to data, what are the errors on the parameters?
2. Do we know the data errors a priori or not? If not, we need an estimate for the errors.
3. Model comparison: we want to compare models to identify the best one.
Nonlinear Least Squares
So far we have considered linear models of the form $y = X\theta + \epsilon$. But often we want to fit models $f(x, \theta)$ that are nonlinear in the parameters, such as $y = f(x, \theta) + \epsilon$, where this vector equation has n elements and $\theta$ is a k-vector. We cannot solve for the best fit to the data in the same way as for the linear model, but the underlying principle is the same: minimize the sum of squares. Thus, we minimize the quadratic form $Q(\theta) = \epsilon^T V^{-1} \epsilon$, with $\epsilon = y - f(x, \theta)$.
The problem is to find the minimum of Q in a k-space where Q is a nonmonotonic function of the parameters. Recall that Q is parabolic for the linear model, so in that case finding any minimum of Q is the same as finding the global minimum. With nonlinear functions, there may be an arbitrary number of local minima that can confuse algorithms for finding the nearest minimum of a function. This multiplicity of minima is the bane of the nonlinear LS problem.
Cost Function Example: Nonlinear Function
Consider a standard signal + noise data set, $y(x) = f(x; \theta) + n(x)$, where n is IID with variance $\sigma_n^2$ and the signal is a Gaussian function of x with an additive baseline,
$f(x; \theta) = a + b\, e^{-(x - c)^2 / 2d^2}$,
which has two linear parameters (a and b) and two nonlinear parameters (c and d). We can generate a realization of y(x) using a set of parameters $\theta_{\rm true} = (a, b, c, d)$ and a realization of noise n(x). Then we can evaluate the cost function vs. $\theta$ over ranges of values of a, b, c, d:
$C(\theta) = \frac{[y(x) - \hat y(x)]^T [y(x) - \hat y(x)]}{(N - 4)\, \sigma_n^2}$,
where for this example we have divided by the presumed-known noise variance and by the number of degrees of freedom when we specify four parameter values.
[Figures: maps of the cost function vs. c (Gaussian location parameter) and d (Gaussian scale parameter) for a Gaussian shape with a, b, c, d = 0.10, 1.00, 2.30, 0.73, at several S/N values (legible values include S/N = 5 and S/N = 0.05). The panels report minimum cost values Q_min = 0.90, 1.23, 1.13, and 0.96.]
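A sketch of how such a cost map might be computed, assuming 100 samples on 0 ≤ x ≤ 5, σ_n = 0.1, a fixed random seed, and a and b held at their true values while c and d are scanned (the notes vary all four parameters; the sampling, noise level, grid, and function names are assumptions). For a good fit the minimum of C should come out near 1, like the Q_min values reported in the panels.

```python
import numpy as np


def model(x, a, b, c, d):
    """f(x; theta) = a + b exp(-(x - c)^2 / 2d^2): Gaussian of location c, width d, on baseline a."""
    return a + b * np.exp(-(x - c) ** 2 / (2 * d ** 2))


def cost(y, yhat, sigma_n, n_params=4):
    """C(theta) = [y - yhat]^T [y - yhat] / ((N - n_params) sigma_n^2)."""
    r = y - yhat
    return float(r @ r) / ((y.size - n_params) * sigma_n ** 2)


# One realization of y(x) = f(x; theta_true) + n(x); sampling and noise level are assumed.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 5.0, 100)
a, b, c, d = 0.10, 1.00, 2.30, 0.73          # values quoted in the figure panels
sigma_n = 0.1
y = model(x, a, b, c, d) + rng.normal(0.0, sigma_n, x.size)

# Map C over a (c, d) grid, holding the linear parameters a and b at their true values.
c_grid = np.linspace(0.0, 5.0, 251)
d_grid = np.linspace(0.1, 2.0, 96)
C = np.array([[cost(y, model(x, a, b, ci, di), sigma_n) for di in d_grid] for ci in c_grid])
i, j = np.unravel_index(np.argmin(C), C.shape)
c_best, d_best, C_min = c_grid[i], d_grid[j], C[i, j]
```

Contouring `C` against `c_grid` and `d_grid` gives a map of the same kind as the panels; lowering the signal amplitude relative to `sigma_n` broadens the valley and makes secondary minima more prominent.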
The various strategies for minimizing Q include:
1. A grid search in $\theta$ space (a brute-force approach). There is a dynamic-range problem: we must search enough hypervolume that the global minimum is included, but with sufficiently fine resolution that the global minimum is not missed. The total number of operations grows exponentially with k, the number of parameters. Let $\Delta\theta_i$ be the total range searched for parameter i (one element of the vector $\theta$) and $\delta\theta_i$ the grid sample interval. The total number of grid points is $N = \prod_{i=1}^{k} \Delta\theta_i / \delta\theta_i$.
2. A ravine search: use the gradient of Q, $\partial Q / \partial\theta$, to find the bottom of a particular valley in $\theta$-space. Choose a step length in the direction of the negative gradient, move to a new position, evaluate Q, and see if a minimum has been found. If not, iterate. This method clearly finds only the minimum that is nearest to the starting point of the search. This may not be the global minimum unless the starting point has been chosen wisely (or luckily). A hybrid approach would combine the ravine search with a pilot search that has identified the rough location of the global minimum.
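3. Parabolic extrapolation of Q: near a minimum, Q may be approximated as a parabolic surface, and expressing it as such leads to a determination of the minimum. Expanding about the current parameter values $\theta_0$ (where $Q = Q_0$ and the derivatives are evaluated) with increments $\delta\theta = \theta - \theta_0$, in vector form this is
$Q \approx Q_0 + (\nabla_\theta Q)^T \delta\theta + \frac{1}{2}\, \delta\theta^T (\nabla_\theta \nabla_\theta Q)\, \delta\theta$.
This can also be written in the form
$Q \approx Q_0 + \sum_k \frac{\partial Q}{\partial \theta_k} \delta\theta_k + \frac{1}{2} \sum_j \sum_k \frac{\partial^2 Q}{\partial\theta_j \partial\theta_k} \delta\theta_j \delta\theta_k$.
Minimizing with respect to the increments, we obtain
$\frac{\partial Q}{\partial\, \delta\theta} = 0 = \nabla_\theta Q + (\nabla_\theta \nabla_\theta Q)\, \delta\theta$.
This is a k-vector equation that yields corrections $\delta\theta$ to initial guesses $\theta_0$, which yield $Q_0 = (y - \hat y(\theta_0))^T V^{-1} (y - \hat y(\theta_0))$.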
The Fisher information matrix is related to the quadratic term above,
$F_{jk} = \frac{1}{2} \left. \frac{\partial^2 Q}{\partial\theta_j \partial\theta_k} \right|_{\rm min}$,
and is the inverse of the parameter covariance matrix (in the quadratic approximation).
4. Linearization of the fitting function, $f(\theta)$: linearize $f(\theta)$ according to
$f(\theta) \approx f(\theta_0) + \nabla_\theta f(\theta_0) \cdot (\theta - \theta_0)$,
where $\theta_0$ is an initial guess for the parameters. Then the model for the data becomes
$y \approx f(\theta_0) + \nabla_\theta f(\theta_0) \cdot (\theta - \theta_0) + \epsilon$.
Note that $f(\theta)$ is implicitly a function of some independent variable(s), as with linear LS. Since the model is linear near the initial guess, one can solve for $\delta\theta = \theta - \theta_0$ using the linear LS formalism. Specifically,
$\delta\theta = (X^T V^{-1} X)^{-1} X^T V^{-1} \delta y$, with $\delta y = y - f(\theta_0)$,
where X is now the n × k matrix of values of $\nabla_\theta f(\theta_0)$ for the k-dimensional gradient, evaluated at n values of some independent variable (e.g. time, spatial coordinate, frequency, etc.).
Note that, like methods 2 and 3, linearization of the fitting function will also find only the minimum that is closest to the initial guess for the parameters.
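For strategy 1 (grid search), the grid-size formula and a brute-force search can be sketched in plain Python. The helper names `n_grid_points` and `grid_search` and the toy quadratic cost are hypothetical; like the formula in the notes, the count ignores the extra endpoint on each axis.

```python
import itertools
import math


def n_grid_points(ranges, spacings):
    """N = prod_i (Delta theta_i / delta theta_i), the grid-size count for a grid search."""
    return math.prod(round(R / s) for R, s in zip(ranges, spacings))


def grid_search(Q, axes):
    """Evaluate Q at every point of the Cartesian grid built from `axes`; keep the best."""
    best_theta, best_q = None, math.inf
    for theta in itertools.product(*axes):
        q = Q(theta)
        if q < best_q:
            best_theta, best_q = theta, q
    return best_theta, best_q


# Usage: 100 samples per parameter gives 100**k points, so the cost explodes with k.
sizes = {k: n_grid_points([1.0] * k, [0.01] * k) for k in range(1, 7)}
axis = [-2.0 + 0.1 * i for i in range(41)]
theta_best, q_best = grid_search(lambda t: (t[0] - 1.0) ** 2 + (t[1] + 0.5) ** 2, [axis, axis])
```

For four parameters at 100 samples each this is already 10^8 evaluations of Q, which is why a grid search is usually reserved for a coarse pilot search or for a few nonlinear parameters.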
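Strategy 2 (ravine search) amounts to steepest descent. This is a minimal sketch, assuming a central-difference gradient, a step length that grows after each success and halves after each failure, and a toy one-parameter double-well cost $Q(\theta) = (\theta^2 - 1)^2 + 0.3\,\theta$ (all illustrative choices). This cost has its global minimum near θ ≈ −1.036 and a shallower local minimum near θ ≈ 0.960. Started on either side of the central hump, the search descends into the nearer valley, which is exactly the limitation described above.

```python
import numpy as np


def numerical_gradient(Q, theta, h=1e-6):
    """Central-difference estimate of dQ/dtheta."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (Q(theta + e) - Q(theta - e)) / (2 * h)
    return grad


def ravine_search(Q, theta0, step=0.1, min_step=1e-10, max_iter=10000):
    """Steepest descent: move `step` along -grad Q; lengthen on success, halve on failure."""
    theta = np.asarray(theta0, dtype=float)
    q = Q(theta)
    for _ in range(max_iter):
        g = numerical_gradient(Q, theta)
        gnorm = np.linalg.norm(g)
        if gnorm < 1e-8:                     # gradient vanishes: valley floor reached
            break
        trial = theta - step * g / gnorm
        q_trial = Q(trial)
        if q_trial < q:
            theta, q = trial, q_trial
            step *= 1.2
        else:
            step *= 0.5                      # overshot the valley floor: shorten the step
            if step < min_step:
                break
    return theta, q


double_well = lambda t: (t[0] ** 2 - 1.0) ** 2 + 0.3 * t[0]
theta_right, _ = ravine_search(double_well, [2.0])
theta_left, _ = ravine_search(double_well, [-2.0])
```

The start at θ = 2 ends in the shallower local minimum; only the start at θ = −2 finds the global one.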
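Strategy 3 (parabolic extrapolation) is a Newton step: solve $\nabla_\theta Q + (\nabla_\theta\nabla_\theta Q)\,\delta\theta = 0$ for $\delta\theta$. The sketch below estimates the derivatives by finite differences (an assumption; analytic derivatives are preferable when available) and uses a made-up quadratic cost, for which one step reaches the minimum from any starting point. For a non-parabolic Q the step is repeated, and it can head toward a maximum or saddle wherever the second-derivative matrix is not positive definite.

```python
import numpy as np


def gradient_and_hessian(Q, theta, h=1e-4):
    """Central-difference estimates of grad Q and the matrix of second derivatives."""
    k = theta.size
    E = np.eye(k) * h
    g = np.array([(Q(theta + E[i]) - Q(theta - E[i])) / (2 * h) for i in range(k)])
    H = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            H[i, j] = (Q(theta + E[i] + E[j]) - Q(theta + E[i] - E[j])
                       - Q(theta - E[i] + E[j]) + Q(theta - E[i] - E[j])) / (4 * h * h)
    return g, H


def parabolic_step(Q, theta0):
    """Solve grad Q + (grad grad Q) dtheta = 0 for dtheta and return theta0 + dtheta."""
    theta0 = np.asarray(theta0, dtype=float)
    g, H = gradient_and_hessian(Q, theta0)
    return theta0 + np.linalg.solve(H, -g)


# Usage: for an exactly parabolic Q a single step lands on the minimum.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
center = np.array([1.0, -2.0])
Q = lambda t: float((t - center) @ A @ (t - center)) + 5.0
theta1 = parabolic_step(Q, [10.0, 10.0])
```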
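The Fisher-matrix statement can be checked numerically for a straight-line fit. This sketch assumes equal uncorrelated errors σ = 0.5 and made-up data. It forms $F_{jk} = \frac12 \partial^2 Q / \partial\theta_j \partial\theta_k$ by finite differences and compares it with $X^T V^{-1} X$. It then compares $F^{-1}$ with the scatter of best-fit parameters over many simulated noise realizations. The helper `half_hessian` is hypothetical.

```python
import numpy as np


def half_hessian(Q, theta, h=1e-3):
    """F_jk = (1/2) d^2 Q / (d theta_j d theta_k), by central differences."""
    k = theta.size
    E = np.eye(k) * h
    F = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            F[i, j] = 0.5 * (Q(theta + E[i] + E[j]) - Q(theta + E[i] - E[j])
                             - Q(theta - E[i] + E[j]) + Q(theta - E[i] - E[j])) / (4 * h * h)
    return F


rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([np.ones_like(x), x])      # straight line: theta = (intercept, slope)
sigma = 0.5
Vinv = np.eye(x.size) / sigma ** 2             # V = sigma^2 I (assumed)
theta_true = np.array([1.0, 2.0])
y = X @ theta_true + rng.normal(0.0, sigma, x.size)

Q = lambda t: float((y - X @ t) @ Vinv @ (y - X @ t))
theta_hat = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
F = half_hessian(Q, theta_hat)
cov_pred = np.linalg.inv(F)

# Monte Carlo: refit many independent noise realizations, then measure the scatter.
fits = []
for _ in range(4000):
    y_sim = X @ theta_true + rng.normal(0.0, sigma, x.size)
    fits.append(np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y_sim))
cov_mc = np.cov(np.array(fits), rowvar=False)
```

For a linear model Q is exactly quadratic, so the agreement is limited only by Monte Carlo noise; for a nonlinear model $F^{-1}$ is only the quadratic-approximation estimate.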
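Strategy 4 applied to the Gaussian-plus-baseline model $f(x;\theta) = a + b\,e^{-(x-c)^2/2d^2}$ from the cost-function example gives an iterated linearized fit (commonly called Gauss-Newton). In this sketch the columns of X are the analytic derivatives of f with respect to (a, b, c, d). The data (100 samples, σ_n = 0.1, seed) and the starting guess are assumptions. The step-halving safeguard is an addition, not part of the method as stated in the notes. Started far from the peak, the same iteration may settle in a different local minimum, as the note above warns.

```python
import numpy as np


def model_and_gradient(x, theta):
    """f(x; theta) = a + b exp(-(x - c)^2 / 2d^2) and the n x k matrix of df/dtheta."""
    a, b, c, d = theta
    g = np.exp(-(x - c) ** 2 / (2 * d ** 2))
    X = np.column_stack([
        np.ones_like(x),                # df/da
        g,                              # df/db
        b * g * (x - c) / d ** 2,       # df/dc
        b * g * (x - c) ** 2 / d ** 3,  # df/dd
    ])
    return a + b * g, X


def linearized_fit(x, y, theta0, Vinv, n_iter=50, tol=1e-10):
    """Repeat dtheta = (X^T V^-1 X)^-1 X^T V^-1 (y - f(theta_0)), re-linearizing each time."""
    theta = np.asarray(theta0, dtype=float)
    f, X = model_and_gradient(x, theta)
    q = (y - f) @ Vinv @ (y - f)
    for _ in range(n_iter):
        dtheta = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ (y - f))
        # Added safeguard (not in the notes): halve the step until Q does not increase.
        step = 1.0
        while True:
            trial = theta + step * dtheta
            f_t, X_t = model_and_gradient(x, trial)
            q_t = (y - f_t) @ Vinv @ (y - f_t)
            if q_t <= q or step < 1e-6:
                break
            step *= 0.5
        if q_t > q:            # no downhill step found: stop at the current point
            break
        theta, f, X, q = trial, f_t, X_t, q_t
        if np.max(np.abs(step * dtheta)) < tol:
            break
    return theta, q


# Usage with assumed data: the Gaussian parameters from the figures, sigma_n = 0.1.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 5.0, 100)
theta_true = np.array([0.10, 1.00, 2.30, 0.73])
sigma_n = 0.1
y = model_and_gradient(x, theta_true)[0] + rng.normal(0.0, sigma_n, x.size)
Vinv = np.eye(x.size) / sigma_n ** 2
theta_fit, q_fit = linearized_fit(x, y, [0.0, 0.8, 2.0, 1.0], Vinv)
```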
Optimization Methods
We have seen a number of instances where we have wanted to maximize or minimize a function. For least-squares problems, the cases of interest are:
1. Linear models: Q is convex (a quadratic bowl), with a single minimum found through a single iteration of the standard LS solution.
2. Nonlinear models: Q is generally complicated, with many local minima.
(a) Ravine searches, parabolic extrapolation, and linearization of the fitting function are all iterative methods for finding the local minimum near a starting point. There is no guarantee that the global minimum will be found with these methods.
(b) Grid search: can find the global minimum, but at the great cost of evaluating functions at a large number of locations in $\theta$-space. Also, with too-coarse sampling, the global minimum can be missed with this method as well.
(c) Hill-climbing method: essentially the same as (a).
(d) Downhill simplex: this method searches the parameter space, or domain, using a geometrical construct called a simplex, a non-degenerate object with k + 1 vertices in the k-space. No derivatives need be computed; the method simply changes the shape of the simplex and moves it through the k-space according to the values of Q encountered at the vertices. It can get stuck in false minima, however, so multiple trials with different starting points should be used.
(e) Simulated annealing: allow trial values of the parameters to jump around the domain (i.e. $\theta$-space) according to a temperature-like parameter and application of the Metropolis algorithm. This provides the opportunity for exploring the entire domain and not getting stuck in a local minimum. The temperature is lowered slowly, as in the annealing of metals, where the lattice finds a minimum-energy configuration for itself. This method has a high probability of at least finding the neighborhood of the global minimum. Finding the exact minimum through the annealing process is slow; hybridizing annealing with one of the local methods in (a) can find the minimum more quickly.
(f) Genetic algorithms (GAs): search the domain through genetic-like operations. Let the parameter vector be associated with chromosomes made up of genes that each represent a specific parameter. The chromosomes are subject to genetic manipulation between generations (iterations). The main genetic processes are:
i. selection according to fitness (defined in terms of a better value of the quantity being optimized, i.e. Q in least squares, the likelihood function in ML);
ii. recombination or crossover, where selected pairs of chromosomes (parameter vectors) interchange genes (bits);
iii. mutation, where genes (bits) are randomly flipped according to some probability. This helps keep organisms from getting stuck in local minima.
GAs can search the entire domain efficiently because successful substrings (bit sequences) in the chromosomes ("schemata") grow exponentially according to their fitness relative to the mean fitness. Thus, the genetic approach explores the domain more efficiently than a purely random search of the domain (e.g. Monte Carlo selection of parameter values) or a deterministic grid search, because the genetic approach includes memory.
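Method (d) can be sketched as a simplified Nelder-Mead downhill simplex: reflection, expansion, inside contraction, and shrink, with the outside-contraction move of the full algorithm omitted. The function name, step size, and tolerances are assumptions. So is the toy double-well cost $Q(\theta) = (\theta^2 - 1)^2 + 0.3\,\theta$, whose global minimum is near θ ≈ −1.036. Following the advice to use several starting points, the usage lines keep the best of three runs.

```python
import numpy as np


def nelder_mead(Q, theta0, step=0.5, ftol=1e-12, xtol=1e-8, max_iter=5000):
    """Simplified downhill simplex: reflection, expansion, inside contraction, shrink."""
    theta0 = np.asarray(theta0, dtype=float)
    k = theta0.size
    # k + 1 vertices: the starting point plus one offset along each parameter axis.
    simplex = [theta0] + [theta0 + step * e for e in np.eye(k)]
    values = [Q(v) for v in simplex]
    for _ in range(max_iter):
        order = np.argsort(values)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        size = max(np.max(np.abs(v - simplex[0])) for v in simplex[1:])
        if values[-1] - values[0] < ftol and size < xtol:
            break
        centroid = np.mean(simplex[:-1], axis=0)       # centroid of all but the worst vertex
        worst = simplex[-1]
        xr = centroid + (centroid - worst)             # reflect the worst vertex
        fr = Q(xr)
        if fr < values[0]:
            xe = centroid + 2.0 * (centroid - worst)   # try expanding further
            fe = Q(xe)
            simplex[-1], values[-1] = (xe, fe) if fe < fr else (xr, fr)
        elif fr < values[-2]:
            simplex[-1], values[-1] = xr, fr
        else:
            xc = centroid + 0.5 * (worst - centroid)   # contract toward the centroid
            fc = Q(xc)
            if fc < values[-1]:
                simplex[-1], values[-1] = xc, fc
            else:                                      # shrink toward the best vertex
                simplex = [simplex[0] + 0.5 * (v - simplex[0]) for v in simplex]
                values = [Q(v) for v in simplex]
    i = int(np.argmin(values))
    return simplex[i], values[i]


# Usage: several starting points on a toy double well; keep the lowest Q found.
double_well = lambda t: (t[0] ** 2 - 1.0) ** 2 + 0.3 * t[0]
results = [nelder_mead(double_well, s) for s in ([-2.0], [0.5], [2.0])]
theta_best, q_best = min(results, key=lambda r: r[1])
```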
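Method (e) in its simplest form, as a sketch: Gaussian trial jumps, the Metropolis acceptance rule, and a geometric cooling schedule from T0 = 1 to T_min = 10⁻³. The schedule, jump size, number of steps, and the toy double-well cost are all assumptions. The run starts in the shallower well. Because uphill moves are accepted with probability $e^{-\Delta Q/T}$ while T is large, the chain can cross into the deeper well. The returned point is only near the global minimum, so in practice one would polish it with a local method from (a).

```python
import math
import random


def simulated_annealing(Q, theta0, T0=1.0, T_min=1e-3, n_steps=5000, step=0.5, seed=0):
    """Metropolis moves in theta-space at a slowly falling temperature; returns the best point seen."""
    rng = random.Random(seed)
    theta = list(theta0)
    q = Q(theta)
    best_theta, best_q = list(theta), q
    cooling = (T_min / T0) ** (1.0 / n_steps)   # geometric cooling schedule (assumed)
    T = T0
    for _ in range(n_steps):
        trial = [t + rng.gauss(0.0, step) for t in theta]
        q_trial = Q(trial)
        # Metropolis rule: accept downhill moves; accept uphill ones with prob exp(-dQ/T).
        if q_trial <= q or rng.random() < math.exp(-(q_trial - q) / T):
            theta, q = trial, q_trial
            if q < best_q:
                best_theta, best_q = list(theta), q
        T *= cooling
    return best_theta, best_q


# Usage: start in the shallower well of a toy double-well cost.
double_well = lambda t: (t[0] ** 2 - 1.0) ** 2 + 0.3 * t[0]
theta_sa, q_sa = simulated_annealing(double_well, [1.0])
```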
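Method (f) as a toy sketch in plain Python: one parameter encoded as a 16-bit chromosome, binary-tournament selection, single-point crossover, bit-flip mutation, and elitism (keeping the best chromosome each generation). Population size, mutation rate, encoding, elitism, and the double-well cost are assumptions for illustration, not a prescription from the notes or from Forrest (1993).

```python
import random


def decode(bits, lo, hi):
    """Map a chromosome (list of 0/1 genes) to a parameter value in [lo, hi]."""
    return lo + (hi - lo) * int("".join(map(str, bits)), 2) / (2 ** len(bits) - 1)


def genetic_minimize(Q, lo, hi, n_bits=16, pop_size=40, n_gen=60, p_mut=0.01, seed=0):
    rng = random.Random(seed)

    def fitness(ch):                       # lower Q means higher fitness
        return -Q(decode(ch, lo, hi))

    def select(pop):                       # i. selection: binary tournament
        a, b = rng.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(n_gen):
        children = [best[:]]               # elitism: carry the best chromosome forward
        while len(children) < pop_size:
            p1, p2 = select(pop), select(pop)
            cut = rng.randrange(1, n_bits)                                 # ii. crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # iii. mutation
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    x = decode(best, lo, hi)
    return x, Q(x)


# Usage: a toy double-well cost searched over [-2, 2].
x_best, q_best = genetic_minimize(lambda x: (x ** 2 - 1.0) ** 2 + 0.3 * x, -2.0, 2.0)
```

With 16 bits the parameter resolution is (hi - lo)/65535, so, as with annealing, a local method is the natural way to refine the final answer.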
Markov Processes
Markov processes are used for modeling as well as in statistical inference problems.
- Markov processes are generally n-th order: the current state of a system may depend on n previous states.
- Most applications consider 1st-order processes.
- Hidden Markov processes: a physical system may involve transitions between discrete states, but observables may reflect those states only indirectly (e.g. measurement noise, other physics, etc.).
Markov Chains and Markov Processes
Definitions: A Markov process has future samples determined only by the present state and by a transition probability from the present state to a future state. A Markov chain is one that has a countable number of states. Transitions between states are described by an n × n stochastic matrix Q with elements $q_{ij}$ comprising the probabilities for changing in a single time step from state $s_i$ to state $s_j$, with i, j = 1, ..., n. The state probability vector P has elements comprising the ensemble probability of finding the system in each state. E.g. for a three-state system:
States = $\{s_1, s_2, s_3\}$, $Q = \begin{pmatrix} q_{11} & q_{12} & q_{13} \\ q_{21} & q_{22} & q_{23} \\ q_{31} & q_{32} & q_{33} \end{pmatrix}$.
Normalization across a row is $\sum_j q_{ij} = 1$, since the system must be in some state at any time. In a single time step the probability for staying in the i-th state is the metastability $q_{ii}$, and the probability for residing in that state for a time T is proportional to $q_{ii}^T$.
Two-state Markov Processes
The probability density function (PDF) for the duration of a given state is therefore a geometric series that sums to
$f_T(T) = \bar T_i^{-1} \left(1 - \bar T_i^{-1}\right)^{T-1}, \quad T = 1, 2, \dots$ (1)
with mean and rms values
$\bar T_i = (1 - q_{ii})^{-1}, \qquad \sigma_{T_i} / \bar T_i = \sqrt{q_{ii}}$. (2)
Asymptotic behavior as the number of steps $t \to \infty$: the transition matrix after t steps is $Q^t$. Under the reasonable assumptions that all elements of Q are non-negative, that all states are accessible in a finite number of steps, and that the chain is aperiodic, $Q^t$ converges as $t \to \infty$ to a steady-state form $Q^\infty$ that has identical rows. Each row of $Q^\infty$ is equal to the state probability vector P, the elements of which are the probabilities that a given time sample is in a particular state. P also equals the normalized left eigenvector of Q that has unity eigenvalue, i.e. PQ = P (e.g. Papoulis). For P to exist, the determinant $\det(Q - I)$ must vanish (where I is the identity matrix), but this is automatically satisfied for a stochastic matrix, since its rows sum to unity. Convergence of $Q^t$ to a matrix with identical rows implies that the transition probabilities trend to those appropriate for an i.i.d. process when the time step t is much larger than the mean lifetimes $\bar T_i$ of any of the states. For a two-state system P has elements $p_1 = (1 - q_{22}) / (2 - q_{11} - q_{22})$ and $p_2 = 1 - p_1$.
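Here is a short NumPy check of these results, assuming made-up transition matrices. It computes P as the left eigenvector of Q with unit eigenvalue, raises Q to a large power to show its rows converging to P, and checks the two-state formula for $p_1$ and the mean lifetimes $\bar T_i = (1 - q_{ii})^{-1}$. The function names are hypothetical, and the code indexes states from 0, so `Q2[0, 0]` is $q_{11}$.

```python
import numpy as np


def stationary_distribution(Q):
    """Normalized left eigenvector of Q with unit eigenvalue, i.e. P Q = P."""
    w, v = np.linalg.eig(Q.T)          # right eigenvectors of Q^T are left eigenvectors of Q
    P = np.real(v[:, np.argmin(np.abs(w - 1.0))])
    return P / P.sum()


def mean_lifetimes(Q):
    """Mean state durations T_i = 1 / (1 - q_ii) of the geometric duration PDF."""
    return 1.0 / (1.0 - np.diag(Q))


# Usage with an assumed three-state transition matrix (rows sum to 1).
Q3 = np.array([[0.90, 0.05, 0.05],
               [0.10, 0.80, 0.10],
               [0.20, 0.30, 0.50]])
P3 = stationary_distribution(Q3)
Q3_inf = np.linalg.matrix_power(Q3, 200)   # Q^t for large t: every row approaches P

# Two-state check against p_1 = (1 - q_22) / (2 - q_11 - q_22).
Q2 = np.array([[0.95, 0.05],
               [0.20, 0.80]])
P2 = stationary_distribution(Q2)
```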
Utility of Markov processes:
1. Modeling: many processes in the lab and in nature are consistent with being Markov chains. The key elements are a set of discrete states and transitions that are random but occur according to a transition matrix.
2. Sampling: a Markov chain can define a trajectory in the relevant space that can be used to sample the space randomly but efficiently. The key aspect of Markov chain Monte Carlo is that the trajectory conforms statistically to the asymptotic form of the transition matrix.
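To illustrate the sampling point, the sketch below generates a trajectory of an assumed two-state chain by drawing each next state from the current row of Q. Over a long run, the fraction of time spent in each state approaches P. The stays in a state follow the geometric duration PDF, with mean $(1 - q_{ii})^{-1}$ and rms/mean $\sqrt{q_{ii}}$. The helper names, seed, and run length are assumptions; states are indexed from 0.

```python
import bisect

import numpy as np


def simulate_chain(Q, n_steps, s0=0, seed=0):
    """State trajectory of a Markov chain with transition matrix Q (inverse-CDF draws)."""
    rng = np.random.default_rng(seed)
    cum = np.cumsum(Q, axis=1).tolist()    # cumulative transition probabilities per row
    u = rng.random(n_steps).tolist()
    states, s = [], s0
    for t in range(n_steps):
        states.append(s)
        s = min(bisect.bisect_right(cum[s], u[t]), len(cum) - 1)   # next state from row s
    return np.array(states)


def run_lengths(states, state):
    """Durations (in steps) of each uninterrupted stay in `state`."""
    runs, count = [], 0
    for s in states:
        if s == state:
            count += 1
        elif count:
            runs.append(count)
            count = 0
    if count:
        runs.append(count)
    return np.array(runs)


# Usage: occupation fractions should approach P = (0.8, 0.2) for this assumed Q.
Q2 = np.array([[0.95, 0.05],
               [0.20, 0.80]])
states = simulate_chain(Q2, 200_000, seed=3)
fractions = np.bincount(states, minlength=2) / states.size
runs0 = run_lengths(states.tolist(), 0)
```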
Graphical Models Markov Chain Monte Carlo Inference Siamak Ravanbakhsh Winter 2018 Learning objectives Markov chains the idea behind Markov Chain Monte Carlo (MCMC) two important examples: Gibbs sampling
More informationReliability Theory of Dynamically Loaded Structures (cont.)
Outline of Reliability Theory of Dynamically Loaded Structures (cont.) Probability Density Function of Local Maxima in a Stationary Gaussian Process. Distribution of Extreme Values. Monte Carlo Simulation
More informationLecture 16 Deep Neural Generative Models
Lecture 16 Deep Neural Generative Models CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago May 22, 2017 Approach so far: We have considered simple models and then constructed
More informationA Search and Jump Algorithm for Markov Chain Monte Carlo Sampling. Christopher Jennison. Adriana Ibrahim. Seminar at University of Kuwait
A Search and Jump Algorithm for Markov Chain Monte Carlo Sampling Christopher Jennison Department of Mathematical Sciences, University of Bath, UK http://people.bath.ac.uk/mascj Adriana Ibrahim Institute
More informationMCMC: Markov Chain Monte Carlo
I529: Machine Learning in Bioinformatics (Spring 2013) MCMC: Markov Chain Monte Carlo Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Review of Markov
More informationThe Particle Filter. PD Dr. Rudolph Triebel Computer Vision Group. Machine Learning for Computer Vision
The Particle Filter Non-parametric implementation of Bayes filter Represents the belief (posterior) random state samples. by a set of This representation is approximate. Can represent distributions that
More information6 Markov Chain Monte Carlo (MCMC)
6 Markov Chain Monte Carlo (MCMC) The underlying idea in MCMC is to replace the iid samples of basic MC methods, with dependent samples from an ergodic Markov chain, whose limiting (stationary) distribution
More informationSTATS 306B: Unsupervised Learning Spring Lecture 2 April 2
STATS 306B: Unsupervised Learning Spring 2014 Lecture 2 April 2 Lecturer: Lester Mackey Scribe: Junyang Qian, Minzhe Wang 2.1 Recap In the last lecture, we formulated our working definition of unsupervised
More informationLecture 4: Probabilistic Learning
DD2431 Autumn, 2015 1 Maximum Likelihood Methods Maximum A Posteriori Methods Bayesian methods 2 Classification vs Clustering Heuristic Example: K-means Expectation Maximization 3 Maximum Likelihood Methods
More informationSimulated Annealing for Constrained Global Optimization
Monte Carlo Methods for Computation and Optimization Final Presentation Simulated Annealing for Constrained Global Optimization H. Edwin Romeijn & Robert L.Smith (1994) Presented by Ariel Schwartz Objective
More informationMaximum Likelihood, Logistic Regression, and Stochastic Gradient Training
Maximum Likelihood, Logistic Regression, and Stochastic Gradient Training Charles Elkan elkan@cs.ucsd.edu January 17, 2013 1 Principle of maximum likelihood Consider a family of probability distributions
More informationNeural Networks for Machine Learning. Lecture 11a Hopfield Nets
Neural Networks for Machine Learning Lecture 11a Hopfield Nets Geoffrey Hinton Nitish Srivastava, Kevin Swersky Tijmen Tieleman Abdel-rahman Mohamed Hopfield Nets A Hopfield net is composed of binary threshold
More informationComputer Vision Group Prof. Daniel Cremers. 11. Sampling Methods: Markov Chain Monte Carlo
Group Prof. Daniel Cremers 11. Sampling Methods: Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative
More informationLecture : Probabilistic Machine Learning
Lecture : Probabilistic Machine Learning Riashat Islam Reasoning and Learning Lab McGill University September 11, 2018 ML : Many Methods with Many Links Modelling Views of Machine Learning Machine Learning
More informationECE 275B Homework #2 Due Thursday 2/12/2015. MIDTERM is Scheduled for Thursday, February 19, 2015
Reading ECE 275B Homework #2 Due Thursday 2/12/2015 MIDTERM is Scheduled for Thursday, February 19, 2015 Read and understand the Newton-Raphson and Method of Scores MLE procedures given in Kay, Example
More informationMACHINE LEARNING ADVANCED MACHINE LEARNING
MACHINE LEARNING ADVANCED MACHINE LEARNING Recap of Important Notions on Estimation of Probability Density Functions 22 MACHINE LEARNING Discrete Probabilities Consider two variables and y taking discrete
More informationFinite-Horizon Statistics for Markov chains
Analyzing FSDT Markov chains Friday, September 30, 2011 2:03 PM Simulating FSDT Markov chains, as we have said is very straightforward, either by using probability transition matrix or stochastic update
More informationAdaptive Algorithms for Blind Source Separation
Adaptive Algorithms for Blind Source Separation George Moustakides Department of Computer Engineering and Informatics UNIVERSITY OF PATRAS, GREECE Outline of the Presentation Problem definition Existing
More informationIEOR E4570: Machine Learning for OR&FE Spring 2015 c 2015 by Martin Haugh. The EM Algorithm
IEOR E4570: Machine Learning for OR&FE Spring 205 c 205 by Martin Haugh The EM Algorithm The EM algorithm is used for obtaining maximum likelihood estimates of parameters when some of the data is missing.
More informationComputer Intensive Methods in Mathematical Statistics
Computer Intensive Methods in Mathematical Statistics Department of mathematics johawes@kth.se Lecture 16 Advanced topics in computational statistics 18 May 2017 Computer Intensive Methods (1) Plan of
More informationThe Bias-Variance dilemma of the Monte Carlo. method. Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel
The Bias-Variance dilemma of the Monte Carlo method Zlochin Mark 1 and Yoram Baram 1 Technion - Israel Institute of Technology, Technion City, Haifa 32000, Israel fzmark,baramg@cs.technion.ac.il Abstract.
More informationConvex Optimization CMU-10725
Convex Optimization CMU-10725 Simulated Annealing Barnabás Póczos & Ryan Tibshirani Andrey Markov Markov Chains 2 Markov Chains Markov chain: Homogen Markov chain: 3 Markov Chains Assume that the state
More informationIn the Name of God. Lecture 11: Single Layer Perceptrons
1 In the Name of God Lecture 11: Single Layer Perceptrons Perceptron: architecture We consider the architecture: feed-forward NN with one layer It is sufficient to study single layer perceptrons with just
More informationLecture 14. Clustering, K-means, and EM
Lecture 14. Clustering, K-means, and EM Prof. Alan Yuille Spring 2014 Outline 1. Clustering 2. K-means 3. EM 1 Clustering Task: Given a set of unlabeled data D = {x 1,..., x n }, we do the following: 1.
More informationMachine Learning Lecture 5
Machine Learning Lecture 5 Linear Discriminant Functions 26.10.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory
More informationUnsupervised machine learning
Chapter 9 Unsupervised machine learning Unsupervised machine learning (a.k.a. cluster analysis) is a set of methods to assign objects into clusters under a predefined distance measure when class labels
More informationSTA414/2104. Lecture 11: Gaussian Processes. Department of Statistics
STA414/2104 Lecture 11: Gaussian Processes Department of Statistics www.utstat.utoronto.ca Delivered by Mark Ebden with thanks to Russ Salakhutdinov Outline Gaussian Processes Exam review Course evaluations
More informationGenetic Algorithms and Genetic Programming Lecture 17
Genetic Algorithms and Genetic Programming Lecture 17 Gillian Hayes 28th November 2006 Selection Revisited 1 Selection and Selection Pressure The Killer Instinct Memetic Algorithms Selection and Schemas
More informationMachine Learning and Adaptive Systems. Lectures 3 & 4
ECE656- Lectures 3 & 4, Professor Department of Electrical and Computer Engineering Colorado State University Fall 2015 What is Learning? General Definition of Learning: Any change in the behavior or performance
More informationResults: MCMC Dancers, q=10, n=500
Motivation Sampling Methods for Bayesian Inference How to track many INTERACTING targets? A Tutorial Frank Dellaert Results: MCMC Dancers, q=10, n=500 1 Probabilistic Topological Maps Results Real-Time
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Brown University CSCI 2950-P, Spring 2013 Prof. Erik Sudderth Lecture 13: Learning in Gaussian Graphical Models, Non-Gaussian Inference, Monte Carlo Methods Some figures
More informationComputational functional genomics
Computational functional genomics (Spring 2005: Lecture 8) David K. Gifford (Adapted from a lecture by Tommi S. Jaakkola) MIT CSAIL Basic clustering methods hierarchical k means mixture models Multi variate
More informationReport due date. Please note: report has to be handed in by Monday, May 16 noon.
Lecture 23 18.86 Report due date Please note: report has to be handed in by Monday, May 16 noon. Course evaluation: Please submit your feedback (you should have gotten an email) From the weak form to finite
More informationMSA220 Statistical Learning for Big Data
MSA220 Statistical Learning for Big Data Lecture 4 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology More on Discriminant analysis More on Discriminant
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 7 Approximate
More informationSupplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles
Supplemental Information Likelihood-based inference in isolation-by-distance models using the spatial distribution of low-frequency alleles John Novembre and Montgomery Slatkin Supplementary Methods To
More informationData Warehousing & Data Mining
13. Meta-Algorithms for Classification Data Warehousing & Data Mining Wolf-Tilo Balke Silviu Homoceanu Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 13.
More informationStochastic Proximal Gradient Algorithm
Stochastic Institut Mines-Télécom / Telecom ParisTech / Laboratoire Traitement et Communication de l Information Joint work with: Y. Atchade, Ann Arbor, USA, G. Fort LTCI/Télécom Paristech and the kind
More informationAd Placement Strategies
Case Study 1: Estimating Click Probabilities Tackling an Unknown Number of Features with Sketching Machine Learning for Big Data CSE547/STAT548, University of Washington Emily Fox 2014 Emily Fox January
More informationOrganization. I MCMC discussion. I project talks. I Lecture.
Organization I MCMC discussion I project talks. I Lecture. Content I Uncertainty Propagation Overview I Forward-Backward with an Ensemble I Model Reduction (Intro) Uncertainty Propagation in Causal Systems
More informationLast updated: Oct 22, 2012 LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
Last updated: Oct 22, 2012 LINEAR CLASSIFIERS Problems 2 Please do Problem 8.3 in the textbook. We will discuss this in class. Classification: Problem Statement 3 In regression, we are modeling the relationship
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Computer Science! Department of Statistical Sciences! rsalakhu@cs.toronto.edu! h0p://www.cs.utoronto.ca/~rsalakhu/ Lecture 7 Approximate
More information12 Discriminant Analysis
12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into
More information