Gaussian Processes
1 Schedule
17 Jan: Gaussian processes (Jo Eidsvik)
24 Jan: Hands-on project on Gaussian processes (Team effort, work in groups)
31 Jan: Latent Gaussian models and INLA (Jo Eidsvik)
7 Feb: Hands-on project on INLA (Team effort, work in groups)
Feb: Template model builder (Guest lecturer Hans J Skaug)
To be decided...
2 Motivation
Large spatial (spatio-temporal) datasets. Gaussian processes are very commonly used in practice.
3 Motivation: Other contexts
Genetic data, functional data, response-surface modeling and optimization. Gaussian processes are extremely common as a building block in several machine learning applications.
4 Preliminaries: Univariate Gaussian distribution
$Y \sim N(\mu, \sigma^2)$, with pdf
$p(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2\sigma^2}(y-\mu)^2\right)$, $y \in \mathbb{R}$.
Standardization: $Z = (Y - \mu)/\sigma$, $Y = \mu + \sigma Z$, where $Z$ is standard normal, mean 0 and variance 1.
5 Preliminaries: Multivariate Gaussian distribution
Size $n \times 1$ vector $Y = (Y_1, \ldots, Y_n)$, $Y \in \mathbb{R}^n$, with pdf
$p(y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(y-\mu)^\top \Sigma^{-1} (y-\mu)\right)$.
Mean is $E(Y) = \mu = (\mu_1, \ldots, \mu_n)$. The variance-covariance matrix is
$\Sigma = \begin{pmatrix} \Sigma_{1,1} & \cdots & \Sigma_{1,n} \\ \vdots & & \vdots \\ \Sigma_{n,1} & \cdots & \Sigma_{n,n} \end{pmatrix}$,
with $\Sigma_{i,i} = \sigma_i^2 = \mathrm{Var}(Y_i)$, $\Sigma_{i,j} = \mathrm{Cov}(Y_i, Y_j)$, and $\mathrm{Corr}(Y_i, Y_j) = \Sigma_{i,j}/(\sigma_i \sigma_j)$.
6 Preliminaries: Illustrations
[Figure: bivariate samples for n = 2, correlation 0.9 (left) and independent (right).]
7 Preliminaries: Joint for blocks
$Y_A = (Y_{A,1}, \ldots, Y_{A,n_A})$ and $Y_B = (Y_{B,1}, \ldots, Y_{B,n_B})$ are jointly Gaussian with mean and covariance
$\mu = (\mu_A, \mu_B)$, $\Sigma = \begin{pmatrix} \Sigma_A & \Sigma_{A,B} \\ \Sigma_{B,A} & \Sigma_B \end{pmatrix}$.
8 Preliminaries: Conditioning
$E(Y_A \mid Y_B) = \mu_A + \Sigma_{A,B} \Sigma_B^{-1} (Y_B - \mu_B)$,
$\mathrm{Var}(Y_A \mid Y_B) = \Sigma_A - \Sigma_{A,B} \Sigma_B^{-1} \Sigma_{B,A}$.
The mean is linear in the conditioning variable (the data). The variance does not depend on the data values. (A small numeric sketch follows the illustration below.)
9 Preliminaries: Illustration
[Figure: conditional pdf $p(x_1 \mid x_2)$ for $Y_1$ when $Y_2 = -1$ or $Y_2 = 1$.]
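To make the conditioning formula concrete, here is a minimal bivariate sketch in Python; the means, unit variances and correlation 0.9 are assumptions chosen to match the earlier illustration, not values from the slides.

import numpy as np

# Bivariate case of the slide-8 formulas, with assumed parameters:
# mu = (0, 0), unit variances, correlation rho = 0.9, conditioning on Y2 = 1.
mu1, mu2, s1, s2, rho, y2 = 0.0, 0.0, 1.0, 1.0, 0.9, 1.0
cov12 = rho * s1 * s2                         # Sigma_{A,B}
cond_mean = mu1 + cov12 / s2**2 * (y2 - mu2)  # E(Y1 | Y2 = y2)
cond_var = s1**2 - cov12**2 / s2**2           # Var(Y1 | Y2), data-independent
print(cond_mean, cond_var)                    # 0.9, 0.19

Note that conditioning on $Y_2 = -1$ instead only flips the sign of the conditional mean; the conditional variance stays 0.19, as the slide emphasizes.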
10 Preliminaries: Transformation
$Z = L^{-1}(Y - \mu)$, $Y = \mu + LZ$, where $\Sigma = LL^\top$.
$Z = (Z_1, \ldots, Z_n)$ are independent standard normal, mean 0 and variance 1.
11 Preliminaries: Cholesky factorization
$\Sigma = \begin{pmatrix} \Sigma_{1,1} & \cdots & \Sigma_{1,n} \\ \vdots & & \vdots \\ \Sigma_{n,1} & \cdots & \Sigma_{n,n} \end{pmatrix} = LL^\top$, with lower triangular matrix
$L = \begin{pmatrix} L_{1,1} & 0 & \cdots & 0 \\ L_{2,1} & L_{2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ L_{n,1} & L_{n,2} & \cdots & L_{n,n} \end{pmatrix}$.
12 Preliminaries: Cholesky - example
For a $2 \times 2$ covariance $\Sigma = LL^\top$ with $L = \begin{pmatrix} L_{1,1} & 0 \\ L_{2,1} & L_{2,2} \end{pmatrix}$, consider sampling from the joint $p(y_1, y_2) = p(y_1) p(y_2 \mid y_1)$:
A sample from $p(y_1)$ is constructed by $Y_1 = \mu_1 + L_{1,1} Z_1$.
A sample from $p(y_2 \mid y_1)$ is constructed by
$Y_2 = \mu_2 + L_{2,1} Z_1 + L_{2,2} Z_2 = \mu_2 + L_{2,1} \frac{Y_1 - \mu_1}{L_{1,1}} + L_{2,2} Z_2$.
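A minimal sampling sketch in Python; the mean vector and the 2x2 covariance values are hypothetical, since the slide's specific numbers did not survive transcription. numpy's Cholesky factor plays the role of $L$.

import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, 2.0])             # hypothetical mean
Sigma = np.array([[1.0, 0.7],
                  [0.7, 1.0]])        # hypothetical covariance
L = np.linalg.cholesky(Sigma)         # lower triangular, Sigma = L L^T
y = mu + L @ rng.standard_normal(2)   # Y = mu + L Z, i.e. p(y1) then p(y2|y1)

Because the first row of $L$ only involves $Z_1$, the factorization reproduces exactly the sequential sampling $p(y_1)$ then $p(y_2 \mid y_1)$ described above.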
13 Processes: Gaussian process
For any set of time locations $t_1, \ldots, t_n$, the vector $(Y(t_1), \ldots, Y(t_n))$ is jointly Gaussian, with mean $\mu(t_i)$, $i = 1, \ldots, n$, and covariance matrix
$\Sigma = \begin{pmatrix} \Sigma_{1,1} & \cdots & \Sigma_{1,n} \\ \vdots & & \vdots \\ \Sigma_{n,1} & \cdots & \Sigma_{n,n} \end{pmatrix}$.
14 Processes: Covariance function
The covariance tends to decay with distance: $\Sigma_{i,j} = \gamma(|t_i - t_j|)$, for some covariance function $\gamma$. Examples:
$\gamma_{\mathrm{exp}}(h) = \sigma^2 \exp(-\phi h)$
$\gamma_{\mathrm{Mat}}(h) = \sigma^2 (1 + \phi h) \exp(-\phi h)$
$\gamma_{\mathrm{Gauss}}(h) = \sigma^2 \exp(-\phi h^2)$
(A code sketch after the next two figure slides draws samples under these covariance functions.)
15 Processes: Illustration of covariance function
[Figure: exponential, Matern-type and Gaussian correlation as a function of distance |t - s|.]
16 Processes: Illustration of samples of Gaussian processes
[Figure: sample paths x(t) over time for exponential and Matern-type correlation.]
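A sketch that reproduces this kind of sample-path plot: build $\Sigma$ from each covariance function on a time grid and apply the Cholesky transformation from slide 10. The grid, the parameters ($\sigma^2 = \phi = 1$) and the jitter term are assumptions for illustration.

import numpy as np

def gamma_exp(h, s2=1.0, phi=1.0):
    return s2 * np.exp(-phi * h)

def gamma_mat(h, s2=1.0, phi=1.0):
    return s2 * (1.0 + phi * h) * np.exp(-phi * h)

def gamma_gauss(h, s2=1.0, phi=1.0):
    return s2 * np.exp(-phi * h**2)

t = np.linspace(0.0, 10.0, 200)
H = np.abs(t[:, None] - t[None, :])           # |t_i - t_j| for all pairs
rng = np.random.default_rng(1)
for gamma in (gamma_exp, gamma_mat, gamma_gauss):
    Sigma = gamma(H) + 1e-6 * np.eye(len(t))  # small jitter keeps Cholesky stable
    L = np.linalg.cholesky(Sigma)
    path = L @ rng.standard_normal(len(t))    # one zero-mean sample Y = L Z

The jitter is needed mainly for the Gaussian covariance, whose covariance matrices are numerically near-singular; the exponential and Matern-type paths are visibly rougher, matching the figure.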
17 Processes: Markov property
The exponential correlation function gives a Markov process. For any $t > s > u$,
$p(y(t) \mid y(s), y(u)) = p(y(t) \mid y(s))$.
(Proof via the trivariate distribution and the conditioning formulas
$E(Y_A \mid Y_B) = \mu_A + \Sigma_{A,B} \Sigma_B^{-1} (Y_B - \mu_B)$, $\mathrm{Var}(Y_A \mid Y_B) = \Sigma_A - \Sigma_{A,B} \Sigma_B^{-1} \Sigma_{B,A}$.)
18 Processes: Conditional formula
$E(Y_A \mid Y_B) = \mu_A + \Sigma_{A,B} \Sigma_B^{-1} (Y_B - \mu_B)$,
$\mathrm{Var}(Y_A \mid Y_B) = \Sigma_A - \Sigma_{A,B} \Sigma_B^{-1} \Sigma_{B,A}$.
The expectation is linear in the data. The variance depends only on the data locations, not on the data values.
The expectation is close to the conditioning variables near data locations and goes to $\mu_A$ far from data. The variance is small near data locations and goes to $\Sigma_A$ far from data.
Close data locations do not count as double data: correlated observations carry overlapping information.
19 Processes: Illustration of conditioning in Gaussian processes
[Figure: sample paths x(t) before and after conditioning on observations.]
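A sketch of this conditioning in code, applying the block formulas of slide 18 to a zero-mean process with the Matern-type covariance; the observation times and values are made-up assumptions.

import numpy as np

def gamma_mat(h, s2=1.0, phi=1.0):
    return s2 * (1.0 + phi * h) * np.exp(-phi * h)

t_obs = np.array([2.0, 4.0, 7.0])            # hypothetical data locations
y_obs = np.array([0.5, -1.0, 1.2])           # hypothetical observations
t_new = np.linspace(0.0, 10.0, 101)

S_B  = gamma_mat(np.abs(t_obs[:, None] - t_obs[None, :]))
S_AB = gamma_mat(np.abs(t_new[:, None] - t_obs[None, :]))
S_A  = gamma_mat(np.abs(t_new[:, None] - t_new[None, :]))

W = np.linalg.solve(S_B, S_AB.T).T           # Sigma_{A,B} Sigma_B^{-1}
cond_mean = W @ y_obs                        # goes to 0 (the mean) far from data
cond_cov  = S_A - W @ S_AB.T                 # small near data, Sigma_A far away

Plotting cond_mean plus/minus two square roots of the diagonal of cond_cov reproduces the pinched-at-data bands of the figure.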
20 Processes: Application of Gaussian processes: function optimization
Several applications involve very time-demanding or costly experiments. Which configurations of inputs give the highest output? The goal is to find the optimal input without too many trials/tests.
Approach: Fit a GP to the function, based on the evaluation points and results. This allows fast assessment of which evaluation point to choose next!
21 Processes: Production quality - current prediction
Goal: Find the best temperature input, to give optimal production. Which temperature should be evaluated next? The experiment is costly, so we want to do few evaluations.
22 Processes: Expected improvement
Maximum so far: $Y^* = \max(Y)$.
$EI_n(s) = E(\max(Y(s) - Y^*, 0) \mid Y)$
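For a Gaussian predictive distribution this integral has a closed form (a standard Gaussian identity; the slides note that parts of the computation are analytical). A sketch:

import numpy as np
from scipy.stats import norm

def expected_improvement(mu_s, sd_s, y_star):
    """E(max(Y(s) - y_star, 0)) for Y(s) ~ N(mu_s, sd_s^2)."""
    z = (mu_s - y_star) / sd_s
    return (mu_s - y_star) * norm.cdf(z) + sd_s * norm.pdf(z)

# e.g. with (hypothetical) predictive mean 48, sd 5, current maximum 50:
print(expected_improvement(48.0, 5.0, 50.0))

EI is large where the predictive mean is high or the predictive uncertainty is large, which is exactly the exploration/exploitation trade-off used to pick the next evaluation point.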
23 Processes: Sequential uncertainty reduction
Perform a test at $t = 40$; the result is $Y(40) = 50$.
24 Processes: Sequential uncertainty reduction and optimization
Maximum so far: $Y^* = \max(Y, Y_{n+1})$.
$EI_{n+1}(s) = E(\max(Y(s) - Y^*, 0) \mid Y, Y_{n+1})$
25 Processes: Project on function optimization using EI next week
Analytical solutions exist for parts of the computational challenge.
26 Gaussian Markov random fields: Precision matrix Q
$\Sigma^{-1} = Q = \begin{pmatrix} Q_A & Q_{A,B} \\ Q_{B,A} & Q_B \end{pmatrix}$.
Q holds the conditional variance structure.
27 Gaussian Markov random fields: Interpretation of precision
$\mathrm{Var}(Y_A \mid Y_B) = Q_A^{-1}$,
$E(Y_A \mid Y_B) = \mu_A - Q_A^{-1} Q_{A,B} (Y_B - \mu_B)$.
(Proof by $Q\Sigma = I$, or by writing out the quadratic form in $p(y_A \mid Y_B) \propto p(y_A, Y_B)$.)
28 Gaussian Markov random fields: Sparse precision matrix Q
For graphs the precision matrix is sparse: $Q_{i,j} = 0$ if nodes $i$ and $j$ are not neighbors, i.e. they are conditionally independent given the rest.
For example, $Q_{i,i\pm2} = 0$ for the exponential covariance function on a regular grid in time.
29 Gaussian Markov random fields: Sparse precision matrix Q
This sparseness means that several techniques from numerical analysis can be used, e.g. solving $Qb = a$ quickly for $b$.
30 Gaussian Markov random fields: Cholesky factorization of Q
A common method for sampling and evaluation:
$Q = \begin{pmatrix} Q_{1,1} & \cdots & Q_{1,n} \\ \vdots & & \vdots \\ Q_{n,1} & \cdots & Q_{n,n} \end{pmatrix} = L_Q L_Q^\top$, with lower triangular matrix
$L_Q = \begin{pmatrix} L_{Q,1,1} & 0 & \cdots & 0 \\ L_{Q,2,1} & L_{Q,2,2} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ L_{Q,n,1} & L_{Q,n,2} & \cdots & L_{Q,n,n} \end{pmatrix}$.
The Cholesky factor is often sparse, but not as sparse as Q, because it holds only partial conditional structure, according to an ordering. Sparsity is maintained for the exponential covariance function in the time dimension (Markov).
31 Gaussian Markov random fields: Sampling and evaluation using L_Q
With $Q = L_Q L_Q^\top$, a sample is obtained by solving $L_Q^\top Y = Z$ for $Y$, where $Z$ is standard normal. (Previously, for the covariance we had $Y = LZ$.)
$\log|Q| = 2 \log|L_Q| = 2 \sum_i \log L_{Q,ii}$
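A small sketch of sampling and log-determinant evaluation through $L_Q$, using the tridiagonal precision of a stationary AR(1) process (a standard GMRF example; the coefficient and grid size are assumptions). It is written with dense numpy arrays for brevity; a real implementation would store Q sparsely and use a sparse Cholesky solver, which preserves the band here.

import numpy as np

n, a = 200, 0.9                              # grid size, AR(1) coefficient (assumed)
Q = np.zeros((n, n))
idx = np.arange(n)
Q[idx, idx] = 1.0 + a**2                     # interior conditional precisions
Q[0, 0] = Q[-1, -1] = 1.0                    # boundary corrections
Q[idx[:-1], idx[:-1] + 1] = -a               # first off-diagonals (Markov: only +/-1)
Q[idx[:-1] + 1, idx[:-1]] = -a
Q /= (1.0 - a**2)                            # scaling for unit marginal variance

L_Q = np.linalg.cholesky(Q)                  # Q = L_Q L_Q^T
z = np.random.default_rng(0).standard_normal(n)
y = np.linalg.solve(L_Q.T, z)                # solve L_Q^T y = z  =>  y ~ N(0, Q^{-1})
logdet_Q = 2.0 * np.log(np.diag(L_Q)).sum()  # log|Q| = 2 sum_i log L_{Q,ii}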
32 Gaussian Markov random fields: GMRF for spatial applications
A Markovian model can be constructed for a spatial Gaussian process (Lindgren et al., 2011). The spatial process is viewed as the solution of a stochastic partial differential equation, and the solution is embedded in a triangulated graph over the spatial domain. More later (31 Jan).
33 Regression Model: Regression for spatial datasets
Applications with trends and residual spatial variation.
34 Regression Model: Spatial Gaussian regression model
Model: $Y(s) = X(s)\beta + w(s) + \epsilon(s)$.
1. $Y(s)$: response variable at location / location-time position $s$.
2. $\beta$: regression effects; $X(s)$: covariates at $s$.
3. $w(s)$: structured (space-time correlated) Gaussian process.
4. $\epsilon(s)$: unstructured (independent) Gaussian measurement noise.
35 Regression Model: Gaussian model
Model: $Y(s) = X(s)\beta + w(s) + \epsilon(s)$.
Data at $n$ locations: $Y = (Y(s_1), \ldots, Y(s_n))$.
Main goals are parameter estimation and prediction.
36 Regression Model: Gaussian model
Log-likelihood for parameter estimation:
$l(Y; \beta, \theta) = -\frac{1}{2} \log|C| - \frac{1}{2} (Y - X\beta)^\top C^{-1} (Y - X\beta)$,
$C = C(\theta) = \Sigma + \tau^2 I_n$, with $\mathrm{Var}(w) = \Sigma$ and $\mathrm{Var}(\epsilon(s_i)) = \tau^2$ for all $i$.
$\theta$ includes the parameters of the covariance model.
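A sketch of evaluating this likelihood in Python, profiling out beta by generalized least squares. The Matern-type-plus-nugget covariance and the log parameterization of theta are assumptions for illustration; D is the matrix of pairwise distances between the data sites.

import numpy as np

def neg_log_lik(params, y, X, D):
    """-l(Y; beta_hat(theta), theta); params = (log s2, log phi, log t2), assumed."""
    s2, phi, t2 = np.exp(params)
    C = s2 * (1.0 + phi * D) * np.exp(-phi * D) + t2 * np.eye(len(y))
    L = np.linalg.cholesky(C)
    def C_inv(v):                             # C^{-1} v via two triangular solves
        return np.linalg.solve(L.T, np.linalg.solve(L, v))
    beta = np.linalg.solve(X.T @ C_inv(X), X.T @ C_inv(y))   # GLS beta-hat
    r = y - X @ beta
    return np.log(np.diag(L)).sum() + 0.5 * r @ C_inv(r)     # 0.5 log|C| + quad. form

Minimizing this with, e.g., scipy.optimize.minimize gives the MLE of theta, with beta-hat recovered from the fitted theta; the next slides instead use analytical derivatives.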
37 Method: Maximum likelihood
MLE: $(\hat\theta, \hat\beta) = \arg\max_{\theta, \beta} \{ l(Y; \beta, \theta) \}$.
38 Method: Analytical derivatives
Formulas for matrix derivatives, with $Q(\theta) = C^{-1}$:
$\hat\beta = [X^\top Q X]^{-1} X^\top Q Y$, $Z = Y - X\hat\beta$,
$\frac{d \log|C|}{d\theta_r} = \mathrm{trace}\left(Q \frac{dC}{d\theta_r}\right)$,
$\frac{d (Z^\top Q Z)}{d\theta_r} = -Z^\top Q \frac{dC}{d\theta_r} Q Z$.
39 Method: Score and Hessian for θ
$\frac{dl}{d\theta_r} = -\frac{1}{2} \mathrm{trace}\left(Q \frac{dC}{d\theta_r}\right) + \frac{1}{2} Z^\top Q \frac{dC}{d\theta_r} Q Z$,
$-E\left(\frac{d^2 l}{d\theta_r d\theta_s}\right) = \frac{1}{2} \mathrm{trace}\left(Q \frac{dC}{d\theta_s} Q \frac{dC}{d\theta_r}\right)$.
40 Method: Updates for each iteration
$Q = Q(\theta_p)$,
$\hat\beta_p = [X^\top Q X]^{-1} X^\top Q Y$,
$\hat\theta_{p+1} = \hat\theta_p - E\left(\frac{d^2 l(Y; \hat\beta_p, \hat\theta_p)}{d\theta^2}\right)^{-1} \frac{dl(Y; \hat\beta_p, \hat\theta_p)}{d\theta}$.
The iterative scheme usually starts from a preliminary guess, obtained via summary statistics.
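A sketch of one Fisher-scoring iteration for a single covariance parameter, the log range of an exponential model with the variance fixed for brevity; this reduced parameterization is an assumption, not the slide's exact three-parameter setup.

import numpy as np

def fisher_scoring_step(theta, y, X, D, s2=1.0):
    """One update theta_{p+1} = theta_p + G^{-1} dl/dtheta, for theta = log(phi)."""
    phi = np.exp(theta)
    C = s2 * np.exp(-phi * D)                        # C(theta)
    dC = -phi * D * C                                # dC/dtheta (chain rule, dphi/dtheta = phi)
    Q = np.linalg.inv(C)                             # Q = C^{-1}; fine for small n
    beta = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ y) # beta-hat at current theta
    Z = y - X @ beta
    QZ = Q @ Z
    score = -0.5 * np.trace(Q @ dC) + 0.5 * QZ @ dC @ QZ
    G = 0.5 * np.trace(Q @ dC @ Q @ dC)              # expected information
    return theta + score / G, beta

Iterating this step until the score is near zero gives the MLE; with several covariance parameters the same matrix formulas from slide 39 give a vector score and an information matrix.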
41 Method: Illustration maximization
Exponential covariance with nugget effect. $\theta = (\theta_1, \theta_2, \theta_3)$: log precision, logistic range, log nugget precision.
42 Method: Asymptotic properties
$\hat\theta \approx N(\theta, G^{-1})$, where $G = G(\hat\theta) = -E\left(\frac{d^2 l}{d\theta^2}\right)$.
43 Prediction: Prediction from joint Gaussian formulation
Prediction: $\hat Y_0 = E(Y_0 \mid Y) = X_0 \hat\beta + C_{0,\cdot} C^{-1} (Y - X\hat\beta)$.
$C_{0,\cdot}$ is the size $1 \times n$ vector of cross-covariances between prediction site $s_0$ and the data sites.
Prediction variance: $\mathrm{Var}(Y_0 \mid Y) = C_0 - C_{0,\cdot} C^{-1} C_{0,\cdot}^\top$.
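A plug-in kriging sketch of these formulas for a single prediction site; all inputs are assumed to be available from the fitted model (C is the n x n data covariance, c0 the cross-covariance vector, c00 the prior variance at s0, x0 the covariates at s0).

import numpy as np

def krige(x0, c0, c00, X, y, C, beta_hat):
    """Plug-in prediction and prediction variance at s0, as on slide 43."""
    Ci_r = np.linalg.solve(C, y - X @ beta_hat)   # C^{-1}(Y - X beta-hat)
    pred = x0 @ beta_hat + c0 @ Ci_r
    var = c00 - c0 @ np.linalg.solve(C, c0)       # C_0 - C_{0,.} C^{-1} C_{0,.}^T
    return pred, var

Note this is the plug-in variance with the estimated beta-hat and theta-hat treated as fixed; it does not account for parameter-estimation uncertainty.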
44 Numerical examples: Synthetic data
Consider the unit square. Create a grid of $25^2 = 625$ locations. Use 49 data locations, either randomly assigned or along the center line (two designs).
Covariance: $C(h) = \tau^2 I(h = 0) + \sigma^2 (1 + \phi h) \exp(-\phi h)$, $h = |s_i - s_j|$.
$\theta$ includes transformations of $\sigma$, $\tau$ and $\phi$.
45 Numerical examples: Likelihood optimization
True parameters: $\beta = (2, 3, 1)$, $\theta = (0.25, 9, \ldots)$.
Random design: $\hat\beta = [2(0.486), 3.43(0.552), 0.812(0.538)]$, $\hat\theta = [0.298(0.118), 7.89(1.98), \ldots]$.
Center design: $\hat\beta = [2.06(0.576), 3.4(0.733), 0.353(0.733)]$, $\hat\theta = [0.255(0.141), 7.19(1.97), \ldots]$.
(Standard errors in parentheses.)
46 Numerical examples: Predictions
47 Numerical examples: Project on spatial Gaussian regression model next week
48 Numerical examples: Mining example
Data giving oxide grade information at $n = 1600$ locations.
[Figure: map with Easting/Northing axes, a 500 m scale bar, and a prediction line from A to B.]
Use spatial statistics to predict oxide grade. Questions: Will mining be profitable? Should one gather more data before making the mining decision?
49 Numerical examples: Mining example
Model: $Y(s) = X(s)\beta + w(s) + \epsilon(s)$.
Data at $n$ borehole locations are used to fit the model parameters by MLE. Starting values come from the variogram (related to the sample covariance).
[Figure: in-strike, perpendicular and omnidirectional variograms against distance (m).]
50 Numerical examples: Mining example - current prediction
[Figure: predicted altitude (m) against line distance (m) along lines A and B.]
51 Numerical examples: Mining example - develop or not?
The decision is made using expected profits.
Value based on current data: $PV = \max(p_y, 0)$, $p_y = r_t \mu_y - k_t$.
Will more data influence the mining decision? Expected value with more data:
$PoV = \int_{y_n} \max(p_{y,y_n}, 0) \, \pi(y_n \mid y) \, dy_n$, $p_{y,y_n} = r_t \mu_{y,y_n} - k_t$.
Is more data valuable? Compute the value of information (VOI): $VOI = PoV - PV$. If the VOI exceeds the cost of the data, it is worthwhile gathering this information. (Computations are partly analytical, similar to EI.)
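Under the Gaussian assumptions on this slide, PoV reduces to the same kind of closed-form integral as EI. A one-variable sketch with hypothetical numbers (revenue factor, cost, prior grade mean/sd, measurement noise sd); the scalar-grade simplification is an assumption, not the slide's full spatial computation.

import numpy as np
from scipy.stats import norm

r_t, k_t = 5.0, 2.0                 # revenue factor and cost (hypothetical)
mu_y, sd_y, tau = 0.5, 0.3, 0.2     # prior grade mean/sd, data noise sd (hypothetical)

PV = max(r_t * mu_y - k_t, 0.0)     # value of deciding now

# Pre-posterior: before seeing y_n, the posterior mean is itself Gaussian
# with mean mu_y and sd s = sd_y^2 / sqrt(sd_y^2 + tau^2).
s = sd_y**2 / np.sqrt(sd_y**2 + tau**2)
m = r_t * mu_y - k_t                # mean of the prospective profit
v = r_t * s                         # sd of the prospective profit
PoV = m * norm.cdf(m / v) + v * norm.pdf(m / v)   # E(max(profit, 0)), EI-style identity
VOI = PoV - PV                      # gather data if VOI exceeds its cost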
52 Numerical examples: Mining example
Use this to compare two data types in planned boreholes:
XRF: scanning in a lab (expensive, accurate).
XMET: handheld device for grade scanning at the mining site (inexpensive, inaccurate).
53 Numerical examples: Mining example
[Figure: best decision (XMET, XRF, or nothing) as a function of the cost of XRF ($) and the cost of XMET ($), on a 10^5 scale.]
54 Numerical examples: Schedule
17 Jan: Gaussian processes
24 Jan: Hands-on project on Gaussian processes
31 Jan: Latent Gaussian models and INLA
7 Feb: Hands-on project on INLA
Feb: Template model builder (Guest lecturer Hans J Skaug)
55 Numerical examples: Schedule
The project (24 Jan) will include:
Expected improvement for function maximization.
Parameter estimation in a spatial regression model (forest example).