Gaussian Processes


1 Schedule

17 Jan: Gaussian processes (Jo Eidsvik)
24 Jan: Hands-on project on Gaussian processes (team effort, work in groups)
31 Jan: Latent Gaussian models and INLA (Jo Eidsvik)
7 Feb: Hands-on project on INLA (team effort, work in groups)
Feb: Template Model Builder (guest lecturer Hans J. Skaug)
To be decided...

2 Motivation

Large spatial (spatio-temporal) datasets. Gaussian processes are very commonly used in practice.

3 Motivation

Other contexts:
- Genetic data
- Functional data
- Response surface modeling and optimization
Gaussian processes are extremely common as a building block in many machine learning applications.

4 Preliminaries: Univariate Gaussian distribution

Y \sim N(\mu, \sigma^2) has density
p(y) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y - \mu)^2}{2\sigma^2} \right), \quad y \in \mathbb{R}.
Standardization: Z = (Y - \mu)/\sigma, equivalently Y = \mu + \sigma Z, where Z is standard normal with mean 0 and variance 1.

5 Preliminaries: Multivariate Gaussian distribution

For the size n x 1 vector Y = (Y_1, ..., Y_n)', Y \in \mathbb{R}^n,
p(y) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (y - \mu)' \Sigma^{-1} (y - \mu) \right).
The mean is E(Y) = \mu = (\mu_1, ..., \mu_n)'. The variance-covariance matrix \Sigma has entries
\Sigma_{i,i} = \sigma_i^2 = Var(Y_i), \quad \Sigma_{i,j} = Cov(Y_i, Y_j), \quad Corr(Y_i, Y_j) = \Sigma_{i,j} / (\sigma_i \sigma_j).
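The density is best evaluated through the Cholesky factor rather than an explicit inverse. A minimal numpy sketch (the bivariate example numbers are invented for illustration):

```python
import numpy as np

def mvn_logpdf(y, mu, Sigma):
    """Log-density of N(mu, Sigma), evaluated via the Cholesky factor."""
    n = len(y)
    L = np.linalg.cholesky(Sigma)                # Sigma = L L'
    z = np.linalg.solve(L, y - mu)               # z = L^{-1}(y - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))   # log|Sigma| = 2 sum_i log L_ii
    return -0.5 * (n * np.log(2 * np.pi) + log_det + z @ z)

mu = np.zeros(2)
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])       # correlation 0.9
print(mvn_logpdf(np.array([0.5, 0.3]), mu, Sigma))
```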

6 Preliminaries: Illustrations

Figure: bivariate (n = 2) Gaussian densities with correlation 0.9 (left) and independent components (right).

7 Preliminaries: Joint distribution for blocks

Y_A = (Y_{A,1}, ..., Y_{A,n_A}) and Y_B = (Y_{B,1}, ..., Y_{B,n_B}) are jointly Gaussian with mean and covariance
\mu = (\mu_A, \mu_B), \quad \Sigma = \begin{bmatrix} \Sigma_A & \Sigma_{A,B} \\ \Sigma_{B,A} & \Sigma_B \end{bmatrix}.

8 Preliminaries: Conditioning

E(Y_A | Y_B) = \mu_A + \Sigma_{A,B} \Sigma_B^{-1} (Y_B - \mu_B),
Var(Y_A | Y_B) = \Sigma_A - \Sigma_{A,B} \Sigma_B^{-1} \Sigma_{B,A}.
The conditional mean is linear in the conditioning variable (the data). The conditional variance does not depend on the data values.
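A minimal numpy sketch of these two formulas; the bivariate numbers mirror the correlation-0.9 illustration and are otherwise arbitrary:

```python
import numpy as np

def condition(mu_A, mu_B, S_A, S_AB, S_B, y_B):
    """Conditional mean and covariance of Y_A given Y_B = y_B."""
    W = S_AB @ np.linalg.inv(S_B)      # weights Sigma_AB Sigma_B^{-1}
    mean = mu_A + W @ (y_B - mu_B)
    cov = S_A - W @ S_AB.T             # Sigma_A - Sigma_AB Sigma_B^{-1} Sigma_BA
    return mean, cov

# Condition Y_1 on Y_2 = 1 with unit variances and correlation 0.9
mean, cov = condition(np.zeros(1), np.zeros(1), np.array([[1.0]]),
                      np.array([[0.9]]), np.array([[1.0]]), np.array([1.0]))
print(mean, cov)   # mean 0.9, variance 1 - 0.9^2 = 0.19
```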

9 Preliminaries: Illustration

Figure: conditional pdf p(y_1 | y_2) for Y_1 when Y_2 = -1 or Y_2 = 1.

10 Preliminaries: Transformation

Z = L^{-1}(Y - \mu), \quad Y = \mu + L Z, \quad \Sigma = L L'.
Z = (Z_1, ..., Z_n) are independent standard normal variables with mean 0 and variance 1.

11 Preliminaries: Cholesky factorization

\Sigma = \begin{bmatrix} \Sigma_{1,1} & \cdots & \Sigma_{1,n} \\ \vdots & \ddots & \vdots \\ \Sigma_{n,1} & \cdots & \Sigma_{n,n} \end{bmatrix} = L L',
with lower triangular factor
L = \begin{bmatrix} L_{1,1} & 0 & \cdots & 0 \\ L_{2,1} & L_{2,2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ L_{n,1} & L_{n,2} & \cdots & L_{n,n} \end{bmatrix}.

12 Preliminaries: Cholesky - example

For a 2 x 2 covariance matrix \Sigma = L L', consider sampling from the joint p(y_1, y_2) = p(y_1) p(y_2 | y_1):
A sample from p(y_1) is constructed by Y_1 = \mu_1 + L_{1,1} Z_1.
A sample from p(y_2 | y_1) is constructed by
Y_2 = \mu_2 + L_{2,1} Z_1 + L_{2,2} Z_2 = \mu_2 + L_{2,1} \frac{Y_1 - \mu_1}{L_{1,1}} + L_{2,2} Z_2.
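The same construction in numpy; the correlation value 0.9 below is an assumed illustration, not the matrix from the slide:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])

L = np.linalg.cholesky(Sigma)   # lower triangular, Sigma = L L'
Z = rng.standard_normal(2)      # independent standard normals
Y = mu + L @ Z                  # Y ~ N(mu, Sigma)
# Row by row this is exactly p(y1) p(y2 | y1):
#   Y1 = mu1 + L11 Z1
#   Y2 = mu2 + L21 Z1 + L22 Z2
print(L)
print(Y)
```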

13 Processes: Gaussian process

For any set of time locations t_1, ..., t_n, the vector (Y(t_1), ..., Y(t_n)) is jointly Gaussian, with mean \mu(t_i), i = 1, ..., n, and covariance matrix \Sigma with entries \Sigma_{i,j}.

14 Processes: Covariance function

The covariance tends to decay with distance: \Sigma_{i,j} = \gamma(|t_i - t_j|), for some covariance function \gamma. Examples:
Exponential: \gamma_{exp}(h) = \sigma^2 \exp(-\phi h)
Matern type: \gamma_{mat}(h) = \sigma^2 (1 + \phi h) \exp(-\phi h)
Gaussian: \gamma_{gauss}(h) = \sigma^2 \exp(-\phi h^2)
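A sketch of the three covariance functions and of assembling \Sigma on a time grid; the grid and the values sigma2 = 1, phi = 2 are arbitrary illustration choices:

```python
import numpy as np

def gamma_exp(h, sigma2=1.0, phi=1.0):
    """Exponential covariance."""
    return sigma2 * np.exp(-phi * h)

def gamma_mat(h, sigma2=1.0, phi=1.0):
    """Matern-type covariance."""
    return sigma2 * (1.0 + phi * h) * np.exp(-phi * h)

def gamma_gauss(h, sigma2=1.0, phi=1.0):
    """Gaussian covariance."""
    return sigma2 * np.exp(-phi * h**2)

t = np.linspace(0.0, 10.0, 50)
H = np.abs(t[:, None] - t[None, :])        # pairwise distances |t_i - t_j|
Sigma = gamma_mat(H, sigma2=1.0, phi=2.0)  # covariance matrix of (Y(t_1), ..., Y(t_n))
```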

15 Processes: Illustration of covariance function

Figure: correlation against distance |t - s| for the exponential, Matern-type and Gaussian correlation functions.

16 Processes: Illustration of samples of Gaussian processes

Figure: sample paths x(t) against time t for exponential and Matern-type correlation.

17 Processes: Markov property

The exponential correlation function gives a Markov process: for any t > s > u,
p(y(t) | y(s), y(u)) = p(y(t) | y(s)).
(Proof: write out the trivariate distribution and apply the conditioning formulas
E(Y_A | Y_B) = \mu_A + \Sigma_{A,B} \Sigma_B^{-1} (Y_B - \mu_B),
Var(Y_A | Y_B) = \Sigma_A - \Sigma_{A,B} \Sigma_B^{-1} \Sigma_{B,A};
the regression weight on y(u) vanishes.)
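The vanishing weight on y(u) can be checked numerically; a small sketch with arbitrary u < s < t and range parameter phi:

```python
import numpy as np

phi = 1.5
t, s, u = 2.5, 1.0, 0.0
pts = np.array([t, s, u])
Sigma = np.exp(-phi * np.abs(pts[:, None] - pts[None, :]))  # exponential correlation

# Regression weights of y(t) on (y(s), y(u)): Sigma_AB Sigma_B^{-1}
w = np.linalg.solve(Sigma[1:, 1:], Sigma[0, 1:])
print(w)   # the weight on y(u) is numerically zero, hence the Markov property
```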

18 Processes: Conditional formula

E(Y_A | Y_B) = \mu_A + \Sigma_{A,B} \Sigma_B^{-1} (Y_B - \mu_B),
Var(Y_A | Y_B) = \Sigma_A - \Sigma_{A,B} \Sigma_B^{-1} \Sigma_{B,A}.
- The expectation is linear in the data.
- The variance depends only on the data locations, not on the data values.
- The expectation is close to the conditioning values near data locations, and goes to \mu_A far from data.
- The variance is small near data locations, and goes to \Sigma_A far from data.
- Close data locations do not count as double data: correlated observations carry overlapping information.

19 Processes: Illustration of conditioning in Gaussian processes

Figure: sample paths x(t) against t, before and after conditioning on data (two panels).

20 Processes: Application of Gaussian processes - function optimization

Several applications involve very time-demanding or costly experiments. Which configuration of inputs gives the highest output? The goal is to find the optimal input without too many trials / tests. Approach: fit a GP to the function, based on the evaluation points and results so far. This allows fast assessment of which evaluation points to choose next!

21 Processes: Production quality - current prediction

Goal: find the best temperature input to give optimal production. Which temperature should be evaluated next? The experiment is costly, so we want to do few evaluations.

22 Processes: Expected improvement

Maximum so far: Y^* = \max(Y). The expected improvement at site s is
EI_n(s) = E(\max(Y(s) - Y^*, 0) | Y).
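Since Y(s) given the data is Gaussian, EI_n(s) has a well-known closed form; a sketch, assuming a predictive mean and standard deviation at s (the example numbers are invented):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu_s, sd_s, y_star):
    """Closed-form EI for a Gaussian predictive distribution N(mu_s, sd_s^2):
    E[max(Y(s) - y*, 0)] = sd * (z Phi(z) + phi(z)) with z = (mu_s - y*)/sd."""
    z = (mu_s - y_star) / sd_s
    return sd_s * (z * norm.cdf(z) + norm.pdf(z))

# Predictive mean 48, predictive sd 5, best observed value 50
print(expected_improvement(48.0, 5.0, 50.0))
```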

23 Processes: Sequential uncertainty reduction

Perform a test at t = 40; the result is Y(40) = 50.

24 Processes: Sequential uncertainty reduction and optimization

Maximum so far: Y^* = \max(Y, Y_{n+1}). The updated expected improvement is
EI_{n+1}(s) = E(\max(Y(s) - Y^*, 0) | Y, Y_{n+1}).

25 Processes: Project on function optimization using EI next week

Analytical solutions exist for parts of the computational challenge.

26 Gaussian Markov random fields: Precision matrix Q

\Sigma^{-1} = Q = \begin{bmatrix} Q_A & Q_{A,B} \\ Q_{B,A} & Q_B \end{bmatrix}.
Q holds the conditional variance structure.

27 Gaussian Markov random fields: Interpretation of precision

Q_A^{-1} = Var(Y_A | Y_B),
E(Y_A | Y_B) = \mu_A - Q_A^{-1} Q_{A,B} (Y_B - \mu_B).
(Proof: from Q \Sigma = I, or by writing out the quadratic form in p(y_A | y_B) \propto p(y_A, y_B).)

28 Gaussian Markov random fields: Sparse precision matrix Q

For graphical models the precision matrix is sparse: Q_{i,j} = 0 if nodes i and j are not neighbors, i.e. Y_i and Y_j are conditionally independent given the rest. For the exponential covariance function on a regular grid in time, Q_{i,i±2} = 0.
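This can be checked directly by inverting the exponential covariance matrix on a small regular time grid; grid size and phi below are arbitrary:

```python
import numpy as np

phi, n = 1.0, 8
t = np.arange(n, dtype=float)                 # regular grid in time
Sigma = np.exp(-phi * np.abs(t[:, None] - t[None, :]))
Q = np.linalg.inv(Sigma)

np.set_printoptions(precision=2, suppress=True)
print(Q)   # tridiagonal up to rounding: Q[i, i±2] = 0, the Markov structure
```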

29 Gaussian Markov random fields: Sparse precision matrix Q

This sparseness means that several techniques from numerical analysis can be used, for instance solving Q b = a quickly for b.

30 Gaussian Markov random fields: Cholesky factorization of Q

A common method for sampling and evaluation is to factor
Q = L_Q L_Q',
with L_Q lower triangular. The Cholesky factor is often sparse, but not as sparse as Q, because it holds only partial conditional structure, according to an ordering. Sparsity is maintained for the exponential covariance function in the time dimension (Markov).

31 Gaussian Markov random fields: Sampling and evaluation using L_Q

With Q = L_Q L_Q', a sample is obtained by solving
L_Q' Y = Z
for Y. (Previously, for the covariance we had Y = L Z.) The log-determinant is
\log |Q| = 2 \log |L_Q| = 2 \sum_i \log L_{Q,ii}.
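A dense-matrix sketch of this recipe; a real GMRF implementation would use a sparse Cholesky (e.g. CHOLMOD), but plain scipy shows the mechanics:

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(1)

# Tridiagonal Q from the exponential covariance in time, as in the previous sketch
phi, n = 1.0, 200
t = np.arange(n, dtype=float)
Q = np.linalg.inv(np.exp(-phi * np.abs(t[:, None] - t[None, :])))

L_Q = cholesky(Q, lower=True)                   # Q = L_Q L_Q'
Z = rng.standard_normal(n)
Y = solve_triangular(L_Q.T, Z, lower=False)     # solve L_Q' Y = Z, so Y ~ N(0, Q^{-1})
log_det_Q = 2.0 * np.sum(np.log(np.diag(L_Q)))  # log|Q| = 2 sum_i log L_Q,ii
```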

32 Gaussian Markov random fields: GMRF for spatial applications

A Markovian model can be constructed for a spatial Gaussian process (Lindgren et al., 2011). The spatial process is viewed as the solution of a stochastic partial differential equation, and this solution is embedded in a triangulated graph over the spatial domain. More later (31 Jan).

33 Regression Model: Regression for spatial datasets

Applications with trends and residual spatial variation.

34 Regression Model: Spatial Gaussian regression model

Model: Y(s) = X(s)\beta + w(s) + \epsilon(s).
1. Y(s): response variable at location / location-time position s.
2. \beta: regression effects; X(s): covariates at s.
3. w(s): structured (space-time correlated) Gaussian process.
4. \epsilon(s): unstructured (independent) Gaussian measurement noise.

35 Regression Model: Gaussian model

Model: Y(s) = X(s)\beta + w(s) + \epsilon(s). Data at n locations: Y = (Y(s_1), ..., Y(s_n)).
The main goals are parameter estimation and prediction.

36 Regression Model: Gaussian model

Log-likelihood for parameter estimation:
l(Y; \beta, \theta) = -\frac{1}{2} \log |C| - \frac{1}{2} (Y - X\beta)' C^{-1} (Y - X\beta),
C = C(\theta) = \Sigma + \tau^2 I_n,
with Var(w) = \Sigma and Var(\epsilon(s_i)) = \tau^2 for all i. \theta includes the parameters of the covariance model.
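A sketch of this log-likelihood with the Matern-type covariance plus a nugget; parameterizing theta = (sigma2, phi, tau2) on the natural scale is an assumption for readability, whereas the lecture optimizes transformed parameters:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def neg_log_lik(beta, theta, Y, X, dist):
    """Negative log-likelihood of Y = X beta + w + eps with
    C = sigma2 (1 + phi h) exp(-phi h) + tau2 I; dist holds pairwise distances h."""
    sigma2, phi, tau2 = theta
    C = sigma2 * (1.0 + phi * dist) * np.exp(-phi * dist) + tau2 * np.eye(len(Y))
    cf = cho_factor(C, lower=True)
    r = Y - X @ beta
    log_det = 2.0 * np.sum(np.log(np.diag(cf[0])))     # log|C| from the Cholesky factor
    return 0.5 * (log_det + r @ cho_solve(cf, r))
```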

37 Method: Maximum likelihood

MLE: (\hat{\theta}, \hat{\beta}) = \arg\max_{\theta, \beta} \, l(Y; \beta, \theta).

38 Method: Analytical derivatives

Formulas for matrix derivatives, with Q(\theta) = C^{-1} and Z = Y - X\hat{\beta}:
\hat{\beta} = [X' Q X]^{-1} X' Q Y,
\frac{d \log |C|}{d\theta_r} = \mathrm{trace}\left( Q \frac{dC}{d\theta_r} \right),
\frac{d (Z' Q Z)}{d\theta_r} = -Z' Q \frac{dC}{d\theta_r} Q Z.

39 Method: Score and Hessian for \theta

\frac{dl}{d\theta_r} = -\frac{1}{2} \mathrm{trace}\left( Q \frac{dC}{d\theta_r} \right) + \frac{1}{2} Z' Q \frac{dC}{d\theta_r} Q Z,
E\left( \frac{d^2 l}{d\theta_r d\theta_s} \right) = -\frac{1}{2} \mathrm{trace}\left( Q \frac{dC}{d\theta_s} Q \frac{dC}{d\theta_r} \right).

40 Method: Updates for each iteration

With Q = Q(\theta_p):
\hat{\beta}_p = [X' Q X]^{-1} X' Q Y,
\hat{\theta}_{p+1} = \hat{\theta}_p - E\left( \frac{d^2 l(Y; \hat{\beta}_p, \hat{\theta}_p)}{d\theta^2} \right)^{-1} \frac{dl(Y; \hat{\beta}_p, \hat{\theta}_p)}{d\theta}.
The iterative scheme usually starts from a preliminary guess obtained via summary statistics.
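A generic Fisher-scoring sketch following these formulas; cov_fun and d_cov are assumed user-supplied callables returning C(theta) and the list of derivative matrices dC/dtheta_r:

```python
import numpy as np

def fisher_scoring(theta0, Y, X, cov_fun, d_cov, n_iter=20):
    """Alternate profiled GLS for beta with a Fisher-scoring update for theta."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_iter):
        Q = np.linalg.inv(cov_fun(theta))                 # Q = C(theta)^{-1}
        beta = np.linalg.solve(X.T @ Q @ X, X.T @ Q @ Y)  # [X'QX]^{-1} X'QY
        Zr = Y - X @ beta
        dC = d_cov(theta)
        score = np.array([-0.5 * np.trace(Q @ D) + 0.5 * Zr @ Q @ D @ Q @ Zr
                          for D in dC])
        G = 0.5 * np.array([[np.trace(Q @ Ds @ Q @ Dr) for Dr in dC]
                            for Ds in dC])                # expected information
        theta = theta + np.linalg.solve(G, score)         # scoring step
    return theta, beta
```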

41 Method: Illustration of maximization

Exponential covariance with nugget effect. \theta = (\theta_1, \theta_2, \theta_3): log precision, logistic range, log nugget precision.

42 Method: Asymptotic properties

\hat{\theta} \approx N(\theta, G^{-1}), \quad G = G(\hat{\theta}) = -E\left( \frac{d^2 l}{d\theta^2} \right).

43 Prediction: Prediction from the joint Gaussian formulation

Prediction:
\hat{Y}_0 = E(Y_0 | Y) = X_0 \hat{\beta} + C_{0,\cdot} C^{-1} (Y - X\hat{\beta}),
where C_{0,\cdot} is the size 1 x n vector of cross-covariances between prediction site s_0 and the data sites.
Prediction variance:
Var(Y_0 | Y) = C_0 - C_{0,\cdot} C^{-1} C_{0,\cdot}'.
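A sketch of the predictor and its variance for one-dimensional sites; cov is an assumed callable for the structured covariance, and the nugget tau2 enters C as on slide 36:

```python
import numpy as np

def krige(s0, s, Y, X, X0, beta_hat, cov, tau2):
    """Prediction Y0_hat = X0 beta_hat + C_{0,.} C^{-1} (Y - X beta_hat)
    and variance C_0 - C_{0,.} C^{-1} C_{0,.}' at site s0."""
    H = np.abs(s[:, None] - s[None, :])      # pairwise distances between data sites
    C = cov(H) + tau2 * np.eye(len(Y))       # C = Sigma + tau^2 I
    c0 = cov(np.abs(s - s0))                 # cross-covariances to s0
    w = np.linalg.solve(C, c0)               # C^{-1} c0
    pred = X0 @ beta_hat + w @ (Y - X @ beta_hat)
    var = cov(0.0) - c0 @ w                  # C_0 taken as cov(0), as on the slide
    return pred, var
```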

44 Numerical examples: Synthetic data

Consider the unit square. Create a grid of 25^2 = 625 locations. Use 49 data locations, either randomly assigned or along the center line (two designs). Covariance:
C(h) = \tau^2 I(h = 0) + \sigma^2 (1 + \phi h) \exp(-\phi h), \quad h = |s_i - s_j|.
\theta includes transformations of \sigma, \tau and \phi.

45 Numerical examples: Likelihood optimization

True parameters: \beta = (-2, 3, 1), \theta = (0.25, 9, ...). Estimates with standard errors in parentheses:
Random design: \hat{\beta} = [-2 (0.486), 3.43 (0.552), 0.812 (0.538)], \hat{\theta} = [0.298 (0.118), 7.89 (1.98), ...].
Center design: \hat{\beta} = [-2.06 (0.576), 3.4 (0.733), 0.353 (0.733)], \hat{\theta} = [0.255 (0.141), 7.19 (1.97), ...].

46 Numerical examples: Predictions

47 Numerical examples: Project on the spatial Gaussian regression model next week

48 Numerical examples: Mining example

Data giving oxide grade information at n = 1600 locations.
Figure: map of borehole locations (Easting against Northing) with prediction line from A to B; scale bar 500 m.
Use spatial statistics to predict oxide grade. Questions: Will mining be profitable? Should one gather more data before making the mining decision?

49 Numerical examples: Mining example

Model: Y(s) = X(s)\beta + w(s) + \epsilon(s). Data at the n borehole locations are used to fit the model parameters by MLE. Starting values come from the variogram (related to the sample covariance).
Figure: empirical variograms against distance (m): in-strike, perpendicular and omnidirectional.

50 Numerical examples: Mining example

Figure: current prediction along the A-B line, altitude (m) against line distance (m).

51 Numerical examples: Mining example

Develop or not? The decision is made using expected profits. Value based on current data:
PV = \max(p_y, 0), \quad p_y = r_t \mu_y - k_t.
Will more data influence the mining decision? Expected value with more data:
PoV = \int \max(p_{y,y^n}, 0) \, \pi(y^n | y) \, dy^n, \quad p_{y,y^n} = r_t \mu_{y,y^n} - k_t.
Is more data valuable? Compute the value of information (VOI):
VOI = PoV - PV.
If the VOI exceeds the cost of the data, it is worthwhile gathering this information. (Computations are partly analytical, similar to EI.)
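The PoV integral can be approximated by simulating the updated mean; a deliberately simplified scalar sketch with invented revenue, cost and preposterior numbers (in the lecture the computation is partly analytical, like EI):

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed setup: profit p = r * mu - k; current posterior mean mu_y; before new
# data arrive, the updated mean mu_{y,yn} is distributed N(mu_y, s^2).
r, k = 1.0, 50.0
mu_y, s = 48.0, 4.0

PV = max(r * mu_y - k, 0.0)                     # value with current data
mu_new = rng.normal(mu_y, s, size=200_000)      # simulated updated means
PoV = np.mean(np.maximum(r * mu_new - k, 0.0))  # value with more data
print("VOI =", PoV - PV)                        # worthwhile if it exceeds the data cost
```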

52 Numerical examples: Mining example

Use this to compare two data types in planned boreholes:
XRF: scanning in a lab (expensive, accurate).
XMET: handheld device for grade scanning at the mining site (inexpensive, inaccurate).

53 Numerical examples: Mining example

Figure: best data-gathering decision (XMET, XRF, or nothing) as a function of the cost of XMET ($) and the cost of XRF ($).

54 Numerical examples: Schedule

17 Jan: Gaussian processes
24 Jan: Hands-on project on Gaussian processes
31 Jan: Latent Gaussian models and INLA
7 Feb: Hands-on project on INLA
Feb: Template Model Builder (guest lecturer Hans J. Skaug)

55 Numerical examples: Schedule

The project (24 Jan) will include:
- Expected improvement for function maximization.
- Parameter estimation in the spatial regression model (forest example).
