Some Areas of Recent Research


1 University of Chicago Department Retreat, October 2012

2 Funders & Collaborators
Funders: NSF (STATMOS), US Department of Energy
Faculty: Mihai Anitescu, Liz Moyer
Postdocs: Jie Chen, Bill Leeds, Ying Sun
Grad students: Stefano Castruccio, Michael Horrell, Andy Poppick
Undergrads: Peter Hansen, Grant Wilder

3 Preconditioning and fitting Gaussian process models
A Gaussian process $Z$ is determined by its mean and covariance functions:
$$E\,Z(x) = \mu(x), \qquad \operatorname{cov}\{Z(x), Z(y)\} = K(x, y).$$
Assume the mean is 0 and the covariance structure is known up to a parameter $\theta$. Let $K(\theta)$ be the covariance matrix of the observations $Z = (Z(x_1), \ldots, Z(x_n))'$ given $\theta$. Then the loglik is (ignoring an additive constant)
$$\ell(\theta) = -\tfrac{1}{2} \log|K(\theta)| - \tfrac{1}{2} Z' K(\theta)^{-1} Z.$$
Problem: how to compute $\ell(\theta)$? Particularly $\log|K(\theta)|$?
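As a point of reference for the problem posed above, here is a minimal sketch of the exact calculation, $-\tfrac{1}{2}\log|K| - \tfrac{1}{2}Z'K^{-1}Z$, done through a Cholesky factor. The exponential covariance, the helper names `exp_cov` and `gauss_loglik`, and the simulated data are assumptions made for illustration; they are not taken from the talk.

```python
import numpy as np

def exp_cov(sites, theta):
    """Exponential covariance matrix exp(-|x - y| / theta) (illustrative choice)."""
    dist = np.abs(sites[:, None] - sites[None, :])
    return np.exp(-dist / theta)

def gauss_loglik(z, K):
    """Zero-mean Gaussian loglik, up to an additive constant, via K = L L'."""
    L = np.linalg.cholesky(K)
    w = np.linalg.solve(L, z)                   # w = L^{-1} z, so w'w = z' K^{-1} z
    logdet = 2.0 * np.sum(np.log(np.diag(L)))   # log|K| from the diagonal of L
    return -0.5 * logdet - 0.5 * (w @ w)

rng = np.random.default_rng(0)
sites = np.sort(rng.uniform(0.0, 10.0, size=200))
K = exp_cov(sites, theta=2.0)
z = np.linalg.cholesky(K) @ rng.standard_normal(sites.size)  # simulated data
loglik_value = gauss_loglik(z, K)
```

The factorization costs $O(n^3)$ flops and needs the full $n \times n$ matrix in memory, which is the bottleneck the rest of the talk works around.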

4 Important aside
Even when the loglik can be computed exactly, maximizing it (or sampling from a posterior) may not be easy. Consider 400 evenly spaced observations on $\mathbb{R}$, where $Z$ is fractional Brownian motion whose variogram $\tfrac{1}{2} E\{Z(x) - Z(y)\}^2$ is proportional to $|x - y|^\alpha$, with a scale parameter $\ell$ and a Gamma-function factor depending on $\alpha$, and with $\ell = 10$ and $\alpha = 1.5$.
Neither parameter is estimated well, although there is strong evidence that the parameters lie along a curve in $(\ell, \alpha)$ space. The problem is worse if the Gamma-function factor is left out. I am unaware of any transformation independent of the observation locations that would give a concave loglikelihood.
This kind of function makes some people in the optimization community unhappy. Things only get worse with more complex models.

5 [Figure: log likelihood as a function of $\ell$ and $\alpha$]

6 Computing exact MLE
Exact computation of the likelihood function for $n$ irregularly sited observations generally requires $O(n^3)$ computation and $O(n^2)$ memory to compute the Cholesky decomposition of the covariance matrix.
Computation is becoming cheap much faster than memory, so there is increasing emphasis on matrix-free methods, which never have to store an $n \times n$ matrix, even if this requires more computation.

7 Iterative solution of linear equations
Computing the quadratic form in the likelihood is best done by solving systems like $Kx = y$, not by finding $K^{-1}$.
Iterative methods: for $K$ positive definite, solving $Kx = y$ is equivalent to minimizing $\tfrac{1}{2} x'Kx - x'y$, which can be done by, for example, conjugate gradient.
The main computation requires multiplying vectors by $K$. This is fast for sparse $K$ and for some structured (e.g., Toeplitz) matrices.
But even for dense unstructured matrices, iterative solution is matrix-free and may require many fewer flops than Cholesky decomposition: $O(n^2 \times \#\text{iterations})$ vs. $O(n^3)$.
The number of iterations for an accurate solution is related to the condition number $\kappa(K)$ of $K$ (the ratio of its largest to smallest eigenvalue).
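A minimal sketch of the matrix-free idea: plain conjugate gradient that touches $K$ only through a function computing $Kv$, with rows of $K$ generated in chunks and discarded. The exponential covariance with an added nugget, the chunk size, and the names `make_matvec` and `conjugate_gradient` are illustrative assumptions, not the talk's setup.

```python
import numpy as np

def make_matvec(sites, theta, nugget=0.1, chunk=256):
    """Return v -> K v for K = exp(-|x - y| / theta) + nugget * I.

    Only `chunk` rows of K exist in memory at any time (hypothetical model)."""
    def matvec(v):
        out = np.empty_like(v)
        for s in range(0, sites.size, chunk):
            rows = np.exp(-np.abs(sites[s:s + chunk, None] - sites[None, :]) / theta)
            out[s:s + chunk] = rows @ v + nugget * v[s:s + chunk]
        return out
    return matvec

def conjugate_gradient(matvec, y, tol=1e-8, max_iter=2000):
    """Solve K x = y for symmetric positive definite K given only v -> K v."""
    x = np.zeros_like(y)
    r = y.copy()                 # residual y - K x with x = 0
    p = r.copy()                 # first search direction
    rs = r @ r
    y_norm = np.linalg.norm(y)
    for it in range(1, max_iter + 1):
        Kp = matvec(p)
        alpha = rs / (p @ Kp)    # exact line search along p
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * y_norm:
            return x, it
        p = r + (rs_new / rs) * p   # next K-conjugate direction
        rs = rs_new
    return x, max_iter

rng = np.random.default_rng(0)
sites = np.sort(rng.uniform(0.0, 10.0, size=500))
y = rng.standard_normal(sites.size)
matvec = make_matvec(sites, theta=2.0)
solution, n_iter = conjugate_gradient(matvec, y)
```

Each iteration costs one multiplication by $K$, so the total work is $O(n^2 \times \#\text{iterations})$ with $O(n)$ storage beyond the chunk buffer.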

8 When nearby observations are strongly correlated, $\kappa(K)$ can be very large, so we need to precondition: find a matrix $P$ such that $P'K(\theta)P$ is well-conditioned for $\theta$ in the vicinity of the MLE and multiplying a vector by $P$ is fast.
Let $Y = P'Z$. Then the loglik (with $Z$ as data, but written in terms of $Y$) equals
$$\ell(\theta) = -\tfrac{1}{2} \log|P'K(\theta)P| + \log|P| - \tfrac{1}{2} Y'\{P'K(\theta)P\}^{-1} Y.$$
It is okay for $P$ to depend on $\theta$ as long as this formula is used. Can ignore $\log|P|$ if it doesn't depend on $\theta$ (even if $P$ does).
What to do about $\log|P'K(\theta)P|$?
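The identity can be checked numerically on a small dense example. In this sketch the preconditioner is the inverse transposed Cholesky factor of the covariance at a nearby parameter value, a deliberately expensive choice made only so the check is easy to follow; it is an assumption for illustration, not the preconditioner used in the work described here.

```python
import numpy as np

def loglik(z, K):
    """-0.5 * log|K| - 0.5 * z' K^{-1} z for a dense covariance matrix K."""
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * logdet - 0.5 * (z @ np.linalg.solve(K, z))

rng = np.random.default_rng(1)
sites = np.sort(rng.uniform(0.0, 10.0, size=150))
dist = np.abs(sites[:, None] - sites[None, :])

K = np.exp(-dist / 2.0)        # K(theta) at theta = 2.0 (illustrative model)
K0 = np.exp(-dist / 1.8)       # covariance at a nearby value theta0 = 1.8
L0 = np.linalg.cholesky(K0)
P = np.linalg.inv(L0).T        # chosen so that P' K0 P = I

z = np.linalg.cholesky(K) @ rng.standard_normal(sites.size)
y = P.T @ z                    # Y = P'Z
M = P.T @ K @ P                # P'K(theta)P

_, logdet_P = np.linalg.slogdet(P)
direct = loglik(z, K)                  # loglik computed from Z and K
via_P = loglik(y, M) + logdet_P        # same loglik written in terms of Y

cond_K = np.linalg.cond(K)
cond_M = np.linalg.cond(M)
```

Here `direct` and `via_P` agree up to rounding, and `cond_M` is far smaller than `cond_K`, which is the property a practical (fast-to-apply) preconditioner would aim for.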

9 Solve score equations instead? (Ignore preconditioning for now.)
Writing $K_i(\theta)$ for $\partial K(\theta)/\partial\theta_i$, the score equations are (assume the mean is 0)
$$Z' K(\theta)^{-1} K_i(\theta) K(\theta)^{-1} Z = \operatorname{tr}\{K(\theta)^{-1} K_i(\theta)\}$$
for $i = 1, \ldots, p$. The first term requires only one solve.
Instead of the log determinant, we need, for each component of $\theta$, $\operatorname{tr}\{K(\theta)^{-1} K_i(\theta)\}$, which requires $n$ solves for exact calculation. Approximate it by the unbiased estimate (Hutchinson, 1990)
$$\frac{1}{N} \sum_{j=1}^{N} U_j' K(\theta)^{-1} K_i(\theta) U_j,$$
where $U_j = (U_{j1}, \ldots, U_{jn})'$ is a random vector with the $U_{jk}$'s iid and $\Pr(U_{jk} = 1) = \Pr(U_{jk} = -1) = \tfrac{1}{2}$. This yields unbiased estimating equations.
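A minimal sketch of the Hutchinson estimate of $\operatorname{tr}\{K^{-1}K_i\}$ with random $\pm 1$ probe vectors. The function name `hutchinson_trace`, the exponential-plus-nugget covariance, and the dense solve standing in for an iterative solver are all assumptions made for illustration.

```python
import numpy as np

def hutchinson_trace(solve_K, K_i, n_probes, rng):
    """Estimate tr(K^{-1} K_i) by averaging U_j' K^{-1} K_i U_j over probes.

    solve_K(B) must return K^{-1} B; entries of each U_j are iid +1 or -1
    with probability 1/2 each. Returns (estimate, estimated standard error)."""
    n = K_i.shape[0]
    U = rng.choice(np.array([-1.0, 1.0]), size=(n, n_probes))
    W = solve_K(K_i @ U)                  # column j holds K^{-1} K_i U_j
    samples = np.sum(U * W, axis=0)       # U_j' K^{-1} K_i U_j for each j
    return samples.mean(), samples.std(ddof=1) / np.sqrt(n_probes)

rng = np.random.default_rng(0)
sites = np.sort(rng.uniform(0.0, 10.0, size=300))
dist = np.abs(sites[:, None] - sites[None, :])
theta = 2.0
K = np.exp(-dist / theta) + 0.1 * np.eye(sites.size)   # illustrative model
K_theta = np.exp(-dist / theta) * dist / theta**2      # derivative of K in theta

# A dense solve is used here only for brevity; in the matrix-free setting
# each column would be handled by a (preconditioned) iterative solver.
estimate, std_error = hutchinson_trace(
    lambda B: np.linalg.solve(K, B), K_theta, n_probes=200, rng=rng
)
exact = np.trace(np.linalg.solve(K, K_theta))          # needs n solves
```

The estimate uses 200 solves instead of the 300 needed for the exact trace in this toy case; the saving matters when $N$ can stay moderate while $n$ grows.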

10 Can bound the statistical inefficiency of the procedure in terms of $\kappa(K)$. Thus, if we can find a decent preconditioner for $K$, moderate $N$ works well. Don't need $N$ comparable to $n$!
Preconditioning helps in two ways: it reduces the number of iterations needed in the iterative solver, and it reduces the need for large $N$.
Scope for further improvement by choosing the $U_j$'s not independent. Design of experiments!
Stein, Chen and Anitescu (under revision).

11 Some other interests
When low rank approximations to covariance matrices don't work. Won't discuss this here, but the work is likely to annoy some who have been advocating this approach for massive spatial datasets.
Modeling and computation for massive (as opposed to large) space-time datasets, without assuming covariance (or inverse covariance) matrices are low rank or sparse.
Climate model emulation.

12 One-pass methods
Look at the data block by block and summarize the information about $K(\theta)$ from each block so that we don't have to go back to the raw data again (Anitescu, Horrell).
Simple example: divide the data into $B$ blocks. Within each block, approximate the loglik (or score) function. Are the MLE of $\theta$ and the observed information matrix an adequate approximation? If not, store a more complete representation of the loglik function.
Does adding the loglik across blocks reduce storage with little loss of information? Save a few observations (or other summaries) from each block, and add the within-block approximate logliks to the loglik of the sparse observations.
For truly massive (petascale, exascale) data, will need more than two layers.
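A minimal sketch of the simplest version of this idea: split the data into $B$ contiguous blocks, compute the loglik within each block, and add the results, which treats the blocks as independent. That independence, the exponential covariance, and the names `exp_cov` and `block_loglik` are assumptions made for illustration; the sketch omits the sparse cross-block observations mentioned above.

```python
import numpy as np

def exp_cov(sites, theta):
    """Exponential covariance matrix (illustrative choice)."""
    dist = np.abs(sites[:, None] - sites[None, :])
    return np.exp(-dist / theta)

def block_loglik(z, sites, theta, n_blocks, cov):
    """Sum of within-block logliks; each block is visited exactly once."""
    total = 0.0
    for idx in np.array_split(np.arange(sites.size), n_blocks):
        K_b = cov(sites[idx], theta)             # only one block in memory
        _, logdet = np.linalg.slogdet(K_b)
        quad = z[idx] @ np.linalg.solve(K_b, z[idx])
        total += -0.5 * logdet - 0.5 * quad
    return total

rng = np.random.default_rng(2)
sites = np.sort(rng.uniform(0.0, 100.0, size=400))
theta = 2.0
z = np.linalg.cholesky(exp_cov(sites, theta)) @ rng.standard_normal(sites.size)

full = block_loglik(z, sites, theta, n_blocks=1, cov=exp_cov)     # exact loglik
approx = block_loglik(z, sites, theta, n_blocks=10, cov=exp_cov)  # 10 blocks
```

With $B$ equal blocks the cost drops from $O(n^3)$ to roughly $O(n^3/B^2)$, at the price of ignoring dependence between blocks; how much information that loses is the question the slide raises.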


20 Climate model emulation
Reproducing some of the output of a GCM under some forcing scenario without actually running it (Castruccio, Leeds, Moyer, Wilder). Or, better yet, producing accurate simulations of actual climate under some forcing scenario.
GCM runs we have:
NCAR Community Climate System Model version 3 (CCSM3), T31 resolution (approx. 3.75° × 3.75° grid cells)
Input is CO2; output is temperature $T(t)$ and precipitation $P(t)$, where $t$ is year
18 forcing scenarios, 53 realizations, > 10,000 model years

21 Statistical emulation of mean
Separate time series model for each of 47 regions:
$$T(t) = \beta_0 + \frac{\beta_1}{2}\{\log[\mathrm{CO}_2](t) + \log[\mathrm{CO}_2](t-1)\} + \beta_2 \sum_{i \ge 2} w_{i-2} \log[\mathrm{CO}_2](t-i) + \varepsilon(t),$$
where $\varepsilon(t)$ follows an autoregressive model of order 1 and $w_i = (1-\rho)\rho^i$.
Fit with a small number of scenarios and a few realizations per scenario.
Compare to the standard computer model emulation approach, which views $(\mathrm{CO}_2(1), \ldots, \mathrm{CO}_2(n))$ as input and $(T(1), \ldots, T(n))$ as output.
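A minimal sketch of how such a mean function could be evaluated, assuming it consists of an intercept, a term in the average of the current and previous year's log CO2, and a geometrically weighted sum of older log CO2 values with weight $(1-\rho)\rho^{i-2}$ at lag $i \ge 2$. The coefficient names `beta0`, `beta1`, `beta2`, `rho`, the truncation of the sum at the start of the record, and the numerical values are all hypothetical.

```python
import numpy as np

def emulator_mean(log_co2, beta0, beta1, beta2, rho):
    """Mean temperature for each year t >= 2 of the record (NaN before that).

    The lag sum is truncated at the first available year, an assumption
    made here because the record is finite."""
    n = log_co2.size
    mean = np.full(n, np.nan)
    for t in range(2, n):
        recent = 0.5 * (log_co2[t] + log_co2[t - 1])
        lags = np.arange(2, t + 1)                    # i = 2, ..., t
        weights = (1.0 - rho) * rho ** (lags - 2)     # geometric decay in lag
        distant = np.sum(weights * log_co2[t - lags])
        mean[t] = beta0 + beta1 * recent + beta2 * distant
    return mean

# Constant-CO2 check: the weights sum to nearly 1 for long records, so the
# mean should settle at beta0 + (beta1 + beta2) * log(CO2).
log_co2 = np.full(200, np.log(400.0))
mean_T = emulator_mean(log_co2, beta0=10.0, beta1=1.5, beta2=1.0, rho=0.9)
limit = 10.0 + (1.5 + 1.0) * np.log(400.0)
```

The parameterization keeps the number of fitted quantities per region small, in contrast to an emulator that treats the whole CO2 trajectory as a high-dimensional input.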

22 Total column ozone
OMI (Ozone Monitoring Instrument, successor to TOMS) is aboard the satellite EOS Aura:
Polar-orbiting.
Sun-synchronous, so satellite always at local noon.
Each orbit takes about 100 minutes, or 14.1 orbits a day.
From raw data (photon counts in multiple frequency bands), levels of many trace constituents of the atmosphere are deduced, including ozone.
Over 80,000 observations per orbit, so over $10^6$ a day.
Near global coverage (no data during polar nights, some missing data).
How might statistical models be used to produce a better Level-3 (gridded) product than what NASA currently does?

23 [Figure: observation locations from 2 orbits, plotted by longitude and latitude, with the date line marked]

24 Scope for fruitful interaction between statistics and numerical analysis. Information flow in both directions. Statistical problems produce new challenges in applied/computational math. Statistical/probabilistic thinking can yield new algorithms and theory for numerical analysis.

25 STATMOS: Statistics in the Atmospheric and Oceanic Sciences, an NSF-supported network.
For anyone interested in this area, I have money for travel to the rest of the network (NC State, U of Washington, NCAR, etc.).
For any graduate student interested in this area, I can also pay your salary while you are visiting another member of the network.
If someone has postdoc money, I may be able to split the cost of a postdoc for research related to network goals.


More information

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1)

Chapter 3: Maximum-Likelihood & Bayesian Parameter Estimation (part 1) HW 1 due today Parameter Estimation Biometrics CSE 190 Lecture 7 Today s lecture was on the blackboard. These slides are an alternative presentation of the material. CSE190, Winter10 CSE190, Winter10 Chapter

More information

Probabilistic Graphical Models

Probabilistic Graphical Models Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should

More information

STA414/2104 Statistical Methods for Machine Learning II

STA414/2104 Statistical Methods for Machine Learning II STA414/2104 Statistical Methods for Machine Learning II Murat A. Erdogdu & David Duvenaud Department of Computer Science Department of Statistical Sciences Lecture 3 Slide credits: Russ Salakhutdinov Announcements

More information

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages

18.05 Final Exam. Good luck! Name. No calculators. Number of problems 16 concept questions, 16 problems, 21 pages Name No calculators. 18.05 Final Exam Number of problems 16 concept questions, 16 problems, 21 pages Extra paper If you need more space we will provide some blank paper. Indicate clearly that your solution

More information

Statistical Estimation

Statistical Estimation Statistical Estimation Use data and a model. The plug-in estimators are based on the simple principle of applying the defining functional to the ECDF. Other methods of estimation: minimize residuals from

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation Maximum Likelihood Estimation Guy Lebanon February 19, 2011 Maximum likelihood estimation is the most popular general purpose method for obtaining estimating a distribution from a finite sample. It was

More information

BAYESIAN HIERARCHICAL MODELS FOR EXTREME EVENT ATTRIBUTION

BAYESIAN HIERARCHICAL MODELS FOR EXTREME EVENT ATTRIBUTION BAYESIAN HIERARCHICAL MODELS FOR EXTREME EVENT ATTRIBUTION Richard L Smith University of North Carolina and SAMSI (Joint with Michael Wehner, Lawrence Berkeley Lab) IDAG Meeting Boulder, February 1-3,

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

CPSC 540: Machine Learning

CPSC 540: Machine Learning CPSC 540: Machine Learning Multivariate Gaussians Mark Schmidt University of British Columbia Winter 2019 Last Time: Multivariate Gaussian http://personal.kenyon.edu/hartlaub/mellonproject/bivariate2.html

More information

Statistics Research in Remote Sensing Data Analysis for Climate Science at the Jet Propulsion Laboratory

Statistics Research in Remote Sensing Data Analysis for Climate Science at the Jet Propulsion Laboratory Statistics Research in Remote Sensing Data Analysis for Climate Science at the Jet Propulsion Laboratory Amy Braverman Jet Propulsion Laboratory, California Institute of Technology Mail Stop 306-463 4800

More information

Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington

Bayesian Classifiers and Probability Estimation. Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington Bayesian Classifiers and Probability Estimation Vassilis Athitsos CSE 4308/5360: Artificial Intelligence I University of Texas at Arlington 1 Data Space Suppose that we have a classification problem The

More information

Introduction to Machine Learning Midterm Exam Solutions

Introduction to Machine Learning Midterm Exam Solutions 10-701 Introduction to Machine Learning Midterm Exam Solutions Instructors: Eric Xing, Ziv Bar-Joseph 17 November, 2015 There are 11 questions, for a total of 100 points. This exam is open book, open notes,

More information

Expectation Maximization

Expectation Maximization Expectation Maximization Machine Learning CSE546 Carlos Guestrin University of Washington November 13, 2014 1 E.M.: The General Case E.M. widely used beyond mixtures of Gaussians The recipe is the same

More information

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent

Unsupervised Machine Learning and Data Mining. DS 5230 / DS Fall Lecture 7. Jan-Willem van de Meent Unsupervised Machine Learning and Data Mining DS 5230 / DS 4420 - Fall 2018 Lecture 7 Jan-Willem van de Meent DIMENSIONALITY REDUCTION Borrowing from: Percy Liang (Stanford) Dimensionality Reduction Goal:

More information

Consider the following example of a linear system:

Consider the following example of a linear system: LINEAR SYSTEMS Consider the following example of a linear system: Its unique solution is x + 2x 2 + 3x 3 = 5 x + x 3 = 3 3x + x 2 + 3x 3 = 3 x =, x 2 = 0, x 3 = 2 In general we want to solve n equations

More information

Sampling and incomplete network data

Sampling and incomplete network data 1/58 Sampling and incomplete network data 567 Statistical analysis of social networks Peter Hoff Statistics, University of Washington 2/58 Network sampling methods It is sometimes difficult to obtain a

More information

Bayesian Methods: Naïve Bayes

Bayesian Methods: Naïve Bayes Bayesian Methods: aïve Bayes icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Last Time Parameter learning Learning the parameter of a simple coin flipping model Prior

More information

Climate Change: the Uncertainty of Certainty

Climate Change: the Uncertainty of Certainty Climate Change: the Uncertainty of Certainty Reinhard Furrer, UZH JSS, Geneva Oct. 30, 2009 Collaboration with: Stephan Sain - NCAR Reto Knutti - ETHZ Claudia Tebaldi - Climate Central Ryan Ford, Doug

More information

An Introduction to Gaussian Processes for Spatial Data (Predictions!)

An Introduction to Gaussian Processes for Spatial Data (Predictions!) An Introduction to Gaussian Processes for Spatial Data (Predictions!) Matthew Kupilik College of Engineering Seminar Series Nov 216 Matthew Kupilik UAA GP for Spatial Data Nov 216 1 / 35 Why? When evidence

More information

Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE

Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE Consistent Downscaling of Seismic Inversions to Cornerpoint Flow Models SPE 103268 Subhash Kalla LSU Christopher D. White LSU James S. Gunning CSIRO Michael E. Glinsky BHP-Billiton Contents Method overview

More information

Exercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians

Exercise Sheet 1. 1 Probability revision 1: Student-t as an infinite mixture of Gaussians Exercise Sheet 1 1 Probability revision 1: Student-t as an infinite mixture of Gaussians Show that an infinite mixture of Gaussian distributions, with Gamma distributions as mixing weights in the following

More information

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts)

Quiz 1 Solutions. Problem 2. Asymptotics & Recurrences [20 points] (3 parts) Introduction to Algorithms October 13, 2010 Massachusetts Institute of Technology 6.006 Fall 2010 Professors Konstantinos Daskalakis and Patrick Jaillet Quiz 1 Solutions Quiz 1 Solutions Problem 1. We

More information

Naive Bayes and Gaussian Bayes Classifier

Naive Bayes and Gaussian Bayes Classifier Naive Bayes and Gaussian Bayes Classifier Elias Tragas tragas@cs.toronto.edu October 3, 2016 Elias Tragas Naive Bayes and Gaussian Bayes Classifier October 3, 2016 1 / 23 Naive Bayes Bayes Rules: Naive

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate

Mixture Models & EM. Nicholas Ruozzi University of Texas at Dallas. based on the slides of Vibhav Gogate Mixture Models & EM icholas Ruozzi University of Texas at Dallas based on the slides of Vibhav Gogate Previously We looed at -means and hierarchical clustering as mechanisms for unsupervised learning -means

More information