Some Areas of Recent Research

1 University of Chicago Department Retreat, October 2012

2 Funders & Collaborators. NSF (STATMOS), US Department of Energy. Faculty: Mihai Anitescu, Liz Moyer. Postdocs: Jie Chen, Bill Leeds, Ying Sun. Grad students: Stefano Castruccio, Michael Horrell, Andy Poppick. Undergrads: Peter Hansen, Grant Wilder.

3 Preconditioning and fitting Gaussian process models. Gaussian process $Z$ determined by its mean and covariance functions: $E\,Z(x) = \mu(x)$, $\mathrm{cov}\{Z(x), Z(y)\} = K(x, y)$. Assume mean is 0 and covariance structure known up to parameter $\theta$. Let $K(\theta)$ be covariance matrix for observations $Z(x_1), \ldots, Z(x_n)$ given $\theta$. Then the loglik is (ignoring an additive constant) $\ell(\theta) = -\frac{1}{2} \log |K(\theta)| - \frac{1}{2} Z' K(\theta)^{-1} Z$. Problem: How to compute $\ell(\theta)$? Particularly $\log |K(\theta)|$?
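For small $n$ the loglik on this slide can be evaluated directly from a Cholesky factor. The following is a minimal sketch, not the speaker's code: the exponential covariance `exp_cov`, the jittered locations, and the parameter values are assumptions chosen only to make the example runnable.

```python
import numpy as np

def exp_cov(x, theta):
    """Hypothetical covariance K(x, y) = exp(-|x - y| / theta) (an assumption)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

def gaussian_loglik(z, K):
    """Zero-mean Gaussian loglik, ignoring the additive constant:
    -0.5 * log|K| - 0.5 * z' K^{-1} z, using the Cholesky factor K = L L'."""
    L = np.linalg.cholesky(K)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))  # log|K| = 2 * sum_i log L_ii
    w = np.linalg.solve(L, z)                  # w = L^{-1} z, so w'w = z' K^{-1} z
    return -0.5 * logdet - 0.5 * (w @ w)

rng = np.random.default_rng(0)
n = 200
# Irregularly sited locations: a jittered grid (assumption).
x = 0.05 * (np.arange(n) + rng.uniform(-0.25, 0.25, size=n))
# Simulate data with theta = 2, then evaluate the loglik at a few values of theta.
z = np.linalg.cholesky(exp_cov(x, 2.0)) @ rng.standard_normal(n)
for theta in (0.5, 2.0, 8.0):
    print(theta, gaussian_loglik(z, exp_cov(x, theta)))
```

This route stores the full $n \times n$ matrix and its factor, which is the cost the remaining slides try to avoid.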

4 Important aside: Even when loglik can be computed exactly, maximizing it (or sampling from a posterior) may not be easy. Consider 400 evenly spaced observations on $\mathbb{R}$ and $Z$ is fractional Brownian motion with variogram $\frac{1}{2} E\{Z(x) - Z(y)\}^2 = -\Gamma(-\alpha/2)\,\ell\,|x - y|^{\alpha}$ with $\ell = 10$ and $\alpha = 1.5$. Neither parameter is estimated well, although there is strong evidence parameters lie along a curve in $(\ell, \alpha)$ space. Problem is worse if leave out the $\Gamma(-\alpha/2)$ factor. I am unaware of any transformation independent of observation locations that would give concave loglikelihood. This kind of function makes some people in the optimization community unhappy. Things only get worse with more complex models.

5 [Figure: log likelihood as a function of $\ell$ and $\alpha$.]

6 Computing exact MLE. Exact computation of likelihood function for $n$ irregularly sited observations generally requires $O(n^3)$ computation and $O(n^2)$ memory to compute Cholesky decomposition of covariance matrix. Computation is becoming cheap much faster than memory. Increasing emphasis on matrix-free methods in which never have to store an $n \times n$ matrix, even if requires more computation.
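To put rough numbers on these orders of magnitude, here is a back-of-the-envelope sketch. It assumes 8-byte double-precision entries and the standard leading-order count of $n^3/3$ flops for a Cholesky factorization; the helper name `dense_cost` is made up for illustration.

```python
def dense_cost(n, bytes_per_entry=8):
    """Rough cost of the dense Cholesky route for n observations.

    Returns (memory in GB for one n x n matrix of doubles,
             leading-order flop count n^3 / 3 for the factorization)."""
    memory_gb = bytes_per_entry * n**2 / 1e9
    cholesky_flops = n**3 / 3
    return memory_gb, cholesky_flops

for n in (10_000, 100_000, 1_000_000):
    mem, flops = dense_cost(n)
    print(f"n = {n:>9,d}: {mem:10.1f} GB, {flops:.1e} flops")
```

Under these assumptions a single covariance matrix for $n = 10^5$ already needs about 80 GB, so memory tends to bind before arithmetic does.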

7 Iterative solution of linear equations. Computing quadratic form in likelihood best done by solving systems like $Kx = y$, not by finding $K^{-1}$. Iterative methods: for $K$ positive definite, equivalent to minimizing $\frac{1}{2} x' K x - x' y$, which can solve by, for example, conjugate gradient. Main computation requires multiplying vectors by $K$. This is fast for sparse $K$ and for some structured (e.g., Toeplitz) matrices. But even for dense unstructured matrices, iterative solution is matrix-free and may require many fewer flops than Cholesky decomposition: $O(n^2 \times \#\text{iterations})$ vs. $O(n^3)$. Number of iterations for accurate solution related to condition number $\kappa(K)$ (ratio of largest to smallest eigenvalue) of $K$.
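A matrix-free conjugate gradient solve can be sketched as follows. This is a textbook implementation for illustration, not the code used in the work described; the exponential covariance, the jittered grid, and the block size are assumptions. The product $Kv$ is formed one block of rows at a time, so the full $n \times n$ matrix is never stored.

```python
import numpy as np

def conjugate_gradient(matvec, y, tol=1e-8, max_iter=1000):
    """Solve K x = y for symmetric positive definite K, given only v -> K v.
    Returns (solution, number of iterations used)."""
    x = np.zeros_like(y)
    y_norm = np.linalg.norm(y)
    if y_norm == 0.0:
        return x, 0
    r = y.copy()               # residual y - K x, with x = 0 initially
    p = r.copy()               # search direction
    rs = r @ r
    for it in range(1, max_iter + 1):
        Kp = matvec(p)
        alpha = rs / (p @ Kp)
        x += alpha * p
        r -= alpha * Kp
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * y_norm:
            return x, it
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter

def exp_cov_matvec(x, theta, v, block=256):
    """K v for the hypothetical K_ij = exp(-|x_i - x_j| / theta),
    built one row block at a time so K itself is never stored."""
    out = np.empty_like(v)
    for s in range(0, len(x), block):
        rows = np.exp(-np.abs(x[s:s + block, None] - x[None, :]) / theta)
        out[s:s + block] = rows @ v
    return out

rng = np.random.default_rng(1)
n = 1000
x = 0.5 * (np.arange(n) + rng.uniform(-0.25, 0.25, size=n))  # jittered grid
y = rng.standard_normal(n)
sol, n_iter = conjugate_gradient(lambda v: exp_cov_matvec(x, 1.0, v), y)
print(n_iter, np.linalg.norm(exp_cov_matvec(x, 1.0, sol) - y))
```

In this example the observations are well separated relative to the range parameter, so the matrix is well-conditioned and few iterations are needed; strongly correlated neighbors would raise the iteration count.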

8 When nearby observations strongly correlated, $\kappa(K)$ can be very large, so need to precondition: Find a matrix $P$ such that $P' K(\theta) P$ is well-conditioned for $\theta$ in vicinity of MLE and multiplying a vector by $P$ is fast. Let $Y = P' Z$. Then the loglik (with $Z$ as data, but written in terms of $Y$) equals $\ell(\theta) = -\frac{1}{2} \log |P' K(\theta) P| + \log |P| - \frac{1}{2} Y' \{P' K(\theta) P\}^{-1} Y$. Okay for $P$ to depend on $\theta$ as long as use this formula. Can ignore $\log |P|$ if it doesn't depend on $\theta$ (even if $P$ does). What to do about $\log |P' K(\theta) P|$?
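The preconditioned form of the loglik can be checked numerically. In this sketch $P$ is taken to be the inverse transpose of the Cholesky factor of $K$ at a nearby parameter value, which is an assumption made purely to obtain a well-conditioned $P'K(\theta)P$; it is dense and not fast to apply, so it is not a practical preconditioner. Here $\log |P|$ is computed as the log of the absolute value of the determinant.

```python
import numpy as np

def exp_cov(x, theta):
    """Hypothetical exponential covariance (an assumption)."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

def loglik(z, K):
    """-0.5 * log|K| - 0.5 * z' K^{-1} z."""
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * logdet - 0.5 * z @ np.linalg.solve(K, z)

def loglik_preconditioned(z, K, P):
    """Same loglik written in terms of Y = P'z and M = P'KP:
    -0.5 * log|M| + log|P| - 0.5 * Y' M^{-1} Y."""
    y = P.T @ z
    M = P.T @ K @ P
    _, logdet_M = np.linalg.slogdet(M)
    _, logdet_P = np.linalg.slogdet(P)
    return -0.5 * logdet_M + logdet_P - 0.5 * y @ np.linalg.solve(M, y)

rng = np.random.default_rng(2)
n = 150
x = 0.05 * (np.arange(n) + rng.uniform(-0.25, 0.25, size=n))
K = exp_cov(x, 2.0)
z = np.linalg.cholesky(K) @ rng.standard_normal(n)

# Illustrative P built from a nearby parameter value theta0 = 2.5.
L0 = np.linalg.cholesky(exp_cov(x, 2.5))
P = np.linalg.inv(L0).T
M = P.T @ K @ P

print("condition numbers:", np.linalg.cond(K), np.linalg.cond(M))
print("logliks:", loglik(z, K), loglik_preconditioned(z, K, P))
```

The two loglik values should agree up to rounding, while the condition number of $P'KP$ should be far smaller than that of $K$.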

9 Solve score equations instead? (Ignore preconditioning for now.) Writing $K_i(\theta)$ for $\partial K(\theta) / \partial \theta_i$, score equations are (assume mean is 0) $Z' K(\theta)^{-1} K_i(\theta) K(\theta)^{-1} Z = \mathrm{tr}\{K(\theta)^{-1} K_i(\theta)\}$ for $i = 1, \ldots, p$. First term requires only one solve. Instead of log determinant, need, for each component of $\theta$, $\mathrm{tr}\{K(\theta)^{-1} K_i(\theta)\}$, which requires $n$ solves for exact calculation. Approximate by the unbiased estimate (Hutchinson, 1990) $\frac{1}{N} \sum_{j=1}^{N} U_j' K(\theta)^{-1} K_i(\theta) U_j$, where $U_j = (U_{j1}, \ldots, U_{jn})'$ is random vector with $U_{jk}$'s iid and $\Pr(U_{jk} = 1) = \Pr(U_{jk} = -1) = \frac{1}{2}$. Yields unbiased estimating equations.
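The stochastic trace approximation can be sketched in a few lines. The covariance model, locations, and $N = 50$ are assumptions for illustration, and a dense solve stands in for the iterative solver that would be used at scale.

```python
import numpy as np

def hutchinson_trace(apply_A, n, N, rng):
    """Unbiased estimate of tr(A): average of U_j' A U_j over N random sign vectors."""
    total = 0.0
    for _ in range(N):
        u = rng.choice([-1.0, 1.0], size=n)  # entries +1 or -1, each with probability 1/2
        total += u @ apply_A(u)
    return total / N

# Hypothetical model: K_ij = exp(-|x_i - x_j| / theta),
# so dK/dtheta has entries K_ij * |x_i - x_j| / theta^2.
rng = np.random.default_rng(3)
n, theta = 300, 1.0
x = 0.1 * (np.arange(n) + rng.uniform(-0.25, 0.25, size=n))
D = np.abs(x[:, None] - x[None, :])
K = np.exp(-D / theta)
K_i = K * D / theta**2

# A u = K^{-1} K_i u: one solve per probe vector (dense solve used here for brevity).
apply_A = lambda u: np.linalg.solve(K, K_i @ u)

exact = np.trace(np.linalg.solve(K, K_i))            # needs n solves
approx = hutchinson_trace(apply_A, n, N=50, rng=rng)  # needs N solves
print(exact, approx)
```

Each probe vector costs one solve, so the estimate uses $N$ solves instead of $n$.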

10 Can bound statistical inefficiency of procedure in terms of $\kappa(K)$. Thus, if can find a decent preconditioner for $K$, moderate $N$ works well. Don't need $N$ comparable to $n$! Preconditioning helps in two ways: reduces number of iterations needed in iterative solver; reduces need for large $N$. Scope for further improvement by choosing $U_j$'s not independent. Design of experiments! Stein, Chen and Anitescu (under revision).

11 Some other interests. When low rank approximations to covariance matrices don't work. Won't discuss this here, but work likely to annoy some who have been advocating this approach for massive spatial datasets. Modeling and computation for massive (as opposed to large) space-time datasets, without assuming covariance (or inverse covariance) matrices are low rank or sparse. Climate model emulation.

12 One-pass methods. Look at data block by block and summarize the information about $K(\theta)$ from that block so that don't have to go back to raw data again (Anitescu, Horrell). Simple example: Divide data into $B$ blocks. Within each block, approximate the loglik (or score) function. Are MLE of $\theta$ and observed information matrix an adequate approximation? If not, store more complete representation of loglik function. Adding loglik across blocks reduces storage with little loss of information? Save a few observations (or other summaries) from each block. Add within-block approximate logliks to loglik of sparse observations. For truly massive (petascale, exascale) data, will need more than two layers.
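The simple example on this slide can be sketched as below. This is a deliberately crude stand-in, not the Anitescu and Horrell method: each block's loglik is tabulated on a grid of $\theta$ values as its stored summary, between-block dependence is ignored, and the covariance model, block count, and grid are all assumptions.

```python
import numpy as np

def exp_cov(x, theta):
    """Hypothetical exponential covariance, used only for illustration."""
    return np.exp(-np.abs(x[:, None] - x[None, :]) / theta)

def loglik(z, K):
    """Zero-mean Gaussian loglik up to an additive constant."""
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * logdet - 0.5 * z @ np.linalg.solve(K, z)

def block_summaries(x, z, n_blocks, theta_grid):
    """One pass over the data: for each block, store its loglik on theta_grid.
    Returns an array of shape (n_blocks, len(theta_grid))."""
    rows = []
    for xb, zb in zip(np.array_split(x, n_blocks), np.array_split(z, n_blocks)):
        rows.append([loglik(zb, exp_cov(xb, t)) for t in theta_grid])
    return np.array(rows)

rng = np.random.default_rng(4)
n, theta_true = 400, 2.0
x = 0.25 * (np.arange(n) + rng.uniform(-0.25, 0.25, size=n))
z = np.linalg.cholesky(exp_cov(x, theta_true)) @ rng.standard_normal(n)

theta_grid = np.linspace(0.5, 6.0, 23)
summaries = block_summaries(x, z, n_blocks=8, theta_grid=theta_grid)

# After the single pass, only the small summary table is needed.
approx_loglik = summaries.sum(axis=0)
print("approximate maximizer:", theta_grid[np.argmax(approx_loglik)])
```

Once the table is built, the raw data are not revisited; a more compact summary per block (MLE and observed information) would replace each row by a quadratic approximation.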

20 Climate model emulation. Reproducing some of the output of a GCM under some forcing scenario without actually running it (Castruccio, Leeds, Moyer, Wilder). Or, better yet, producing accurate simulations of actual climate under some forcing scenario. GCM runs we have: NCAR Community Climate System Model version 3 (CCSM3), T31 resolution (approx. $3.75^\circ \times 3.75^\circ$ grid cells). Input is CO$_2$, output is temperature $T(t)$ and precipitation $P(t)$, $t$ is year. 18 forcing scenarios, 53 realizations, > 10,000 model years.

21 Statistical emulation of mean. Separate time series model for each of 47 regions: $T(t) = \beta_0 + \beta_1 \frac{1}{2} \{\log[\mathrm{CO}_2](t) + \log[\mathrm{CO}_2](t-1)\} + \beta_2 \sum_{i=2}^{\infty} w_{i-2} \log[\mathrm{CO}_2](t-i) + \varepsilon(t)$, where $\varepsilon(t)$ is an autoregressive model of order 1 and $w_i = (1-\rho)\rho^i$. Fit with small number of scenarios and a few realizations per scenario. Compare to standard computer model emulation approach in which view $(\mathrm{CO}_2(1), \ldots, \mathrm{CO}_2(n))$ as input and $(T(1), \ldots, T(n))$ as output.
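A regional emulator of this general type (recent log CO$_2$, a geometrically weighted sum of past log CO$_2$, and AR(1) noise) can be sketched as follows. Everything specific here is an assumption: the coefficient names `beta0`, `beta1`, `beta2`, `rho`, the weights $(1-\rho)\rho^{i-2}$ on lag $i \ge 2$, the treatment of CO$_2$ before the first year as constant, and all numerical values. None of these are fitted values from the work described.

```python
import numpy as np

def emulator_mean(log_co2, beta0, beta1, beta2, rho):
    """Mean temperature under an assumed distributed-lag model:
    beta0 + beta1 * average of log CO2 at t and t-1
          + beta2 * sum_{i>=2} (1 - rho) * rho^(i-2) * log CO2 at t-i.
    CO2 before the first year is assumed equal to its first value."""
    L = np.asarray(log_co2, dtype=float)
    lag = lambda t, k: L[max(t - k, 0)]
    mean = np.empty(len(L))
    s = L[0]  # lagged sum at t = 0 under constant pre-history (weights sum to 1)
    for t in range(len(L)):
        if t > 0:
            # Recursion for the geometric sum: S(t) = (1 - rho) * L(t-2) + rho * S(t-1).
            s = (1.0 - rho) * lag(t, 2) + rho * s
        mean[t] = beta0 + beta1 * 0.5 * (L[t] + lag(t, 1)) + beta2 * s
    return mean

def ar1_noise(n, phi, sigma, rng):
    """AR(1) noise eps(t) = phi * eps(t-1) + sigma * eta(t), started at stationarity."""
    eps = np.empty(n)
    eps[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - phi**2))
    for t in range(1, n):
        eps[t] = phi * eps[t - 1] + rng.normal(0.0, sigma)
    return eps

rng = np.random.default_rng(5)
years = 200
# Hypothetical scenario: CO2 held at 280 ppm, then rising 1% per year.
co2 = 280.0 * np.concatenate([np.ones(50), 1.01 ** np.arange(1, years - 49)])
mean = emulator_mean(np.log(co2), beta0=-5.0, beta1=1.0, beta2=2.0, rho=0.9)
temperature = mean + ar1_noise(years, phi=0.3, sigma=0.2, rng=rng)
print(mean[[0, 49, 199]], temperature[:3])
```

Because the weights sum to one, a constant CO$_2$ path gives a constant mean of `beta0 + (beta1 + beta2) * log(CO2)`, which is a quick sanity check on the recursion.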

22 Total column ozone. OMI (Ozone Monitoring Instrument, successor to TOMS) is aboard the satellite EOS Aura: Polar-orbiting. Sun-synchronous, so satellite always at local noon. Each orbit about 100 minutes, or 14.1 orbits a day. From raw data (photon counts in multiple frequency bands), levels of many trace constituents of atmosphere are deduced, including ozone. Over 80,000 observations per orbit, so over $10^6$ a day. Near global coverage (no data during polar nights, some missing data). How might statistical models be used to produce better Level-3 (gridded) product than what NASA currently does?

23 [Figure: Observation locations from 2 orbits, plotted by longitude and latitude, with the date line marked.]

24 Scope for fruitful interaction between statistics and numerical analysis. Information flow in both directions. Statistical problems produce new challenges in applied/computational math. Statistical/probabilistic thinking can yield new algorithms and theory for numerical analysis.

25 STATMOS Statistics in the Atmospheric and Oceanic Sciences, NSF-supported network. For anyone interested in this area, I have money for travel to rest of network (NC State, U of Washington, NCAR, etc.). For any graduate student interested in this area, I can also pay your salary while you are visiting another member of the network. If someone has postdoc money, I may be able to split cost of postdoc for research related to network goals.

Theory and Computation for Gaussian Processes

Theory and Computation for Gaussian Processes University of Chicago IPAM, February 2015 Funders & Collaborators US Department of Energy, US National Science Foundation (STATMOS) Mihai Anitescu, Jie Chen, Ying Sun Gaussian processes A process Z on