Fitting Large-Scale Spatial Models with Applications to Microarray Data Analysis


Stephan R. Sain
Department of Mathematics, University of Colorado at Denver, Denver, Colorado
ssain@math.cudenver.edu

Reinhard Furrer
Geophysical Statistics Project, National Center for Atmospheric Research, Boulder, Colorado
furrer@ucar.edu

Many problems in the environmental and biological sciences involve the analysis of large quantities of data. Further, the data in these problems are often subject to various types of structure and, in particular, spatial dependence. Traditional model fitting often fails because of the size of the datasets: it is difficult not only to specify but also to compute with the full covariance matrix. For example, a single microarray can include over 400,000 individual observations. We propose using a very general type of mixed model that has a random spatial component. Recognizing that spatial covariance matrices often exhibit a large number of zero or near-zero entries, we use covariance tapering to force near-zero entries to zero. Then, taking advantage of the sparse nature of such tapered covariance matrices, we use backfitting to estimate the fixed and random model parameters. Results are demonstrated on an experiment using microarrays to build a profile of differentially expressed genes relating to cerebral vascular malformations, an important cause of hemorrhagic stroke and seizures.

Keywords: Mixed effects; Backfitting; Covariance tapering; Sparse matrices

1 Introduction

Many spatial problems are inherently multivariate, with more than one measurement or observation at each spatial location. Moreover, many spatial problems involve a large number of spatial locations. This leads to serious computational difficulties in constructing, storing, and manipulating very large regression and covariance matrices. Such problems arise in a number of areas, from traditional environmental statistics and epidemiology to new approaches to biological problems. For example, the authors' present research in this area includes combining observed climate data and climate models to examine climate model behavior as well as predictions of climate change. In this setting, there are two variables, precipitation and temperature, for sixteen different models on a 5 grid resulting in observations per climate model.

Of particular interest in this paper is a problem of considerable current interest, namely analyzing microarray data for biological experiments. In this case, we are attempting to build a profile of differentially expressed genes relating to cerebral vascular malformation (Shenkar et al., 2003). In this study, there are roughly twenty gene chips with three disease groups (control and two disease states), with each chip essentially an array of approximately 400,000 observations.

We propose a simple, multivariate, additive (mixed-effects) spatial model and discuss some strategies for fitting such models and estimating model parameters when the data structures are large. There are two key aspects. First, recognizing that many if not most of the elements of the spatial covariance matrices are zero or near-zero, covariance tapering is used to force near-zero entries to zero, which introduces a great deal of sparseness in the covariance matrices. This sparseness allows such matrices to be stored and manipulated more efficiently. Second, the additive structure in the model is exploited using a backfitting algorithm for parameter estimation.

The next section develops the model in detail, while Section 3 discusses several computational issues that arise when using huge datasets with backfitting algorithms. Section 4 shows qualitative and quantitative results of a small example using microarray data analysis. Finally, Section 5 discusses current and future research of this long-term project.

2 A Multivariate, Additive Spatial Model

A simple, multivariate, additive (mixed-effects) spatial model for an observation vector $Y$ can be written as

\[ Y = X\beta + h + \varepsilon, \tag{1} \]

where $X\beta$ represents fixed effects; $h$ represents a random, zero-mean spatial process with $\mathrm{Var}(h) = \Sigma_h$; and $\varepsilon$ represents a random, zero-mean error process with $\mathrm{Var}(\varepsilon) = \Sigma_\varepsilon$, orthogonal to $h$. Model (1) is generic; in the case of gene expression on a chip, the fixed effects can be expanded to

\[ Y = \mathbf{1}\beta_{\mathrm{mean}} + R\beta_{\mathrm{row}} + C\beta_{\mathrm{col}} + G\beta_{\mathrm{Gene}} + h + \varepsilon. \tag{2} \]

From a biological point of view, $\beta_{\mathrm{Gene}}$ is the quantity of interest, whereas the chip specific effects $\beta_{\mathrm{Chip}} = [\beta_{\mathrm{mean}}, \beta_{\mathrm{row}}^T, \beta_{\mathrm{col}}^T]^T$ are ancillary and are included to account for any chip specific, large-scale trends observed in the data. It is convenient to parameterize the spatial covariance matrix by $\theta$ ($\Sigma_h = K(\theta)$) and to assume a white noise measurement error ($\Sigma_\varepsilon = \sigma^2 I$).

Suppose we have $k$ different chips having an identical gene layout. Then, model (1) is expanded to

\[
\begin{pmatrix} Y_1 \\ \vdots \\ Y_k \end{pmatrix}
=
\begin{pmatrix} F & & 0 & G \\ & \ddots & & \vdots \\ 0 & & F & G \end{pmatrix}
\begin{pmatrix} \beta_{\mathrm{Chip}\,1} \\ \vdots \\ \beta_{\mathrm{Chip}\,k} \\ \beta_{\mathrm{Gene}} \end{pmatrix}
+
\begin{pmatrix} h_{\mathrm{Chip}\,1} \\ \vdots \\ h_{\mathrm{Chip}\,k} \end{pmatrix}
+
\begin{pmatrix} \varepsilon_{\mathrm{Chip}\,1} \\ \vdots \\ \varepsilon_{\mathrm{Chip}\,k} \end{pmatrix}
\tag{3}
\]

with $F = [\mathbf{1}, R, C]$ and where the spatial processes $h_{\mathrm{Chip}\,i}$ are mutually independent. The covariance matrices of the spatial and the random process are assumed to take the forms

\[
\Sigma_h = \begin{pmatrix} K_1 & & 0 \\ & \ddots & \\ 0 & & K_k \end{pmatrix}
\quad\text{and}\quad
\Sigma_\varepsilon = \begin{pmatrix} \sigma_1^2 I & & 0 \\ & \ddots & \\ 0 & & \sigma_k^2 I \end{pmatrix},
\tag{4}
\]

where $K_i = K(\theta_i)$ represents a chip specific spatial covariance matrix parameterized by $\theta_i$, and the $\sigma_i^2$ are chip specific variances, called the nugget effect in the geostatistical literature. The independence assumption across chips is justified by the fact that, typically, the chips are based on unique tissue samples, often from different individuals. The chip specific fixed ($\beta_{\mathrm{Chip}\,i}$) and random, spatial ($h_{\mathrm{Chip}\,i}$) effects are included to account for chip specific large-scale and small-scale spatial trends. This type of structure is able to model non-linear relationships in much the same fashion as smoothing splines (Nychka, 2000). Note that this model can also be written in an additive fashion, separating chip specific and gene specific effects.

For small samples, one could use ML or REML (Kitanidis, 1997; Stein, 1999) to fit the covariance parameters; estimates of $\beta$ and predictions of the random effects then follow directly:

\[
\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y \quad \text{(generalized least squares)}, \qquad
\hat h = \Sigma_h V^{-1} (Y - X\hat\beta),
\tag{5}
\]

where $V = \Sigma_h + \Sigma_\varepsilon$. These estimates are equivalent to the universal kriging solutions, i.e. the best linear unbiased predictor (e.g. Cressie, 1993). In our setting, direct computation with the design and covariance matrices is impossible because the observation vector $Y$ is too big, even with only one or two chips. We solve this problem with the backfitting algorithms outlined below.

2.1 Backfitting with One Chip

Backfitting procedures are widely used in additive or generalized linear/additive models, e.g. Breiman and Friedman (1985); Buja et al. (1989). Applied to equation (1), the backfitting algorithm consists of iteratively estimating the fixed effects $\beta$ (regression step) and the spatial process $h$ (kriging step), as schematized below.

[1] Let $\hat h^{(0)}$ be an initial guess and put $j = 1$.
[2] $\hat\beta^{(j)} = (X^T X)^{-1} X^T (Y - \hat h^{(j-1)})$.
[3] Estimate covariance parameters to get $\hat\theta^{(j)}$ and $\hat\sigma^{2(j)}$, then put $\hat h^{(j)} = \Sigma_h V^{-1} (Y - X\hat\beta^{(j)})$.
[4] Put $j = j + 1$ and repeat [2] and [3] until convergence.

To prove equivalence after convergence, plug [3] into [2]; a few straightforward manipulations lead to the generalized least-squares estimator (5). The convergence criterion in step [4] should be based on the estimates $\hat\beta^{(j)}$, $\hat\theta^{(j)}$ and $\hat\sigma^{2(j)}$, for example on absolute or relative mean squared differences. The algorithm usually converges in a few steps. We will come back to this issue in Section 4.

In the setting of a single microarray chip, the design matrix $X$ is too big to compute with using the available computing resources. We therefore use equation (2) to separate the different fixed effects and perform the regression step [2] iteratively on the chip specific effects $\beta_{\mathrm{Chip}}$ and the gene effects $\beta_{\mathrm{Gene}}$. Thus, step [2] becomes:

[2a] Let $\hat\beta_{\mathrm{Gene}}^{(0)}$ be an initial guess and put $l = 1$.
[2b] $\hat\beta_{\mathrm{Chip}}^{(l)} = (F^T F)^{-1} F^T (Y - \hat h^{(j-1)} - G\hat\beta_{\mathrm{Gene}}^{(l-1)})$.
[2c] $\hat\beta_{\mathrm{Gene}}^{(l)} = (G^T G)^{-1} G^T (Y - \hat h^{(j-1)} - F\hat\beta_{\mathrm{Chip}}^{(l)})$.
[2d] Put $l = l + 1$ and repeat [2b] and [2c] until convergence, then $\hat\beta^{(j)} = (\hat\beta_{\mathrm{Chip}}^{(l-1)\,T}, \hat\beta_{\mathrm{Gene}}^{(l-1)\,T})^T$.

2.2 Backfitting with Several Chips

Suppose we have $k$ different chips. We extend the backfitting algorithm presented in the last section according to equation (3). As there is no spatial structure between different chips (cf. equation (4)), we estimate and fit the spatial structure on each chip separately. In a similar way, the chip specific effects depend only on the observations of the corresponding chip. Only the gene effects have to be considered across all observations. It can be shown that they can be fitted by taking the mean of the centered observations $Z_i = Y_{\mathrm{Chip}\,i} - \hat h_{\mathrm{Chip}\,i}^{(j-1)} - F\hat\beta_{\mathrm{Chip}\,i}^{(l)}$. Therefore, the design matrices are identical to those of the single chip case. This yields the modified backfitting algorithm below.

[1'] Let $\hat h_{\mathrm{Chip}\,i}^{(0)}$, $i = 1, \dots, k$, be an initial guess and put $j = 1$.
[2a'] Let $\hat\beta_{\mathrm{Gene}}^{(0)}$ be an initial guess and put $l = 1$.
[2b'] For $i = 1, \dots, k$, $\hat\beta_{\mathrm{Chip}\,i}^{(l)} = (F^T F)^{-1} F^T (Y_{\mathrm{Chip}\,i} - \hat h_{\mathrm{Chip}\,i}^{(j-1)} - G\hat\beta_{\mathrm{Gene}}^{(l-1)})$.
[2c'] Let $Z = \frac{1}{k} \sum_{i=1}^{k} (Y_{\mathrm{Chip}\,i} - \hat h_{\mathrm{Chip}\,i}^{(j-1)} - F\hat\beta_{\mathrm{Chip}\,i}^{(l)})$, then put $\hat\beta_{\mathrm{Gene}}^{(l)} = (G^T G)^{-1} G^T Z$.
[2d'] Put $l = l + 1$ and repeat [2b'] and [2c'] until convergence, then $\hat\beta^{(j)} = (\hat\beta_{\mathrm{Chip}}^{(l-1)\,T}, \hat\beta_{\mathrm{Gene}}^{(l-1)\,T})^T$.
[3'] For $i = 1, \dots, k$, estimate covariance parameters to get $\hat\theta_i^{(j)}$ and $\hat\sigma_i^{2(j)}$, then put $\hat h_{\mathrm{Chip}\,i}^{(j)} = K_i(\hat\theta_i^{(j)}) \bigl(K_i(\hat\theta_i^{(j)}) + \hat\sigma_i^{2(j)} I\bigr)^{-1} (Y_{\mathrm{Chip}\,i} - F\hat\beta_{\mathrm{Chip}\,i}^{(j)} - G\hat\beta_{\mathrm{Gene}}^{(j)})$.
[4'] Put $j = j + 1$ and repeat [2a'] to [3'] until convergence.

This backfitting algorithm uses essentially the same amount of storage as the algorithm for a single chip. Note that the computing time of step [2c'] is comparable with that of step [2c], whereas steps [2b'] and [3'] are $k$ times as expensive as the respective steps in the single chip case.
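The equivalence between the converged backfitting iterates and the generalized least-squares estimator (5), stated in Section 2.1, can be spelled out in a few lines. The following is a sketch that assumes the white noise error model $\Sigma_\varepsilon = \sigma^2 I$ and covariance parameters held fixed at their converged values; $\hat\beta$ and $\hat h$ denote the limits of the iterates, a notation introduced here for illustration.

```latex
% Fixed point of steps [2] and [3]:
X^T X \hat\beta = X^T (Y - \hat h), \qquad
\hat h = \Sigma_h V^{-1} (Y - X\hat\beta).

% Substitute the second relation into the first and collect terms:
X^T \bigl(I - \Sigma_h V^{-1}\bigr) (Y - X\hat\beta) = 0.

% Since V = \Sigma_h + \Sigma_\varepsilon and \Sigma_\varepsilon = \sigma^2 I:
I - \Sigma_h V^{-1} = (V - \Sigma_h) V^{-1} = \sigma^2 V^{-1}.

% Hence, for \sigma^2 > 0:
X^T V^{-1} (Y - X\hat\beta) = 0
\quad\Longrightarrow\quad
\hat\beta = (X^T V^{-1} X)^{-1} X^T V^{-1} Y.
```

The last line is the first expression in (5), and the fixed-point relation for $\hat h$ is the second.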

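The single chip backfitting steps [1] to [4] of Section 2.1 can be tried out on a toy problem. The sketch below is not the authors' R/Fortran implementation: it uses plain Python lists and made-up data, and it keeps the covariance parameters fixed instead of re-estimating them in step [3] (a simplifying assumption). The helper names `matmul`, `solve`, `backfit` and `gls` are hypothetical. It checks numerically that the iterates approach the generalized least-squares estimate (5).

```python
import math

def matmul(A, B):
    """Dense matrix product for lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

# Toy data (made up): 8 locations on a line, intercept plus linear trend.
n = 8
X = [[1.0, float(i)] for i in range(n)]
Y = [[v] for v in [0.3, 1.1, 1.8, 3.4, 3.9, 5.2, 5.8, 7.1]]

# Fixed covariance parameters (assumed known here): exponential covariance
# with sill theta1 and range theta2, plus a white noise nugget sigma2.
theta1, theta2, sigma2 = 1.0, 1.0, 1.0
Sigma_h = [[theta1 * math.exp(-abs(i - j) / theta2) for j in range(n)] for i in range(n)]
V = [[Sigma_h[i][j] + (sigma2 if i == j else 0.0) for j in range(n)] for i in range(n)]

def gls():
    """Generalized least squares, the first expression in equation (5)."""
    Xt = transpose(X)
    return solve(matmul(Xt, solve(V, X)), matmul(Xt, solve(V, Y)))

def backfit(tol=1e-10, max_iter=500):
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    h = [[0.0] for _ in range(n)]                      # step [1]: initial guess
    beta = None
    for j in range(1, max_iter + 1):
        new_beta = solve(XtX, matmul(Xt, sub(Y, h)))   # step [2]: regression
        resid = sub(Y, matmul(X, new_beta))
        h = matmul(Sigma_h, solve(V, resid))           # step [3]: kriging
        if beta is not None and max(
            abs(a[0] - b[0]) for a, b in zip(new_beta, beta)
        ) < tol:
            return new_beta, h, j                      # step [4]: converged
        beta = new_beta
    return beta, h, max_iter

beta_bf, h_hat, iters = backfit()
beta_gls = gls()
print("backfitting:", [round(b[0], 6) for b in beta_bf], "after", iters, "iterations")
print("GLS:        ", [round(b[0], 6) for b in beta_gls])
```

The number of iterations needed depends on the ratio of spatial variance to nugget and on the tolerance, so the toy run should not be read as a statement about convergence speed on real chips.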
3 Computational Issues

One of the aims of this study was to see whether this kind of analysis could be done with existing software on a reasonably sized desktop computer. We decided to use the freely available software R (Ihaka and Gentleman, 1996; R Development Core Team, 2004) on a RedHat Linux system with 2 Gbytes of RAM.

3.1 Sparse Matrices

The design matrices $F$ and $G$ contain entries $\pm 1$ and a vast number of zeros. If such huge matrices contain only a small percentage of nonzero elements, it is advantageous to use a storage scheme more elaborate than a simple doubly indexed array. One commonly used structure consists of three vectors: the first contains the nonzero elements, the second the column indices of the elements stored in the first, and the last the pointers to the beginning of each matrix row in the first two vectors. For an $n \times n$ matrix with $z$ nonzero elements we thus need $z$ reals and $z + n + 1$ integers, compared to $n \times n$ reals (e.g. George and Liu, 1981; see also Table 1 as explained in Section 3.3).

The R package SparseM (Koenker and Ng, 2003) contains a few rudimentary functions for handling sparse matrices. We used their concept of representing sparse matrices and wrote the backfitting procedure in a linear, sequential way, calling as few functions as possible in order to save memory. Computationally expensive blocks, such as the construction of the design matrices, are coded in Fortran 77. The coding is similar to the functions given in Furrer (2004).

3.2 Covariance Tapering

In the backfitting algorithm, steps [3] or [3'] are best linear unbiased predictions (BLUP) of a spatial field, also called simple kriging in the geostatistical literature. The BLUP essentially requires solving the huge linear system $Vx = Z$, where $Z$ contains centered observations. Computationally, we first perform a Cholesky factorization $V = L L^T$ with $L$ lower triangular and then successively solve the triangular systems $Lw = Z$ and $L^T x = w$, giving $x = V^{-1} Z$.

Typical covariance structures imply full matrices $V$. Tapering the covariance function with some positive definite, compactly supported function induces a sparseness structure in $V$ and preserves asymptotic optimality (Furrer et al., 2004). The taper range determines the degree of approximation but also the sparseness of $V$. As a rule of thumb, Furrer et al. (2004) recommend to use points within the taper range. In our setting, we cannot meet this proposal because of memory limitations. With taper distance $\sqrt{2} < \eta \le 2$ (i.e. 8 points) the Cholesky factor of $K_i$ contains a number of nonzero elements of the order $10^6$. However, with $2 < \eta \le 3$ (i.e. points) the Cholesky factor of $K_i$ contains a number of nonzero elements of the order $10^8$. We will therefore use a taper length of 2, leading to 3,614,762 (0.002%) and 67,070,820 (0.040%) nonzero elements in the covariance matrix and its Cholesky factor, respectively, for the single chip case.

We suppose that the spatial process is stationary and isotropic, such that the $ij$th element of the covariance matrices is given by a positive definite function $k(h; \theta_1, \theta_2)$, where $h$ is the distance between observations $i$ and $j$. The parameter $\theta_1 = k(0)$ is called the sill and $\theta_2$ is the range parameter, responsible for the rate at which the covariance decays.
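To make the effect of tapering on storage concrete, the sketch below builds a tapered covariance matrix on a small square grid and stores it in the three-vector row format described in Section 3.1. It is an illustration only: the 6 x 6 grid, the parameter values and the function names `tapered_cov` and `to_csr` are assumptions, and the taper is the standard spherical correlation with range `eta` multiplied onto an exponential covariance.

```python
import math

def tapered_cov(h, theta1, theta2, eta=2.0):
    """Exponential covariance times a spherical taper with range eta."""
    if h >= eta:
        return 0.0                      # compact support: exact zeros
    r = h / eta
    return theta1 * math.exp(-h / theta2) * (1.0 - 1.5 * r + 0.5 * r ** 3)

def to_csr(rows):
    """Row-wise sparse storage: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr

# Unit-spaced 6 x 6 grid (a stand-in for the much larger chip layout).
side = 6
pts = [(r, c) for r in range(side) for c in range(side)]
theta1, theta2 = 1.0, 1.5               # arbitrary illustrative values

K = [
    [tapered_cov(math.hypot(p[0] - q[0], p[1] - q[1]), theta1, theta2) for q in pts]
    for p in pts
]
values, col_idx, row_ptr = to_csr(K)

n = len(pts)
z = len(values)
print("n =", n, " nonzeros z =", z, " dense entries =", n * n)
print("sparse storage:", z, "reals and", len(col_idx) + len(row_ptr), "integers")
```

With a taper range of 2 on a unit grid, each interior location keeps only itself and its 8 nearest neighbours, so the number of nonzeros grows linearly with the number of locations rather than quadratically.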

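The kriging step reduces to solving $Vx = Z$ through a Cholesky factor and two triangular solves. A minimal dense sketch of that sequence follows, using a made-up 3 x 3 positive definite matrix; the function names are hypothetical, and a real application would use a sparse Cholesky routine rather than this dense loop.

```python
import math

def cholesky(V):
    """Lower-triangular L with V = L L^T (V symmetric positive definite)."""
    n = len(V)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = V[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L w = b for lower-triangular L."""
    w = []
    for i, row in enumerate(L):
        w.append((b[i] - sum(row[k] * w[k] for k in range(i))) / row[i])
    return w

def backward_solve(L, w):
    """Solve L^T x = w, reading the transpose directly from L."""
    n = len(L)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (w[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def solve_spd(V, z):
    L = cholesky(V)
    return backward_solve(L, forward_solve(L, z))

# Small banded example: the zero in V carries over to the Cholesky factor.
V = [[4.0, 2.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
z = [2.0, 1.0, 4.0]
x = solve_spd(V, z)
residual = max(abs(sum(V[i][j] * x[j] for j in range(3)) - z[i]) for i in range(3))
print("x =", x, " max residual =", residual)
```

In this banded example the factor has no fill-in; for a two-dimensional lattice the Cholesky factor is generally much denser than the matrix itself, which is consistent with the nonzero counts quoted in Section 3.2.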
3.3 Choice of Contrasts

The design matrices are sparse, but the choice of the contrasts determines to what extent. Therefore, this choice is crucial to our objective. As an illustration, Table 1 gives the percentages of nonzero elements for sum and treatment contrasts for one chip.

Table 1: Sparseness for different contrasts. Percentages of nonzero elements compared to a full matrix. Note that X has more than elements. For two cases only lower bounds can be given due to limited RAM.
Columns: $F$, $F^T F$, $G$, $G^T G$, $X$, $X^T X$
Rows: Treatment, Sum
Entries: > > 6626

As we decouple the chip and gene effects in the regression step of the backfitting algorithm (steps [2b'] and [2c']), we can switch between different contrasts for each of those effects. For interpretability reasons, we choose sum contrasts for $F$ and treatment contrasts for $G$.

Covariance tapering and the additional iteration over the gene and chip effects (steps [2a] to [2d] and [2a'] to [2d']) cut the computational and storage cost considerably. However, 2 Gbytes of RAM are not sufficient for our application to keep all matrices permanently in memory. Hence, for each regression and kriging step we construct the individual design and covariance matrices and eliminate them afterwards.

3.4 Standard Errors

To calculate standard errors of the parameter $\beta$, we could simply use equation (5) to deduce

\[ \mathrm{Var}(\hat\beta) = (X^T V^{-1} X)^{-1}\, \mathrm{Var}(Y). \]

However, this variance cannot be calculated directly, since the matrices are too big. In the case of one chip, one could simplify the expression to

\[ \mathrm{Var}(\hat\beta_{\mathrm{Gene}}) = \bigl( (G^T G)^{-1} G^T F (F^T F)^{-1} F^T G \bigr)\, \mathrm{Var}(Y). \]

With our existing computing resources, it is still not possible to evaluate this quantity. We therefore use the simplistic approximation

\[ \mathrm{Var}(\hat\beta_{\mathrm{Gene}}) \approx (\hat\theta_1^2 + \hat\sigma^2)\, (G^T G)^{-1}. \tag{6} \]

4 Application

The raw data from the microarray chips were rounded to the nearest 1/4. We therefore blurred the data prior to our analysis with white noise drawn from a uniform$(-0.125, 0.125)$ distribution. Then we took the logarithm and subtracted the mean. These transformed observations were plugged into the backfitting algorithms previously outlined. The next two sections present the results for a single and a double chip model.
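The preprocessing described above (blur, log, center) is short enough to sketch directly. The code below is an illustration with made-up intensities; the function name `preprocess`, the fixed seed and the use of the natural logarithm are assumptions, since the text does not state the base of the logarithm. Inputs must exceed the blur half-width so that the logarithm is defined.

```python
import math
import random

def preprocess(raw, half_width=0.125, seed=0):
    """Blur rounded intensities, take logs, and subtract the mean."""
    rng = random.Random(seed)           # fixed seed only for reproducibility
    blurred = [v + rng.uniform(-half_width, half_width) for v in raw]
    logged = [math.log(v) for v in blurred]
    mean = sum(logged) / len(logged)
    return [v - mean for v in logged]

# Made-up raw intensities, already rounded to the nearest 1/4.
raw = [12.25, 80.5, 101.75, 45.0, 230.25, 17.5]
y = preprocess(raw)
print([round(v, 4) for v in y])
```

The half-width of 0.125 matches the rounding grid of 1/4, so the blurred values spread evenly over each rounding interval.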

4.1 Single Chip Results

Figure 1: Two-dimensional empirical covariance function for single chip example.

Figure 1 shows the two-dimensional empirical covariance function after fitting, which strongly confirms the isotropy of the spatial process. To our knowledge there is no a priori reason for the spatial covariance to have a particular structure. Given the gridded data, empirical covariance estimates are not able to refute a linear behavior of the covariance function at the origin (cf. Figure 2). We suppose an exponential covariance structure and use a spherical taper with a fixed taper range of 2 (the maximum value that is computationally feasible). The resulting covariance function for $K_1$ can be written as

\[ k(h; \theta_1, \theta_2) = \theta_1 \exp\!\left(-\frac{h}{\theta_2}\right) \left(1 - \frac{3h}{4} + \frac{h^3}{16}\right), \qquad 0 \le h \le 2, \]

with $k(h; \theta_1, \theta_2) = 0$ for $h > 2$.

Figure 2: Empirical and fitted covariances for single chip example. Horizontal axis: lag; vertical axis: covariance. Legend: empirical horizontal, empirical vertical, empirical off axis, fitted exponential, tapered covariance (exponential times spherical taper).

Figure 2 shows the ordinary least squares fits with $\hat\theta_1 = 0.487$, $\hat\theta_2 = 1.528$ and nugget effect $\hat\sigma^2 = 0.061$. As presumed, the backfitting algorithm converges quickly: $\mathrm{MSE}(\hat\beta^{(j)} - \hat\beta^{(j-1)}) < 10^{-4}$ for $j \ge 4$. Figure 3 shows how quickly a few randomly selected coefficients converge. Table 2 gives the required computing time on a Linux powered 2.6 GHz Xeon processor with 2 Gbytes of RAM.

Figure 3: Convergence of randomly selected fixed effects parameters (row and column effects in the left panel, gene effects in the right panel) for single chip example. Horizontal axes: iteration.

Table 2: Computing time for different steps in the backfitting algorithm in R. The values represent the mean of the iterations (Linux, 2.6 GHz Xeon processor with 2 Gbytes of RAM).

Action                                         Time (sec)
Read data, variable setup                      559
Create the matrix F^T F                        1444
Solve F^T F x = b                              161
Create the matrix G^T G                        1776
Solve G^T G x = b                              308
Total typical regression step (4 iterations)   4528
Estimate covariance parameters                 1219
Create the matrix Sigma_h                      1048
Solve Sigma_h x = b                            4567
Total typical kriging step                     7551
Total backfitting (4 iterations)               50623

Figure 6 displays the row and column effects (the color bar was taken over the range of the displayed data). The top panel reflects the fact that perfect match and mismatch probes are on alternating rows. The column effects might indicate a small trend with higher values at the right. But the row and column effects are small compared to the spatial process (Figure 7, top). The observations are the sum of the chip specific effects (Figure 7, bottom), the gene effects and the residuals (Figure 8, top and bottom). The residuals indicate that all the structure could be explained with the fixed effects and a spatial process, as there is no spatial structure or pattern left.

Figure 4 shows the effects between perfect match and mismatch, which are slightly positively skewed. We tried to normalize the effects using the approximation (6). However, little difference is observed and there are a large number of normalized effects beyond the typical threshold values of $\pm 2$.

Figure 4: Gene effects between perfect match and mismatch for single chip example (the right panel gives the normalized effects). The horizontal and vertical lines are the means. The red and blue curves are smoothed histograms. The dotted curves are superimposed normal densities.

4.2 Double Chip Results

The algorithm was applied to two chips based on different samples from a single individual. Figure 9 shows the differences between the chip effects and spatial processes in the case of two chips. Although the differences between the chip effects are small compared to the fitted effects, they exhibit an interesting pattern. Chip 2 has almost exclusively bigger effects. The spatial difference shows some rather large blotches, suggesting substantial differences in chip specific large-scale trends. This emphasizes that large-scale and small-scale chip specific effects and trends can be modeled and extracted.

Figure 5 compares the fitted gene effects obtained from the analysis with a single chip only and from taking account of both. Reassuringly, there do not seem to be substantial differences in the gene effects across the two chips from the same individual.

Figure 5: Comparison of the fitted gene effects with the single chip and the double chip model. Axes: effects with one chip, effects with two chips.

5 Discussion and Outlook

Our goal in this project was essentially a proof of concept: to establish that traditional additive, mixed-effects models for multivariate spatial data can be used to analyze large-scale data problems such as those posed in the environmental and biological sciences. We now begin serious application of these methods, in particular to the microarray data from experiments associated with cerebral vascular malformations (among others). More specifically, we seek a more detailed analysis, including the examination of genes labeled as differentially expressed and the comparison with the results from more established methods. Moreover, we seek to examine the differences in differentially expressed genes for the different disease groups in our study. This will involve additional modifications to the design matrices. However, we do not perceive this to be a serious complication, and the backfitting algorithms can easily be modified to account for these changes.

Our models currently assume constant means across the probe-level data for each specific gene on the gene chip. There seems to be evidence, both from our own empirical studies and in the biological literature, that this is not the case. We are exploring improved models to account for this additional structure in the data.

Finally, there are additional computational improvements currently being examined. We are exploring computing environments that do not have the memory limitation of 2 Gbytes of RAM. We are also exploring ways of imposing less severe tapering of the spatial covariances in order to approach the more optimal conditions discussed in Furrer et al. (2004). In addition, the fairly regular lattices observed in microarray data lead to a particular sparse structure in the Cholesky factor. Our preliminary experiments suggest that this structure could be exploited to dramatically improve computational performance.

Acknowledgments

The authors would like to thank Professor Isam Awad and Robert Shenkar (Department of Neurological Surgery, Feinberg School of Medicine, Northwestern University) as well as Edith Creek (Department of Mathematics, University of Colorado at Denver) for providing the data and answering our numerous questions. The research of the first author was supported in part by a grant from the University of Colorado Genome-Biotechnology Initiative. The research of both the first and second authors was supported in part by the Geophysical Statistics Project at the National Center for Atmospheric Research under the National Science Foundation grant DMS.

References

Breiman, L. and Friedman, J. H. (1985). Estimating optimal transformations for multiple regression and correlations (with discussion). Journal of the American Statistical Association, 80.
Buja, A., Hastie, T. J., and Tibshirani, R. J. (1989). Linear smoothers and additive models (with discussion). Annals of Statistics, 17.
Cressie, N. A. C. (1993). Statistics for Spatial Data. John Wiley & Sons Inc., New York, revised reprint.

Furrer, R. (2004). KriSp: An R package for Covariance Tapered Kriging of Large Datasets Using Sparse Matrix Techniques. Software/KriSp/
Furrer, R., Genton, M. G., and Nychka, D. (2004). Covariance Tapering for Interpolation of Large Spatial Datasets. Submitted to Journal of Computational and Graphical Statistics.
George, A. and Liu, J. W. H. (1981). Computer solution of large sparse positive definite systems. Prentice-Hall Inc., Englewood Cliffs, N.J.
Ihaka, R. and Gentleman, R. (1996). R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5.
Kitanidis, P. K. (1997). Introduction to Geostatistics: Applications in Hydrogeology. Cambridge University Press.
Koenker, R. and Ng, P. (2003). SparseM: Sparse Matrix Package for R.
Nychka, D. W. (2000). Spatial-process estimates as smoothers. In Schimek, M. G., editor, Smoothing and Regression: Approaches, Computation, and Application, chapter 13. John Wiley & Sons Inc., New York.
R Development Core Team (2004). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
Shenkar, R., Elliott, J. P., Diener, K., Gault, J., Hu, L., Cohrs, R. J., Phang, T., Hunter, L., Breeze, R. E., and Awad, I. A. (2003). Differential gene expression in human cerebrovascular malformations (with discussion). Neurosurgery, 52.
Stein, M. L. (1999). Interpolation of Spatial Data. Springer-Verlag, New York.

Figure 6: Row (top) and column (bottom) effects for single chip example.

Figure 7: Spatial process (top) and chip specific effects (bottom) for single chip example.

Figure 8: Gene effects (top) and residuals (bottom) for single chip example.

Figure 9: Differences for the chip effects (top) and spatial processes (bottom) in the case of two chips.


More information

Lecture 9: Introduction to Kriging

Lecture 9: Introduction to Kriging Lecture 9: Introduction to Kriging Math 586 Beginning remarks Kriging is a commonly used method of interpolation (prediction) for spatial data. The data are a set of observations of some variable(s) of

More information

Some general observations.

Some general observations. Modeling and analyzing data from computer experiments. Some general observations. 1. For simplicity, I assume that all factors (inputs) x1, x2,, xd are quantitative. 2. Because the code always produces

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression

A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression A Blockwise Descent Algorithm for Group-penalized Multiresponse and Multinomial Regression Noah Simon Jerome Friedman Trevor Hastie November 5, 013 Abstract In this paper we purpose a blockwise descent

More information

Sparse Matrices and Large Data Issues

Sparse Matrices and Large Data Issues Sparse Matrices and Large Data Issues Workshop ENAR March 15, 2009 Reinhard Furrer, CSM Outline What are sparse matrices? How to work with sparse matrices? Sparse positive definite matrices in statistics.

More information

Sparse Matrices and Large Data Issues

Sparse Matrices and Large Data Issues Sparse Matrices and Large Data Issues Workshop ENAR March 15, 2009 Reinhard Furrer, CSM Outline What are sparse matrices? How to work with sparse matrices? Sparse positive definite matrices in statistics.

More information

ESL Chap3. Some extensions of lasso

ESL Chap3. Some extensions of lasso ESL Chap3 Some extensions of lasso 1 Outline Consistency of lasso for model selection Adaptive lasso Elastic net Group lasso 2 Consistency of lasso for model selection A number of authors have studied

More information

Overview of Spatial Statistics with Applications to fmri

Overview of Spatial Statistics with Applications to fmri with Applications to fmri School of Mathematics & Statistics Newcastle University April 8 th, 2016 Outline Why spatial statistics? Basic results Nonstationary models Inference for large data sets An example

More information

Nearest Neighbor Gaussian Processes for Large Spatial Data

Nearest Neighbor Gaussian Processes for Large Spatial Data Nearest Neighbor Gaussian Processes for Large Spatial Data Abhi Datta 1, Sudipto Banerjee 2 and Andrew O. Finley 3 July 31, 2017 1 Department of Biostatistics, Bloomberg School of Public Health, Johns

More information

A Bias Correction for the Minimum Error Rate in Cross-validation

A Bias Correction for the Minimum Error Rate in Cross-validation A Bias Correction for the Minimum Error Rate in Cross-validation Ryan J. Tibshirani Robert Tibshirani Abstract Tuning parameters in supervised learning problems are often estimated by cross-validation.

More information

Index. Geostatistics for Environmental Scientists, 2nd Edition R. Webster and M. A. Oliver 2007 John Wiley & Sons, Ltd. ISBN:

Index. Geostatistics for Environmental Scientists, 2nd Edition R. Webster and M. A. Oliver 2007 John Wiley & Sons, Ltd. ISBN: Index Akaike information criterion (AIC) 105, 290 analysis of variance 35, 44, 127 132 angular transformation 22 anisotropy 59, 99 affine or geometric 59, 100 101 anisotropy ratio 101 exploring and displaying

More information

On the convergence of the iterative solution of the likelihood equations

On the convergence of the iterative solution of the likelihood equations On the convergence of the iterative solution of the likelihood equations R. Moddemeijer University of Groningen, Department of Computing Science, P.O. Box 800, NL-9700 AV Groningen, The Netherlands, e-mail:

More information

Expressions for the covariance matrix of covariance data

Expressions for the covariance matrix of covariance data Expressions for the covariance matrix of covariance data Torsten Söderström Division of Systems and Control, Department of Information Technology, Uppsala University, P O Box 337, SE-7505 Uppsala, Sweden

More information

Computational methods for mixed models

Computational methods for mixed models Computational methods for mixed models Douglas Bates Department of Statistics University of Wisconsin Madison March 27, 2018 Abstract The lme4 package provides R functions to fit and analyze several different

More information

A full scale, non stationary approach for the kriging of large spatio(-temporal) datasets

A full scale, non stationary approach for the kriging of large spatio(-temporal) datasets A full scale, non stationary approach for the kriging of large spatio(-temporal) datasets Thomas Romary, Nicolas Desassis & Francky Fouedjio Mines ParisTech Centre de Géosciences, Equipe Géostatistique

More information

Statistical Inference

Statistical Inference Statistical Inference Liu Yang Florida State University October 27, 2016 Liu Yang, Libo Wang (Florida State University) Statistical Inference October 27, 2016 1 / 27 Outline The Bayesian Lasso Trevor Park

More information

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices

Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Applications of Randomized Methods for Decomposing and Simulating from Large Covariance Matrices Vahid Dehdari and Clayton V. Deutsch Geostatistical modeling involves many variables and many locations.

More information

Model Selection for Geostatistical Models

Model Selection for Geostatistical Models Model Selection for Geostatistical Models Richard A. Davis Colorado State University http://www.stat.colostate.edu/~rdavis/lectures Joint work with: Jennifer A. Hoeting, Colorado State University Andrew

More information

Spatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University

Spatial Lasso with Application to GIS Model Selection. F. Jay Breidt Colorado State University Spatial Lasso with Application to GIS Model Selection F. Jay Breidt Colorado State University with Hsin-Cheng Huang, Nan-Jung Hsu, and Dave Theobald September 25 The work reported here was developed under

More information

Testing Theories in Particle Physics Using Maximum Likelihood and Adaptive Bin Allocation

Testing Theories in Particle Physics Using Maximum Likelihood and Adaptive Bin Allocation Testing Theories in Particle Physics Using Maximum Likelihood and Adaptive Bin Allocation Bruce Knuteson 1 and Ricardo Vilalta 2 1 Laboratory for Nuclear Science, Massachusetts Institute of Technology

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

On dealing with spatially correlated residuals in remote sensing and GIS

On dealing with spatially correlated residuals in remote sensing and GIS On dealing with spatially correlated residuals in remote sensing and GIS Nicholas A. S. Hamm 1, Peter M. Atkinson and Edward J. Milton 3 School of Geography University of Southampton Southampton SO17 3AT

More information

A multi-resolution Gaussian process model for the analysis of large spatial data sets.

A multi-resolution Gaussian process model for the analysis of large spatial data sets. National Science Foundation A multi-resolution Gaussian process model for the analysis of large spatial data sets. Doug Nychka Soutir Bandyopadhyay Dorit Hammerling Finn Lindgren Stephen Sain NCAR/TN-504+STR

More information

Exploring the World of Ordinary Kriging. Dennis J. J. Walvoort. Wageningen University & Research Center Wageningen, The Netherlands

Exploring the World of Ordinary Kriging. Dennis J. J. Walvoort. Wageningen University & Research Center Wageningen, The Netherlands Exploring the World of Ordinary Kriging Wageningen University & Research Center Wageningen, The Netherlands July 2004 (version 0.2) What is? What is it about? Potential Users a computer program for exploring

More information

Statistical Models for Monitoring and Regulating Ground-level Ozone. Abstract

Statistical Models for Monitoring and Regulating Ground-level Ozone. Abstract Statistical Models for Monitoring and Regulating Ground-level Ozone Eric Gilleland 1 and Douglas Nychka 2 Abstract The application of statistical techniques to environmental problems often involves a tradeoff

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Fast kriging of large data sets with Gaussian Markov random fields

Fast kriging of large data sets with Gaussian Markov random fields Computational Statistics & Data Analysis 52 (2008) 233 2349 www.elsevier.com/locate/csda Fast kriging of large data sets with Gaussian Markov random fields Linda Hartman a,,, Ola Hössjer b,2 a Centre for

More information

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets

Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Hierarchical Nearest-Neighbor Gaussian Process Models for Large Geo-statistical Datasets Abhirup Datta 1 Sudipto Banerjee 1 Andrew O. Finley 2 Alan E. Gelfand 3 1 University of Minnesota, Minneapolis,

More information

Using Estimating Equations for Spatially Correlated A

Using Estimating Equations for Spatially Correlated A Using Estimating Equations for Spatially Correlated Areal Data December 8, 2009 Introduction GEEs Spatial Estimating Equations Implementation Simulation Conclusion Typical Problem Assess the relationship

More information

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland

Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1

More information

A Short Note on Resolving Singularity Problems in Covariance Matrices

A Short Note on Resolving Singularity Problems in Covariance Matrices International Journal of Statistics and Probability; Vol. 1, No. 2; 2012 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education A Short Note on Resolving Singularity Problems

More information

Motivating the Covariance Matrix

Motivating the Covariance Matrix Motivating the Covariance Matrix Raúl Rojas Computer Science Department Freie Universität Berlin January 2009 Abstract This note reviews some interesting properties of the covariance matrix and its role

More information

Beta-Binomial Kriging: An Improved Model for Spatial Rates

Beta-Binomial Kriging: An Improved Model for Spatial Rates Available online at www.sciencedirect.com ScienceDirect Procedia Environmental Sciences 27 (2015 ) 30 37 Spatial Statistics 2015: Emerging Patterns - Part 2 Beta-Binomial Kriging: An Improved Model for

More information

arxiv: v1 [stat.me] 24 May 2010

arxiv: v1 [stat.me] 24 May 2010 The role of the nugget term in the Gaussian process method Andrey Pepelyshev arxiv:1005.4385v1 [stat.me] 24 May 2010 Abstract The maximum likelihood estimate of the correlation parameter of a Gaussian

More information

The ProbForecastGOP Package

The ProbForecastGOP Package The ProbForecastGOP Package April 24, 2006 Title Probabilistic Weather Field Forecast using the GOP method Version 1.3 Author Yulia Gel, Adrian E. Raftery, Tilmann Gneiting, Veronica J. Berrocal Description

More information

A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data

A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data A Sequential Split-Conquer-Combine Approach for Analysis of Big Spatial Data Min-ge Xie Department of Statistics & Biostatistics Rutgers, The State University of New Jersey In collaboration with Xuying

More information

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St.

Regression Graphics. 1 Introduction. 2 The Central Subspace. R. D. Cook Department of Applied Statistics University of Minnesota St. Regression Graphics R. D. Cook Department of Applied Statistics University of Minnesota St. Paul, MN 55108 Abstract This article, which is based on an Interface tutorial, presents an overview of regression

More information

Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University

Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University Nonstationary spatial process modeling Part II Paul D. Sampson --- Catherine Calder Univ of Washington --- Ohio State University this presentation derived from that presented at the Pan-American Advanced

More information

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE

Biostatistics Workshop Longitudinal Data Analysis. Session 4 GARRETT FITZMAURICE Biostatistics Workshop 2008 Longitudinal Data Analysis Session 4 GARRETT FITZMAURICE Harvard University 1 LINEAR MIXED EFFECTS MODELS Motivating Example: Influence of Menarche on Changes in Body Fat Prospective

More information

The Proportional Effect of Spatial Variables

The Proportional Effect of Spatial Variables The Proportional Effect of Spatial Variables J. G. Manchuk, O. Leuangthong and C. V. Deutsch Centre for Computational Geostatistics, Department of Civil and Environmental Engineering University of Alberta

More information

Sparse inverse covariance estimation with the lasso

Sparse inverse covariance estimation with the lasso Sparse inverse covariance estimation with the lasso Jerome Friedman Trevor Hastie and Robert Tibshirani November 8, 2007 Abstract We consider the problem of estimating sparse graphs by a lasso penalty

More information

An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University

An Introduction to Spatial Statistics. Chunfeng Huang Department of Statistics, Indiana University An Introduction to Spatial Statistics Chunfeng Huang Department of Statistics, Indiana University Microwave Sounding Unit (MSU) Anomalies (Monthly): 1979-2006. Iron Ore (Cressie, 1986) Raw percent data

More information

Spatial Modeling and Prediction of County-Level Employment Growth Data

Spatial Modeling and Prediction of County-Level Employment Growth Data Spatial Modeling and Prediction of County-Level Employment Growth Data N. Ganesh Abstract For correlated sample survey estimates, a linear model with covariance matrix in which small areas are grouped

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Quantile Regression for Extraordinarily Large Data

Quantile Regression for Extraordinarily Large Data Quantile Regression for Extraordinarily Large Data Shih-Kang Chao Department of Statistics Purdue University November, 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile regression Two-step

More information

Hazard Function, Failure Rate, and A Rule of Thumb for Calculating Empirical Hazard Function of Continuous-Time Failure Data

Hazard Function, Failure Rate, and A Rule of Thumb for Calculating Empirical Hazard Function of Continuous-Time Failure Data Hazard Function, Failure Rate, and A Rule of Thumb for Calculating Empirical Hazard Function of Continuous-Time Failure Data Feng-feng Li,2, Gang Xie,2, Yong Sun,2, Lin Ma,2 CRC for Infrastructure and

More information

Multivariate spatial models and the multikrig class

Multivariate spatial models and the multikrig class Multivariate spatial models and the multikrig class Stephan R Sain, IMAGe, NCAR SAMSI Summer School on Spatial Statistics July 28 August 1, 2009 Outline Overview of multivariate spatial regression models

More information

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study

Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Modeling and Interpolation of Non-Gaussian Spatial Data: A Comparative Study Gunter Spöck, Hannes Kazianka, Jürgen Pilz Department of Statistics, University of Klagenfurt, Austria hannes.kazianka@uni-klu.ac.at

More information

Douglas Nychka, Soutir Bandyopadhyay, Dorit Hammerling, Finn Lindgren, and Stephan Sain. October 10, 2012

Douglas Nychka, Soutir Bandyopadhyay, Dorit Hammerling, Finn Lindgren, and Stephan Sain. October 10, 2012 A multi-resolution Gaussian process model for the analysis of large spatial data sets. Douglas Nychka, Soutir Bandyopadhyay, Dorit Hammerling, Finn Lindgren, and Stephan Sain October 10, 2012 Abstract

More information

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints

A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Noname manuscript No. (will be inserted by the editor) A Recursive Formula for the Kaplan-Meier Estimator with Mean Constraints Mai Zhou Yifan Yang Received: date / Accepted: date Abstract In this note

More information

Approximating likelihoods for large spatial data sets

Approximating likelihoods for large spatial data sets Approximating likelihoods for large spatial data sets By Michael Stein, Zhiyi Chi, and Leah Welty Jessi Cisewski April 28, 2009 1 Introduction Given a set of spatial data, often the desire is to estimate

More information

Analysis of methods for speech signals quantization

Analysis of methods for speech signals quantization INFOTEH-JAHORINA Vol. 14, March 2015. Analysis of methods for speech signals quantization Stefan Stojkov Mihajlo Pupin Institute, University of Belgrade Belgrade, Serbia e-mail: stefan.stojkov@pupin.rs

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Andrew O. Finley 1 and Sudipto Banerjee 2 1 Department of Forestry & Department of Geography, Michigan

More information

Basics of Point-Referenced Data Models

Basics of Point-Referenced Data Models Basics of Point-Referenced Data Models Basic tool is a spatial process, {Y (s), s D}, where D R r Chapter 2: Basics of Point-Referenced Data Models p. 1/45 Basics of Point-Referenced Data Models Basic

More information

Package plw. R topics documented: May 7, Type Package

Package plw. R topics documented: May 7, Type Package Type Package Package plw May 7, 2018 Title Probe level Locally moderated Weighted t-tests. Version 1.40.0 Date 2009-07-22 Author Magnus Astrand Maintainer Magnus Astrand

More information

Estimation of direction of increase of gold mineralisation using pair-copulas

Estimation of direction of increase of gold mineralisation using pair-copulas 22nd International Congress on Modelling and Simulation, Hobart, Tasmania, Australia, 3 to 8 December 2017 mssanz.org.au/modsim2017 Estimation of direction of increase of gold mineralisation using pair-copulas

More information

Obtaining Uncertainty Measures on Slope and Intercept

Obtaining Uncertainty Measures on Slope and Intercept Obtaining Uncertainty Measures on Slope and Intercept of a Least Squares Fit with Excel s LINEST Faith A. Morrison Professor of Chemical Engineering Michigan Technological University, Houghton, MI 39931

More information

7 Geostatistics. Figure 7.1 Focus of geostatistics

7 Geostatistics. Figure 7.1 Focus of geostatistics 7 Geostatistics 7.1 Introduction Geostatistics is the part of statistics that is concerned with geo-referenced data, i.e. data that are linked to spatial coordinates. To describe the spatial variation

More information

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models

Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Environmentrics 00, 1 12 DOI: 10.1002/env.XXXX Comparing Non-informative Priors for Estimation and Prediction in Spatial Models Regina Wu a and Cari G. Kaufman a Summary: Fitting a Bayesian model to spatial

More information

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process

Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Applied Mathematical Sciences, Vol. 4, 2010, no. 62, 3083-3093 Sequential Procedure for Testing Hypothesis about Mean of Latent Gaussian Process Julia Bondarenko Helmut-Schmidt University Hamburg University

More information

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE

POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE CO-282 POPULAR CARTOGRAPHIC AREAL INTERPOLATION METHODS VIEWED FROM A GEOSTATISTICAL PERSPECTIVE KYRIAKIDIS P. University of California Santa Barbara, MYTILENE, GREECE ABSTRACT Cartographic areal interpolation

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Probabilistic Regression Using Basis Function Models

Probabilistic Regression Using Basis Function Models Probabilistic Regression Using Basis Function Models Gregory Z. Grudic Department of Computer Science University of Colorado, Boulder grudic@cs.colorado.edu Abstract Our goal is to accurately estimate

More information

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor

Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Computationally efficient banding of large covariance matrices for ordered data and connections to banding the inverse Cholesky factor Y. Wang M. J. Daniels wang.yanpin@scrippshealth.org mjdaniels@austin.utexas.edu

More information

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning)

Regression. Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Linear Regression Regression Goal: Learn a mapping from observations (features) to continuous labels given a training set (supervised learning) Example: Height, Gender, Weight Shoe Size Audio features

More information

Mixed models in R using the lme4 package Part 4: Theory of linear mixed models

Mixed models in R using the lme4 package Part 4: Theory of linear mixed models Mixed models in R using the lme4 package Part 4: Theory of linear mixed models Douglas Bates 8 th International Amsterdam Conference on Multilevel Analysis 2011-03-16 Douglas Bates

More information

Notes for CS542G (Iterative Solvers for Linear Systems)

Notes for CS542G (Iterative Solvers for Linear Systems) Notes for CS542G (Iterative Solvers for Linear Systems) Robert Bridson November 20, 2007 1 The Basics We re now looking at efficient ways to solve the linear system of equations Ax = b where in this course,

More information

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010

Chris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010 Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

Chapter 4 - Fundamentals of spatial processes Lecture notes

Chapter 4 - Fundamentals of spatial processes Lecture notes TK4150 - Intro 1 Chapter 4 - Fundamentals of spatial processes Lecture notes Odd Kolbjørnsen and Geir Storvik January 30, 2017 STK4150 - Intro 2 Spatial processes Typically correlation between nearby sites

More information

Latin Hypercube Sampling with Multidimensional Uniformity

Latin Hypercube Sampling with Multidimensional Uniformity Latin Hypercube Sampling with Multidimensional Uniformity Jared L. Deutsch and Clayton V. Deutsch Complex geostatistical models can only be realized a limited number of times due to large computational

More information

On the smallest eigenvalues of covariance matrices of multivariate spatial processes

On the smallest eigenvalues of covariance matrices of multivariate spatial processes On the smallest eigenvalues of covariance matrices of multivariate spatial processes François Bachoc, Reinhard Furrer Toulouse Mathematics Institute, University Paul Sabatier, France Institute of Mathematics

More information

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models

An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Proceedings 59th ISI World Statistics Congress, 25-30 August 2013, Hong Kong (Session CPS023) p.3938 An Algorithm for Bayesian Variable Selection in High-dimensional Generalized Linear Models Vitara Pungpapong

More information

Exploratory quantile regression with many covariates: An application to adverse birth outcomes

Exploratory quantile regression with many covariates: An application to adverse birth outcomes Exploratory quantile regression with many covariates: An application to adverse birth outcomes June 3, 2011 eappendix 30 Percent of Total 20 10 0 0 1000 2000 3000 4000 5000 Birth weights efigure 1: Histogram

More information

Linear System of Equations

Linear System of Equations Linear System of Equations Linear systems are perhaps the most widely applied numerical procedures when real-world situation are to be simulated. Example: computing the forces in a TRUSS. F F 5. 77F F.

More information