Estimating and modeling variograms of compositional data with occasional missing variables in R
R. Tolosana-Delgado (1), K.G. van den Boogaart (2), V. Pawlowsky-Glahn (3)
(1) Maritime Engineering Laboratory (LIM), Technical University of Catalonia, raimon.tolosana@upc.edu
(2) Institute for Stochastics, Technical University Bergakademie Freiberg, Germany, boogaart@math.tu-freiberg.de
(3) Dept. Computer Science and Applied Mathematics, University of Girona, Girona, Spain, vera.pawlowsky@udg.edu

Abstract. Many environmental campaigns include regionalized compositional data, describing the relative importance of a set of constituents in samples taken at several locations. The variables considered are not always the same throughout the data set, and their absolute values are often not comparable. By contrast, log-ratio transformed data remain homogeneous in these cases and can be analysed as usual. Here we propose to characterize their spatial structure by studying the variograms of the set of all pairwise log-ratios: they are easy to compute, even when some values are missing or the data come from different sources; they contain the same information as any set of direct and cross-variograms of a log-ratio transformed composition, and can thus be re-expressed in more classical forms for cokriging; and fitting a model to them is as easy to understand, visualize and check as for univariate variograms.

1 INTRODUCTION
In very extensive geochemical campaigns, in environmental or mineral surveys, samples are typically analysed in different labs, with different techniques and standards, and sometimes different subsets of components are observed. The result is a spatially referenced compositional data set with some irregularities: components may be missing at some locations, and absolute percentages may not be comparable among labs, particularly if some data vectors have been closed to sum to 100%.
Chayes [3] showed that this constant sum induces a spurious correlation that spoils all classical statistical techniques, including variogram-based geostatistics [5]. In the presence of such data gaps, a log-ratio approach to geostatistics [7] is necessary, because the log-ratio of two components depends neither on the presence or absence of values of other variables, nor on whether the data were closed. However, a log-ratio involving at least one missing variable cannot be computed. Interpolation in this situation is quite similar to undersampled cokriging [9], where some coordinates of the observation vectors may be missing, and we seek both an interpolation of full vectors at unsampled locations and the completion of the missing variables at the sampled locations. The key issue in this case is the estimation and modelling of (cross-)variograms. Following [6], this contribution presents a way to estimate the spatial covariance structure of a regionalized compositional data set with irregularities, using the concept of the variation matrix.
2 BASICS OF COMPOSITIONAL DATA ANALYSIS
A compositional data set is a data set in which each variable describes the relative importance of a part within a whole. The most typical compositional variables in environmental surveillance problems are chemical components: the major-oxide and trace-element composition of soils, the heavy-metal composition of trace species (e.g. moss), the hydrogeochemical composition of (sub)surface waters, etc. Aitchison [1] presented compositions as vectors of positive components summing to a constant (most typically 1 or 100%), and suggested transforming the data through a set of log-ratio transformations to get rid of the spurious correlation effects induced by the constant sum. It was later argued [2] that a data set should be regarded as compositional (and log-ratio transformed) as soon as the questions to be answered a) are unrelated to the total sum of the variables, or b) must be equally meaningful whichever units are used to express the variables (%, mg/l, molarity, etc.).

Because of this spurious correlation effect, classical (geo)statistical concepts must be interpreted with extreme caution when applied to compositional data sets. To replace the classical mean vector and covariance matrix, [1] advocates the use of, respectively, the closed geometric mean

$$\hat m(Z) = \frac{100}{e^{\hat y_1} + \cdots + e^{\hat y_D}} \left[e^{\hat y_1}, \ldots, e^{\hat y_D}\right], \qquad \hat y_i = \frac{1}{N}\sum_{n=1}^{N} \log z_{ni},$$

and the variation matrix $\hat T(Z) = (\hat t_{ij})$, with

$$\hat t_{ij} = \frac{1}{N}\sum_{n=1}^{N} \left[\log\frac{z_{ni}}{z_{nj}} - (\hat y_i - \hat y_j)\right]^2.$$

Here $Z = (z_{ni})$ is a compositional data set with $D$ components ($i = 1, \ldots, D$) and $N$ observations. Note that the variation matrix is the $D \times D$ symmetric matrix of the variances of all pairwise log-ratios. Alternatively, one can choose an isometric log-ratio transformation,

$$\mathrm{ilr}(z) = V^t \log(z) = Z^*, \tag{1}$$

where the log is applied component-wise, and $V$ contains a set of $D-1$ orthonormal vectors $v_i \in \mathbb{R}^D$ orthogonal to $\mathbf{1} = [1, \ldots, 1]$.
The result is a new data set without any constraint, which may be treated with any classical statistical technique; geometric results (means, regression intercepts and slopes, principal components, confidence ellipses, etc., here symbolized by $r^*$) can be back-transformed to obtain an easier-to-interpret composition

$$\mathrm{ilr}^{-1}(r^*) = 100\,\frac{\exp(V r^*)}{\mathbf{1}^t \exp(V r^*)}. \tag{2}$$

For instance, the closed geometric mean can be obtained as $\hat m(Z) = \mathrm{ilr}^{-1}(\mathrm{E}[\mathrm{ilr}(Z)])$. The variation matrix and the covariance matrix of the ilr-transformed data are linked through

$$\hat\Sigma = \mathrm{Cov}[Z^*] = -\frac{1}{2} V^t \hat T V. \tag{3}$$

Since the columns of $V$ are orthonormal, i.e. $V^t V = I$, and orthogonal to $\mathbf{1}$, Eq. (3) carries definiteness from one matrix to the other: if $\hat\Sigma$ is positive definite, then $\hat T$ must be conditionally negative definite (negative definite for vectors orthogonal to $\mathbf{1}$), and vice versa. These properties also apply to the theoretical counterparts of the variation matrix and the covariance matrix, though these are not used here.
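The definitions of this section can be checked numerically. The base-R sketch below uses a hypothetical five-sample, three-part composition and hypothetical helper names (closed_geomean, variation_matrix, ilr_basis, ilr_coords, ilr_inv); taking normalised Helmert contrasts as $V$ is an assumption, since any orthonormal basis orthogonal to $\mathbf{1}$ would serve. It computes the closed geometric mean and the variation matrix with the $1/N$ normalisation used above, and checks Eqs. (2) and (3).

```r
# Toy composition (hypothetical values): N = 5 samples, D = 3 parts, in %
Z <- rbind(c(60, 30, 10),
           c(50, 35, 15),
           c(70, 20, 10),
           c(45, 40, 15),
           c(65, 25, 10))

# Closed geometric mean: closure to 100 of exp(y_i), y_i = mean over n of log z_ni
closed_geomean <- function(Z) {
  y <- colMeans(log(Z))
  100 * exp(y) / sum(exp(y))
}

# Variation matrix: t_ij = (1/N) sum_n [log(z_ni / z_nj) - (y_i - y_j)]^2
variation_matrix <- function(Z) {
  L <- log(Z)
  D <- ncol(Z)
  Tm <- matrix(0, D, D)
  for (i in seq_len(D)) for (j in seq_len(D)) {
    lr <- L[, i] - L[, j]                 # log-ratio of parts i and j
    Tm[i, j] <- mean((lr - mean(lr))^2)   # 1/N normalisation, as in the text
  }
  Tm
}

# Orthonormal basis V (D x (D-1)): normalised Helmert contrasts, columns orthogonal to 1
ilr_basis <- function(D) {
  H <- contr.helmert(D)
  unname(sweep(H, 2, sqrt(colSums(H^2)), "/"))
}

ilr_coords <- function(Z, V) log(Z) %*% V   # row n holds V^t log(z_n), Eq. (1)
ilr_inv <- function(r, V) {                 # Eq. (2), closed to 100
  e <- exp(as.vector(V %*% r))
  100 * e / sum(e)
}

V  <- ilr_basis(ncol(Z))
Tm <- variation_matrix(Z)
Zs <- ilr_coords(Z, V)
Zc <- sweep(Zs, 2, colMeans(Zs))            # centred ilr coordinates
Sigma <- crossprod(Zc) / nrow(Zs)           # covariance with 1/N, to match Tm

all.equal(Sigma, -0.5 * t(V) %*% Tm %*% V)              # Eq. (3)
all.equal(ilr_inv(colMeans(Zs), V), closed_geomean(Z))  # geometric mean via Eq. (2)
```

Rescaling each sample by an arbitrary positive factor, e.g. `variation_matrix(Z * c(1, 2, 0.5, 3, 1.5))`, returns the same matrix up to rounding, which is the insensitivity to closure that motivates working with log-ratios.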
3 STRUCTURAL ANALYSIS FOR COMPOSITIONS
For a regionalized composition $Z(\vec x)$, we can follow the idea of the variation matrix and work with $\Gamma = (\gamma_{ij})$, the matrix of direct variograms of the log-ratios of all pairs of variables $(i, j)$, estimated by

$$\hat\gamma_{ij}(\vec h) = \frac{1}{2N(\vec h)} \sum_{(n,m) \in N(\vec h)} \left(\log\frac{z_{ni}}{z_{nj}} - \log\frac{z_{mi}}{z_{mj}}\right)^2 \tag{4}$$

and called the intrinsic variation matrix. Surprising as it may seem, [7] show that this matrix of $D^2$ direct variograms contains the same information as the array of all (symmetric) cross-covariances

$$\sigma_{ij\cdot kl}(\vec h) = \mathrm{Cov}\left[\log\frac{Z_i(\vec x)}{Z_k(\vec x)}, \log\frac{Z_j(\vec x + \vec h)}{Z_l(\vec x + \vec h)}\right]$$

between any two pairwise log-ratios, which contains $D^4$ (cross-)covariance functions, or as any set $\Psi = (\psi_{ij})$ of auto- and cross-variograms

$$\psi_{ij}(\vec h) = \frac{1}{2}\mathrm{Cov}\left[v_i^t\left(\log Z(\vec x + \vec h) - \log Z(\vec x)\right),\; v_j^t\left(\log Z(\vec x + \vec h) - \log Z(\vec x)\right)\right]$$

of an ilr-transformed (Eq. 1) data set $\mathrm{ilr}(Z(\vec x)) = Z^*(\vec x)$. Note that $Z^*(\vec x)$ is an unconstrained regionalized variable, so its intrinsic covariance structure $\Psi$ must be conditionally negative definite. Then, because this variogram system and the intrinsic variation matrix are related through $\Psi = -\frac{1}{2} V^t \Gamma V$, we deduce that $\Gamma$ must be conditionally positive definite. In the modelling step, we may take $\Psi(\vec h)$ as a linear model of coregionalization, i.e. $\Psi(\vec h) = \sum_{k=1}^{K} C_k \left(1 - \rho_k(\vec h)\right)$, a linear combination of some (chosen) positive definite correlograms $\rho_k(\vec h)$ with positive (semi-)definite covariance matrices $C_k$ (to be estimated). Then, just by the linearity of this relation (the analogue of Eq. (3)), we may model the experimental intrinsic variation matrix by

$$\Gamma(\vec h) = \sum_{k=1}^{K} B_k \left(1 - \rho_k(\vec h)\right),$$

where the $B_k$, linked to the $C_k$ through $C_k = -\frac{1}{2} V^t B_k V$, are conditionally negative semi-definite matrices to be estimated. Thus, there is no need to devise new fitting routines: we can simply modify the routines that check definiteness so that they force the $B_k$ to be conditionally negative semi-definite. Moreover, Eq. (4) transparently admits missing values: wherever a log-ratio involves a missing value, one sampled point fewer is available to estimate the variogram at the lags involving that location, i.e.
the number of data pairs $N_{ij}(\vec h)$ now depends on the two variable indices $(i, j)$. The resulting empirical variogram matrix may not itself be valid, but this is of limited importance, because the experimental variograms only guide the fitting of a valid model.

4 R PROGRAMMING
For our purposes, an important limitation of existing geostatistical R packages is the lack of a truly vectorial approach to multivariate geostatistics: geoR is mostly univariate (it does not even allow the computation of a cross-variogram), and gstat, a multivariate geostatistics package, must be given the variables (e.g. the parts of the composition, their ilr coefficients, or the set of all pairwise log-ratios) one by one. Additionally, none of these packages handles missing values transparently during variogram fitting. Therefore the proposed algorithm of computing all pairwise log-ratio variograms and
fitting an LMC variogram model with (conditionally) negative semi-definite matrices was newly implemented in R within the compositions package [10]. Since variances are always positive and their variability is typically proportional to their mean value, the fit is done on a log scale and weighted with the number of pairs in each distance class of the empirical variogram. For a given multivariate variogram model $\Gamma(h; \theta) = (\gamma_{ij}(h; \theta))$ depending on a vector of parameters $\theta$, the corresponding objective function (goodness of fit) is

$$\mathrm{gof}(\theta) = \sum_{h \in \mathrm{Bins}} \sum_{i=1}^{D} \sum_{j=1,\, j \neq i}^{D} \left[\ln \gamma_{ij}(h; \theta) - \ln \hat\gamma_{ij}(h)\right]^2.$$

For reasonable starting values, this function can be minimized automatically with the non-linear optimization routine nlm of base R [8].

5 EXAMPLE
To illustrate this variogram fitting proposal, we use a geochemical data set of 601 samples of river and stream sediments from the Grazer Paläozoikum (Styria, Austria), analysed for 34 compositional parts (9 major oxides, 25 trace elements) and kindly provided by J. C. Davis. This region is mostly covered by shales, limestones and dolomites, with some crystalline basement outcrops and Tertiary clastic sediments [11]. In this contribution, we take 7 major oxides (K, Na, Ca, P, Fe, Mg, Mn), randomly remove 400 values (to simulate data loss), reclose the remaining values to sum to 100%, and compute and fit the log-ratio variograms. We fitted a single spherical structure plus a nugget effect,

$$\Gamma(h) = B_0 \left(1 - \delta_0(h)\right) + B_s\, \mathrm{sph}(h/r),$$

where $\delta_0(h)$ equals 1 at $h = 0$ and 0 elsewhere, $B_0$ and $B_s$ are the log-ratio sill matrices of the nugget and of the spherical structure (whose ilr counterparts $C_0$ and $C_s$ are positive definite), and $\mathrm{sph}(\cdot)$ is a standard univariate spherical variogram with unit sill, with a range parameter $r$ to be fitted. Figure 1 shows the fit obtained for the original data set and for the same data set with randomly created missing values; its caption also lists the commands needed to compute and fit such an intrinsic variation matrix with the package. A further example with the classical Jura reference data set [4] is provided in the package [10].
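The estimator of Eq. (4), with the missing-value handling described in Section 3, can be sketched in base R as follows. logratio_variogram() is a hypothetical helper, not the compositions function used in this example; it assumes isotropic lag classes given by a user-supplied breaks vector, counts each pair of locations once, and returns the pair counts $N_{ij}(h)$ alongside the variograms.

```r
# Empirical pairwise log-ratio variograms, Eq. (4), tolerating missing parts.
# Z: N x D matrix of positive values with NA for missing parts;
# coords: N x 2 matrix of locations; breaks: limits of the lag classes.
logratio_variogram <- function(Z, coords, breaks) {
  D <- ncol(Z)
  K <- length(breaks) - 1
  L <- log(Z)                                       # NA stays NA
  dmat <- as.matrix(dist(coords))
  pairs <- which(upper.tri(dmat), arr.ind = TRUE)   # every pair (n, m) once
  bin <- cut(dmat[pairs], breaks, labels = FALSE)   # lag class, NA if outside breaks
  gamma  <- array(NA_real_, c(K, D, D))
  npairs <- array(0L, c(K, D, D))
  for (i in seq_len(D)) gamma[, i, i] <- 0         # log(z_i / z_i) = 0
  for (i in seq_len(D - 1)) for (j in (i + 1):D) {
    lr <- L[, i] - L[, j]                           # NA where part i or j is missing
    sq <- (lr[pairs[, 1]] - lr[pairs[, 2]])^2
    ok <- !is.na(sq) & !is.na(bin)
    for (k in seq_len(K)) {
      sel <- ok & bin == k
      npairs[k, i, j] <- npairs[k, j, i] <- sum(sel)   # N_ij(h) depends on (i, j)
      if (any(sel))
        gamma[k, i, j] <- gamma[k, j, i] <- sum(sq[sel]) / (2 * sum(sel))
    }
  }
  list(lags = (head(breaks, -1) + tail(breaks, -1)) / 2,
       gamma = gamma, npairs = npairs)
}

# Toy usage (hypothetical data): three locations on a line, part 3 missing at location 2
coords <- cbind(c(0, 1, 2), 0)
Zt <- exp(rbind(c(0, 0, 0), c(1, 0, NA), c(2, 0, 1)))
v <- logratio_variogram(Zt, coords, breaks = c(0, 1.5, 2.5))
v$gamma[, 1, 2]    # c(0.5, 2): both lag classes use all pairs
v$npairs[, 1, 3]   # c(0, 1): pairs touching location 2 are lost for part 3
```

The log-ratio between parts 1 and 2 keeps every pair, while any log-ratio involving part 3 loses the pairs that touch the middle location, so the pair counts differ between entries of the same lag class.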
6 CONCLUSIONS
The array of all log-ratio variograms (the intrinsic variation matrix) allows multivariate variogram models to be fitted semi-automatically. The fit can be visualized as a matrix of variograms in which each panel can be interpreted separately, like a univariate variogram: checking the fit does not require understanding the particular aspects of cross-variogram fitting. Since these variograms are strictly positive, it is also possible to use a relative-scale fitting procedure, weighting variogram values according to their (inverse) expected value. Thus, the fit focuses on the more important small variogram values at short distances, instead of being dominated by the larger variogram values. Moreover, this method can be applied even in the presence of missing values. The complete procedure has been implemented in the compositions package for R.
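The effect of the relative-scale fitting can be seen with a small arithmetic check on hypothetical values: the same absolute misfit of 0.01 is placed on a short-lag variogram value of 0.1 and on a long-lag value of 1, and the squared residual is computed on the ordinary and on the log scale.

```r
# Same absolute misfit (0.01) at a small short-lag value and a large long-lag value
gamma_hat <- c(short = 0.10, long = 1.00)        # hypothetical empirical values
gamma_mod <- gamma_hat + 0.01                    # hypothetical fitted values
abs_sq <- (gamma_mod - gamma_hat)^2              # ordinary least-squares terms
log_sq <- (log(gamma_mod) - log(gamma_hat))^2    # log-scale (relative) terms
print(rbind(abs_sq, log_sq))
log_sq[["short"]] / log_sq[["long"]]             # about 92
```

On the ordinary scale both misfits contribute equally (about 1e-4 each); on the log scale the short-lag term weighs roughly 92 times more, which is how the fit is steered towards the small variogram values at short distances.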
[Plot not reproduced: matrix of log-ratio variogram panels for the parts K, Na, Ca, P, Fe, Mg and Mn; vertical axis: semivariogram; horizontal axis: lag distance (km).]

Figure 1: Empirical intrinsic variation matrix (symbols) compared with the fitted theoretical model (lines), with a single spherical structure of fitted range 8.8 km plus a nugget effect. Black (circles, thick line) is used for the original data set, red (crosses, thin line) for the data set with artificial missing values. In both cases, substantial departures of the model from the empirical version are only visible beyond the range. The fit can be obtained with the following sequence of commands in R:

> library(compositions)
> # ... loading data
> empvar <- logratioVariogram(comp, x)
> vgmodel <- CompLinModCoReg(~nugget() + sph(5), comp)
> fitted <- vgmFit2lrv(empvar, vgmodel, iterlim = 1000)
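The kind of fit described in Sections 3 and 4 can also be sketched in base R, as an illustration under stated assumptions rather than the code inside the compositions package. The ilr sills $C_0$ and $C_s$ are parameterized through lower-triangular factors so that they are positive semi-definite for any parameter vector, and the log-ratio sills are derived from them as $b_{ij} = a_{ii} + a_{jj} - 2a_{ij}$ with $A = V C V^t$, so that $C_k = -\frac{1}{2} V^t B_k V$ holds by construction. The objective is the log-scale gof of Section 4; multiplying each term by $N_{ij}(h)$ is our reading of "weighted with the number of pairs", since the displayed formula does not show the weight. All helper names (ilr_basis, sill_to_B, sph, unpack, model_lrv, gof) are hypothetical.

```r
# Orthonormal basis V (D x (D-1)): normalised Helmert contrasts, columns orthogonal to 1
ilr_basis <- function(D) {
  H <- contr.helmert(D)
  unname(sweep(H, 2, sqrt(colSums(H^2)), "/"))
}

# Log-ratio sill B from a PSD ilr sill C: b_ij = a_ii + a_jj - 2 a_ij, A = V C V^t
sill_to_B <- function(C, V) {
  A <- V %*% C %*% t(V)
  outer(diag(A), diag(A), "+") - 2 * A
}

sph <- function(h, r) ifelse(h < r, 1.5 * h / r - 0.5 * (h / r)^3, 1)  # unit sill

# theta = (lower triangle of L0, lower triangle of Ls, log range);
# C0 = L0 L0^t and Cs = Ls Ls^t are PSD for any theta
unpack <- function(theta, D) {
  m <- D - 1
  q <- m * (m + 1) / 2
  tri <- function(x) { L <- matrix(0, m, m); L[lower.tri(L, diag = TRUE)] <- x; L }
  list(C0 = tcrossprod(tri(theta[1:q])),
       Cs = tcrossprod(tri(theta[q + 1:q])),
       r  = exp(theta[2 * q + 1]))
}

# Gamma(h) = B0 (1 - delta_0(h)) + Bs sph(h / r), returned as a K x D x D array
model_lrv <- function(theta, h, D) {
  p <- unpack(theta, D)
  V <- ilr_basis(D)
  B0 <- sill_to_B(p$C0, V)
  Bs <- sill_to_B(p$Cs, V)
  out <- array(0, c(length(h), D, D))
  for (k in seq_along(h)) out[k, , ] <- B0 * (h[k] > 0) + Bs * sph(h[k], p$r)
  out
}

# Log-scale misfit over off-diagonal entries, weighted by the pair counts
# gamma_hat, npairs: K x D x D arrays of empirical values and pair counts
gof <- function(theta, h, gamma_hat, npairs) {
  D <- dim(gamma_hat)[2]
  g <- model_lrv(theta, h, D)
  use <- array(TRUE, dim(g))
  for (i in seq_len(D)) use[, i, i] <- FALSE
  use <- use & !is.na(gamma_hat) & npairs > 0
  sum(npairs[use] * (log(g[use]) - log(gamma_hat[use]))^2)
}

# Synthetic check: 'empirical' values generated from known (hypothetical) parameters
h <- c(0.5, 1.5, 2.5, 4, 6)
theta_true <- c(0.3, 0.1, 0.2, 1, 0.5, 0.8, log(3))
gamma_hat <- model_lrv(theta_true, h, D = 3)
npairs <- array(10L, dim(gamma_hat))
fit <- nlm(gof, theta_true * 1.3, h = h, gamma_hat = gamma_hat,
           npairs = npairs, iterlim = 1000)
c(start = gof(theta_true * 1.3, h, gamma_hat, npairs), end = fit$minimum)
```

Parameterizing the sills through factors keeps every trial model valid, so nlm can search an unconstrained parameter space; the price is that the factors are not unique (sign flips give the same sill), which does not affect the fitted variograms.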
ACKNOWLEDGEMENT
This research was funded by the Spanish Ministry of Science and Innovation through a Juan de la Cierva subprogram, supported by the European Social Fund (ESF-FSE), and through the research project MTM.

REFERENCES
[1] J. Aitchison. The Statistical Analysis of Compositional Data. Chapman & Hall Ltd., London.
[2] C. Barceló-Vidal. Fundamentación matemática del análisis de datos composicionales. Technical Report IMA RR, Departament d'Informàtica i Matemàtica Aplicada, Universitat de Girona, Spain.
[3] F. Chayes. On correlation between variables of constant sum. Journal of Geophysical Research, 65.
[4] P. Goovaerts. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York.
[5] V. Pawlowsky-Glahn. On spurious spatial covariance between variables of constant sum. Science de la Terre, Sér. Informatique, 21.
[6] V. Pawlowsky-Glahn and H. Burger. Spatial structure analysis of regionalized compositions. Mathematical Geology, 24.
[7] V. Pawlowsky-Glahn and R.A. Olea. Geostatistical Analysis of Compositional Data. Oxford University Press.
[8] R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.
[9] R. Tolosana-Delgado, J.J. Egozcue, and V. Pawlowsky-Glahn. Cokriging of compositions: Log-ratios and unbiasedness. In J.M. Ortiz and X. Emery, editors, Geostatistics Chile 2008. Gecamin Ltd., Santiago de Chile.
[10] K. G. van den Boogaart, R. Tolosana, and M. Bren. compositions: Compositional Data Analysis. R package.
[11] L. Weber and J.C. Davis. Multivariate statistical analysis of stream-sediment geochemistry in the Grazer Paläozoikum, Austria. Mineralium Deposita, 25, 1990.
1 EXPLORATION OF GEOLOGICAL VARIABILITY AN POSSIBLE PROCESSES THROUGH THE USE OF COMPOSITIONAL ATA ANALYSIS: AN EXAMPLE USING SCOTTISH METAMORPHOSE C. W. Thomas J. Aitchison British Geological Survey epartment
More informationLinear Algebra (Review) Volker Tresp 2017
Linear Algebra (Review) Volker Tresp 2017 1 Vectors k is a scalar (a number) c is a column vector. Thus in two dimensions, c = ( c1 c 2 ) (Advanced: More precisely, a vector is defined in a vector space.
More informationModels for spatial data (cont d) Types of spatial data. Types of spatial data (cont d) Hierarchical models for spatial data
Hierarchical models for spatial data Based on the book by Banerjee, Carlin and Gelfand Hierarchical Modeling and Analysis for Spatial Data, 2004. We focus on Chapters 1, 2 and 5. Geo-referenced data arise
More informationBasics in Geostatistics 2 Geostatistical interpolation/estimation: Kriging methods. Hans Wackernagel. MINES ParisTech.
Basics in Geostatistics 2 Geostatistical interpolation/estimation: Kriging methods Hans Wackernagel MINES ParisTech NERSC April 2013 http://hans.wackernagel.free.fr Basic concepts Geostatistics Hans Wackernagel
More informationCBMS Lecture 1. Alan E. Gelfand Duke University
CBMS Lecture 1 Alan E. Gelfand Duke University Introduction to spatial data and models Researchers in diverse areas such as climatology, ecology, environmental exposure, public health, and real estate
More informationInverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1
Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is
More information11/8/2018. Spatial Interpolation & Geostatistics. Kriging Step 1
(Z i Z j ) 2 / 2 (Z i Zj) 2 / 2 Semivariance y 11/8/2018 Spatial Interpolation & Geostatistics Kriging Step 1 Describe spatial variation with Semivariogram Lag Distance between pairs of points Lag Mean
More informationAnomaly Density Estimation from Strip Transect Data: Pueblo of Isleta Example
Anomaly Density Estimation from Strip Transect Data: Pueblo of Isleta Example Sean A. McKenna, Sandia National Laboratories Brent Pulsipher, Pacific Northwest National Laboratory May 5 Distribution Statement
More informationDiversity partitioning without statistical independence of alpha and beta
1964 Ecology, Vol. 91, No. 7 Ecology, 91(7), 2010, pp. 1964 1969 Ó 2010 by the Ecological Society of America Diversity partitioning without statistical independence of alpha and beta JOSEPH A. VEECH 1,3
More information9. Multivariate Linear Time Series (II). MA6622, Ernesto Mordecki, CityU, HK, 2006.
9. Multivariate Linear Time Series (II). MA6622, Ernesto Mordecki, CityU, HK, 2006. References for this Lecture: Introduction to Time Series and Forecasting. P.J. Brockwell and R. A. Davis, Springer Texts
More informationChapter 6. Eigenvalues. Josef Leydold Mathematical Methods WS 2018/19 6 Eigenvalues 1 / 45
Chapter 6 Eigenvalues Josef Leydold Mathematical Methods WS 2018/19 6 Eigenvalues 1 / 45 Closed Leontief Model In a closed Leontief input-output-model consumption and production coincide, i.e. V x = x
More informationFinal Exam. Economics 835: Econometrics. Fall 2010
Final Exam Economics 835: Econometrics Fall 2010 Please answer the question I ask - no more and no less - and remember that the correct answer is often short and simple. 1 Some short questions a) For each
More informationINDIRECT GEOSTATISTICAL METHODS TO ASSESS ENVIRONMENTAL POLLUTION BY HEAVY METALS. CASE STUDY: UKRAINE.
INDIRECT GEOSTATISTICAL METHODS TO ASSESS ENVIRONMENTAL POLLUTION BY HEAVY METALS. CASE STUDY: UKRAINE. Carme Hervada-Sala 1, Eusebi Jarauta-Bragulat 2, Yulian G. Tyutyunnik, 3 Oleg B. Blum 4 1 Dept. Physics
More informationStatistical Analysis of. Compositional Data
Statistical Analysis of Compositional Data Statistical Analysis of Compositional Data Carles Barceló Vidal J Antoni Martín Fernández Santiago Thió Fdez-Henestrosa Dept d Informàtica i Matemàtica Aplicada
More informationAn overview of applied econometrics
An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical
More informationEstimation of AUC from 0 to Infinity in Serial Sacrifice Designs
Estimation of AUC from 0 to Infinity in Serial Sacrifice Designs Martin J. Wolfsegger Department of Biostatistics, Baxter AG, Vienna, Austria Thomas Jaki Department of Statistics, University of South Carolina,
More informationAn Introduction to Spatial Autocorrelation and Kriging
An Introduction to Spatial Autocorrelation and Kriging Matt Robinson and Sebastian Dietrich RenR 690 Spring 2016 Tobler and Spatial Relationships Tobler s 1 st Law of Geography: Everything is related to
More informationSPATIAL ELECTRICAL LOADS MODELING USING THE GEOSTATISTICAL METHODS
19 th International CODATA Conference THE INFORMATION SOCIETY: NEW HORIZONS FOR SCIENCE Berlin, Germany 7-1 November 24 SPATIAL ELECTRICAL LOADS MODELING USING THE GEOSTATISTICAL METHODS Barbara Namysłowska-Wilczyńska
More informationFluvial Variography: Characterizing Spatial Dependence on Stream Networks. Dale Zimmerman University of Iowa (joint work with Jay Ver Hoef, NOAA)
Fluvial Variography: Characterizing Spatial Dependence on Stream Networks Dale Zimmerman University of Iowa (joint work with Jay Ver Hoef, NOAA) March 5, 2015 Stream network data Flow Legend o 4.40-5.80
More informationPropensity score matching for multiple treatment levels: A CODA-based contribution
Propensity score matching for multiple treatment levels: A CODA-based contribution Hajime Seya *1 Graduate School of Engineering Faculty of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe
More informationEmpirical Gramians and Balanced Truncation for Model Reduction of Nonlinear Systems
Empirical Gramians and Balanced Truncation for Model Reduction of Nonlinear Systems Antoni Ras Departament de Matemàtica Aplicada 4 Universitat Politècnica de Catalunya Lecture goals To review the basic
More informationScientific registration nº 2293 Symposium nº 17 Presentation : poster. VIEIRA R. Sisney 1, TABOADA Teresa 2, PAZ Antonio 2
Scientific registration nº 2293 Symposium nº 17 Presentation : poster An assessment of heavy metal variability in a one hectare plot under natural vegetation in a serpentine area Evaluation de la variabilité
More informationSpatiotemporal Analysis of Environmental Radiation in Korea
WM 0 Conference, February 25 - March, 200, Tucson, AZ Spatiotemporal Analysis of Environmental Radiation in Korea J.Y. Kim, B.C. Lee FNC Technology Co., Ltd. Main Bldg. 56, Seoul National University Research
More informationSpatial Interpolation & Geostatistics
(Z i Z j ) 2 / 2 Spatial Interpolation & Geostatistics Lag Lag Mean Distance between pairs of points 1 y Kriging Step 1 Describe spatial variation with Semivariogram (Z i Z j ) 2 / 2 Point cloud Map 3
More informationAdvanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland
EnviroInfo 2004 (Geneva) Sh@ring EnviroInfo 2004 Advanced analysis and modelling tools for spatial environmental data. Case study: indoor radon data in Switzerland Mikhail Kanevski 1, Michel Maignan 1
More informationR function for residual analysis in linear mixed models: lmmresid
R function for residual analysis in linear mixed models: lmmresid Juvêncio S. Nobre 1, and Julio M. Singer 2, 1 Departamento de Estatística e Matemática Aplicada, Universidade Federal do Ceará, Fortaleza,
More informationSTATS DOESN T SUCK! ~ CHAPTER 16
SIMPLE LINEAR REGRESSION: STATS DOESN T SUCK! ~ CHAPTER 6 The HR manager at ACME food services wants to examine the relationship between a workers income and their years of experience on the job. He randomly
More informationPACKAGE LMest FOR LATENT MARKOV ANALYSIS
PACKAGE LMest FOR LATENT MARKOV ANALYSIS OF LONGITUDINAL CATEGORICAL DATA Francesco Bartolucci 1, Silvia Pandofi 1, and Fulvia Pennoni 2 1 Department of Economics, University of Perugia (e-mail: francesco.bartolucci@unipg.it,
More informationClassification of Compositional Data Using Mixture Models: a Case Study Using Granulometric Data
Classification of Compositional Data Using Mixture Models 1 Classification of Compositional Data Using Mixture Models: a Case Study Using Granulometric Data C. Barceló 1, V. Pawlowsky 2 and G. Bohling
More informationRegression: Lecture 2
Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and
More informationE(x i ) = µ i. 2 d. + sin 1 d θ 2. for d < θ 2 0 for d θ 2
1 Gaussian Processes Definition 1.1 A Gaussian process { i } over sites i is defined by its mean function and its covariance function E( i ) = µ i c ij = Cov( i, j ) plus joint normality of the finite
More informationCovariance function estimation in Gaussian process regression
Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian
More information