Descriptive Statistics for Symbolic Data


1 Descriptive Statistics for Symbolic Data. Paula Brito, Fac. Economia & LIAAD-INESC TEC, Universidade do Porto. ECI Buenos Aires, T3: Symbolic Data Analysis: Taking Variability in Data into Account

2 Outline 1 A new framework 2 3

3 A new framework

4 Descriptive Statistics for Symbolic Variables
No unique and straightforward definitions!
- What is the variance of a set of interval observations?
- How do we measure correlation?
Proposed approaches:
- Measures based on interval parameters
- Measures based on distributional assumptions
- Measures based on distances


6 Biplots for Interval variables

7 Descriptive Statistics for Interval Variables
First option: using the dispersion of the interval centers. The mean value and the dispersion of all interval midpoints are given by

$$\bar{Y}_j = \frac{1}{n}\sum_{i=1}^{n}\frac{l_{ij} + u_{ij}}{2}, \qquad S^2_{Y_j} = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{l_{ij} + u_{ij}}{2} - \bar{Y}_j\right)^2$$

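8 Descriptive Statistics for Interval Variables
Second option: using the dispersion of the interval boundaries. The mean value of the interval midpoints and the dispersion of the interval boundaries around it are given by

$$\bar{Y}_j = \frac{1}{n}\sum_{i=1}^{n}\frac{l_{ij} + u_{ij}}{2}, \qquad S^2_{Y_j} = \frac{1}{n}\sum_{i=1}^{n}\frac{(l_{ij} - \bar{Y}_j)^2 + (u_{ij} - \bar{Y}_j)^2}{2}$$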
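The two options can be compared numerically. The following is a minimal sketch, assuming each observation is stored as a `(lower, upper)` pair; the function names and the toy data are illustrative and do not come from the slides.

```python
def center_variance(intervals):
    """Option 1: variance of the interval midpoints."""
    n = len(intervals)
    centers = [(l + u) / 2 for l, u in intervals]
    mean = sum(centers) / n
    return sum((c - mean) ** 2 for c in centers) / n


def boundary_variance(intervals):
    """Option 2: average squared deviation of both bounds from the midpoint mean."""
    n = len(intervals)
    mean = sum((l + u) / 2 for l, u in intervals) / n
    return sum(((l - mean) ** 2 + (u - mean) ** 2) / 2 for l, u in intervals) / n


# Hypothetical toy data: three observations of one interval variable.
intervals = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
s2_centers = center_variance(intervals)
s2_bounds = boundary_variance(intervals)
# Mean squared half-range: the within-interval spread that option 1 ignores.
half_range_term = sum(((u - l) / 2) ** 2 for l, u in intervals) / len(intervals)
print(s2_centers, s2_bounds, half_range_term)
```

Expanding the squares shows that the boundary-based dispersion equals the center-based dispersion plus the mean squared half-range, so the second option is never smaller than the first.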

9 Descriptive Statistics for Interval Variables
Under the assumption that the observed values $Y_j(s_i)$ and $Y_{j'}(s_i)$, $i = 1, \ldots, n$, are uniformly distributed across each interval $I_{ik} = [l_{ik}, u_{ik}]$, $k = j, j'$, we have

$$E(Y_{ik}) = \frac{l_{ik} + u_{ik}}{2} = c_{ik} \quad \text{and} \quad \mathrm{Var}(Y_{ik}) = \frac{(u_{ik} - l_{ik})^2}{12}$$

Symbolic sample mean:

$$\bar{Y}_k = \frac{1}{2n}\sum_{i=1}^{n}(l_{ik} + u_{ik}) = \frac{1}{n}\sum_{i=1}^{n} c_{ik}$$

Symbolic sample variance:

$$S^2_{Y_k} = \frac{1}{3n}\sum_{i=1}^{n}\left[(l_{ik} - \bar{Y}_k)^2 + (l_{ik} - \bar{Y}_k)(u_{ik} - \bar{Y}_k) + (u_{ik} - \bar{Y}_k)^2\right] = \frac{1}{3n}\sum_{i=1}^{n}\left(l_{ik}^2 + l_{ik}u_{ik} + u_{ik}^2\right) - \bar{Y}_k^2$$

Bertrand and Goupil (2000), obtained from the empirical density function for an interval variable.
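As a rough illustration of the uniform-assumption statistics, the sketch below computes the symbolic sample mean and both forms of the symbolic sample variance. The `(lower, upper)` representation, the function names and the data are assumptions made for the example.

```python
def symbolic_mean(intervals):
    """Symbolic sample mean: average of the interval midpoints."""
    n = len(intervals)
    return sum(l + u for l, u in intervals) / (2 * n)


def symbolic_variance(intervals):
    """Raw-moment form: (1/3n) * sum(l^2 + l*u + u^2) - mean^2."""
    n = len(intervals)
    m = symbolic_mean(intervals)
    return sum(l * l + l * u + u * u for l, u in intervals) / (3 * n) - m * m


def symbolic_variance_centered(intervals):
    """Centered form: (1/3n) * sum of (l-m)^2 + (l-m)(u-m) + (u-m)^2."""
    n = len(intervals)
    m = symbolic_mean(intervals)
    return sum(
        (l - m) ** 2 + (l - m) * (u - m) + (u - m) ** 2 for l, u in intervals
    ) / (3 * n)


# Hypothetical toy data.
intervals = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
mean = symbolic_mean(intervals)
var_raw = symbolic_variance(intervals)
var_centered = symbolic_variance_centered(intervals)
# Average of the within-interval uniform variances (u - l)^2 / 12.
within = sum((u - l) ** 2 / 12 for l, u in intervals) / len(intervals)
print(mean, var_raw, var_centered, within)
```

For this data the mean is 3.5 and both variance forms give 1.75, which is the variance of the midpoints (3.5/3) plus the average within-interval uniform variance (1.75/3).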

10 Descriptive Statistics for Interval Variables
For the symbolic covariance, three definitions were proposed:

$$\mathrm{Cov}_1(Y_j, Y_{j'}) = \frac{1}{4n}\sum_{i=1}^{n}(l_{ij} + u_{ij})(l_{ij'} + u_{ij'}) - \bar{Y}_j\,\bar{Y}_{j'}$$

Billard & Diday (2003), obtained from the empirical joint density function.

$$\mathrm{Cov}_2(Y_j, Y_{j'}) = \frac{1}{3n}\sum_{i=1}^{n} G_{ij}\,G_{ij'}\left[Q_{ij}\,Q_{ij'}\right]^{1/2}$$

with

$$Q_{ik} = (l_{ik} - \bar{Y}_k)^2 + (l_{ik} - \bar{Y}_k)(u_{ik} - \bar{Y}_k) + (u_{ik} - \bar{Y}_k)^2, \qquad G_{ik} = \begin{cases} -1 & \text{if } c_{ik} \le \bar{Y}_k \\ \phantom{-}1 & \text{if } c_{ik} > \bar{Y}_k \end{cases}$$

Billard & Diday (2006), incorporating more accurately both between- and within-interval variations into the overall covariance.
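The first two covariance definitions can be sketched as follows, assuming two interval variables observed on the same units and stored as parallel lists of `(lower, upper)` pairs. Function names and data are hypothetical.

```python
import math


def sym_mean(xs):
    """Symbolic sample mean of one interval variable."""
    return sum(l + u for l, u in xs) / (2 * len(xs))


def cov1(xs, ys):
    """Cov_1: (1/4n) * sum (l+u)(l'+u') minus the product of symbolic means."""
    n = len(xs)
    cross = sum((lx + ux) * (ly + uy) for (lx, ux), (ly, uy) in zip(xs, ys))
    return cross / (4 * n) - sym_mean(xs) * sym_mean(ys)


def _q(l, u, m):
    # Q term: (l-m)^2 + (l-m)(u-m) + (u-m)^2, always non-negative.
    return (l - m) ** 2 + (l - m) * (u - m) + (u - m) ** 2


def _g(l, u, m):
    # Sign term: -1 if the midpoint is at or below the mean, +1 otherwise.
    return -1.0 if (l + u) / 2 <= m else 1.0


def cov2(xs, ys):
    """Cov_2: (1/3n) * sum G * G' * sqrt(Q * Q')."""
    n = len(xs)
    mx, my = sym_mean(xs), sym_mean(ys)
    total = sum(
        _g(lx, ux, mx) * _g(ly, uy, my) * math.sqrt(_q(lx, ux, mx) * _q(ly, uy, my))
        for (lx, ux), (ly, uy) in zip(xs, ys)
    )
    return total / (3 * n)


# Hypothetical toy data for two interval variables.
xs = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
ys = [(0.0, 2.0), (1.0, 5.0), (3.0, 7.0)]
print(cov1(xs, ys), cov2(xs, ys))
print(cov1(xs, xs), cov2(xs, xs))
```

Applied to a variable with itself, `cov2` reduces to the symbolic sample variance (1.75 here), whereas `cov1` reduces to the variance of the midpoints only (3.5/3 here), since it carries no within-interval term.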

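11 Descriptive Statistics for Interval Variables

$$\mathrm{Cov}_3(Y_j, Y_{j'}) = \underbrace{\frac{1}{n}\sum_{i=1}^{n}\frac{(u_{ij} - l_{ij})(u_{ij'} - l_{ij'})}{12}}_{\text{WithinSP}} + \underbrace{\frac{1}{n}\sum_{i=1}^{n}\left(\frac{l_{ij} + u_{ij}}{2} - \bar{Y}_j\right)\left(\frac{l_{ij'} + u_{ij'}}{2} - \bar{Y}_{j'}\right)}_{\text{BetweenSP}}$$

$$= \frac{1}{6n}\sum_{i=1}^{n}\left[2(l_{ij} - \bar{Y}_j)(l_{ij'} - \bar{Y}_{j'}) + (l_{ij} - \bar{Y}_j)(u_{ij'} - \bar{Y}_{j'}) + (u_{ij} - \bar{Y}_j)(l_{ij'} - \bar{Y}_{j'}) + 2(u_{ij} - \bar{Y}_j)(u_{ij'} - \bar{Y}_{j'})\right]$$

Billard (2008), considering a decomposition into Within observations Sum of Products (WithinSP) and Between observations Sum of Products (BetweenSP).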
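A small numerical check of the decomposition is sketched below: it computes the within and between terms separately and compares their sum with the single-sum form. The data layout, names and values are assumptions for illustration.

```python
def sym_mean(xs):
    """Symbolic sample mean of one interval variable."""
    return sum(l + u for l, u in xs) / (2 * len(xs))


def cov3_parts(xs, ys):
    """Return (within, between) terms of Cov_3, each already divided by n."""
    n = len(xs)
    mx, my = sym_mean(xs), sym_mean(ys)
    within = sum(
        (ux - lx) * (uy - ly) / 12 for (lx, ux), (ly, uy) in zip(xs, ys)
    ) / n
    between = sum(
        ((lx + ux) / 2 - mx) * ((ly + uy) / 2 - my)
        for (lx, ux), (ly, uy) in zip(xs, ys)
    ) / n
    return within, between


def cov3_closed(xs, ys):
    """Single-sum form: (1/6n) * sum [2aa' + ab' + ba' + 2bb'] on centered bounds."""
    n = len(xs)
    mx, my = sym_mean(xs), sym_mean(ys)
    total = 0.0
    for (lx, ux), (ly, uy) in zip(xs, ys):
        a, b = lx - mx, ux - mx      # centered bounds of the first variable
        a2, b2 = ly - my, uy - my    # centered bounds of the second variable
        total += 2 * a * a2 + a * b2 + b * a2 + 2 * b * b2
    return total / (6 * n)


# Hypothetical toy data for two interval variables.
xs = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
ys = [(0.0, 2.0), (1.0, 5.0), (3.0, 7.0)]
within, between = cov3_parts(xs, ys)
print(within, between, within + between, cov3_closed(xs, ys))
```

Taking both arguments to be the same variable gives back the symbolic sample variance, so this covariance is consistent with the variance defined under the uniform assumption.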

12 Distance measures
Many measures have been proposed in the literature. For two intervals $I_i = [l_i, u_i]$ and $I_j = [l_j, u_j]$:

Hausdorff distance: $d_H(I_i, I_j) = \max\{|l_i - l_j|,\ |u_i - u_j|\}$

Euclidean distance: $d_2(I_i, I_j) = \sqrt{(l_i - l_j)^2 + (u_i - u_j)^2}$

City-Block distance: $d_1(I_i, I_j) = |l_i - l_j| + |u_i - u_j|$
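The three distances are straightforward to compute for a pair of intervals. The sketch below assumes intervals are `(lower, upper)` tuples; the function names and example values are illustrative.

```python
import math


def hausdorff(a, b):
    """Hausdorff distance between two intervals: larger of the bound gaps."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def euclidean(a, b):
    """Euclidean distance between the (lower, upper) bound vectors."""
    return math.sqrt((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2)


def city_block(a, b):
    """City-Block (L1) distance between the (lower, upper) bound vectors."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


# Hypothetical pair of intervals.
a, b = (1.0, 3.0), (2.0, 6.0)
print(hausdorff(a, b), euclidean(a, b), city_block(a, b))
```

Since each measure is a norm of the vector of bound differences (maximum, Euclidean and sum norms respectively), the ordering Hausdorff ≤ Euclidean ≤ City-Block holds for any pair of intervals.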


14 Biplots for Histogram variables

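15 Descriptive Statistics for Histogram Variables
Assuming a Uniform distribution within each sub-interval $I_{ik\ell} = [l_{ik\ell}, u_{ik\ell}]$ of $Y_k(s_i)$, with weight $p_{ik\ell}$, $i = 1, \ldots, n$, $\ell = 1, \ldots, K_k$, $k = j, j'$, we have

Symbolic sample mean:

$$\bar{Y}_k = \frac{1}{2n}\sum_{i=1}^{n}\sum_{\ell=1}^{K_k}(l_{ik\ell} + u_{ik\ell})\,p_{ik\ell}$$

Symbolic sample variance:

$$S^2_{Y_k} = \frac{1}{3n}\sum_{i=1}^{n}\sum_{\ell=1}^{K_k}\left(l_{ik\ell}^2 + l_{ik\ell}\,u_{ik\ell} + u_{ik\ell}^2\right)p_{ik\ell} - \bar{Y}_k^2$$

Billard and Diday (2003).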
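The histogram formulas can be sketched in the same way as the interval ones. Here each observation is assumed to be a list of `(lower, upper, weight)` triples whose weights sum to 1; this representation, the names and the data are all hypothetical.

```python
def histogram_mean(data):
    """Symbolic sample mean: (1/2n) * sum over units and bins of (l + u) * p."""
    n = len(data)
    return sum((l + u) * p for hist in data for l, u, p in hist) / (2 * n)


def histogram_variance(data):
    """Symbolic sample variance: (1/3n) * sum (l^2 + l*u + u^2) * p - mean^2."""
    n = len(data)
    m = histogram_mean(data)
    second = sum(
        (l * l + l * u + u * u) * p for hist in data for l, u, p in hist
    ) / (3 * n)
    return second - m * m


# Hypothetical toy data: two units, each described by a two-bin histogram.
data = [
    [(0.0, 2.0, 0.5), (2.0, 4.0, 0.5)],
    [(1.0, 3.0, 0.25), (3.0, 5.0, 0.75)],
]
h_mean = histogram_mean(data)
h_var = histogram_variance(data)

# An interval is a histogram with a single bin of weight 1.
as_histograms = [[(1.0, 3.0, 1.0)], [(2.0, 6.0, 1.0)], [(4.0, 5.0, 1.0)]]
i_mean = histogram_mean(as_histograms)
i_var = histogram_variance(as_histograms)
print(h_mean, h_var, i_mean, i_var)
```

The last two values illustrate that the histogram statistics reduce to the interval statistics when every histogram has one sub-interval with weight 1.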

16 Descriptive Statistics for Histogram Variables
And for the symbolic covariance, three definitions:

$$\mathrm{Cov}_1(Y_j, Y_{j'}) = \frac{1}{4n}\sum_{i=1}^{n}\sum_{\ell=1}^{K_j} p_{ij\ell}\,p_{ij'\ell}\,(l_{ij\ell} + u_{ij\ell})(l_{ij'\ell} + u_{ij'\ell}) - \bar{Y}_j\,\bar{Y}_{j'}$$

Billard & Diday (2003), obtained from the empirical joint density function.

17 Correlation Between Symbolic Variables
As for classical variables, the correlation coefficient is defined as

$$r_{Y_j Y_{j'}} = \frac{\mathrm{Cov}(Y_j, Y_{j'})}{S_{Y_j}\,S_{Y_{j'}}}$$

where $\mathrm{Cov}(Y_j, Y_{j'})$ is the covariance between $Y_j$ and $Y_{j'}$, and $S_{Y_j}$, $S_{Y_{j'}}$ are the symbolic standard deviations of the variables $Y_j$ and $Y_{j'}$, respectively.

In the particular case of interval variables, the descriptive statistics depend on the assumed distribution within each interval. Results have already been obtained for other distributions, e.g., the triangular distribution.
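The slide leaves the choice of covariance open. The sketch below is one possible instance, assuming interval variables under the uniform assumption, the within-plus-between covariance of Billard (2008), and standard deviations taken as the square root of that covariance applied to a variable with itself (which coincides with the symbolic sample variance). Names and data are hypothetical.

```python
import math


def sym_mean(xs):
    """Symbolic sample mean of one interval variable."""
    return sum(l + u for l, u in xs) / (2 * len(xs))


def sym_cov(xs, ys):
    """Within + between covariance for interval data (uniform assumption)."""
    n = len(xs)
    mx, my = sym_mean(xs), sym_mean(ys)
    within = sum(
        (ux - lx) * (uy - ly) / 12 for (lx, ux), (ly, uy) in zip(xs, ys)
    ) / n
    between = sum(
        ((lx + ux) / 2 - mx) * ((ly + uy) / 2 - my)
        for (lx, ux), (ly, uy) in zip(xs, ys)
    ) / n
    return within + between


def sym_corr(xs, ys):
    """Correlation: covariance divided by the product of symbolic std deviations."""
    return sym_cov(xs, ys) / math.sqrt(sym_cov(xs, xs) * sym_cov(ys, ys))


# Hypothetical toy data for two interval variables.
xs = [(1.0, 3.0), (2.0, 6.0), (4.0, 5.0)]
ys = [(0.0, 2.0), (1.0, 5.0), (3.0, 7.0)]
r_xy = sym_corr(xs, ys)
r_xx = sym_corr(xs, xs)
print(r_xy, r_xx)
```

With this pairing of covariance and variance the coefficient stays within [-1, 1] and equals 1 for a variable with itself; mixing a covariance definition with a variance from a different definition does not guarantee that.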


More information

Hypothesis Testing For Multilayer Network Data

Hypothesis Testing For Multilayer Network Data Hypothesis Testing For Multilayer Network Data Jun Li Dept of Mathematics and Statistics, Boston University Joint work with Eric Kolaczyk Outline Background and Motivation Geometric structure of multilayer

More information

diluted treatment effect estimation for trigger analysis in online controlled experiments

diluted treatment effect estimation for trigger analysis in online controlled experiments diluted treatment effect estimation for trigger analysis in online controlled experiments Alex Deng and Victor Hu February 2, 2015 Microsoft outline Trigger Analysis and The Dilution Problem Traditional

More information

G E INTERACTION USING JMP: AN OVERVIEW

G E INTERACTION USING JMP: AN OVERVIEW G E INTERACTION USING JMP: AN OVERVIEW Sukanta Dash I.A.S.R.I., Library Avenue, New Delhi-110012 sukanta@iasri.res.in 1. Introduction Genotype Environment interaction (G E) is a common phenomenon in agricultural

More information

Robust scale estimation with extensions

Robust scale estimation with extensions Robust scale estimation with extensions Garth Tarr, Samuel Müller and Neville Weber School of Mathematics and Statistics THE UNIVERSITY OF SYDNEY Outline The robust scale estimator P n Robust covariance

More information

Lecture 6: Selection on Multiple Traits

Lecture 6: Selection on Multiple Traits Lecture 6: Selection on Multiple Traits Bruce Walsh lecture notes Introduction to Quantitative Genetics SISG, Seattle 16 18 July 2018 1 Genetic vs. Phenotypic correlations Within an individual, trait values

More information

2. Matrix Algebra and Random Vectors

2. Matrix Algebra and Random Vectors 2. Matrix Algebra and Random Vectors 2.1 Introduction Multivariate data can be conveniently display as array of numbers. In general, a rectangular array of numbers with, for instance, n rows and p columns

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Descriptive Data Summarization

Descriptive Data Summarization Descriptive Data Summarization Descriptive data summarization gives the general characteristics of the data and identify the presence of noise or outliers, which is useful for successful data cleaning

More information

7 Multivariate Statistical Models

7 Multivariate Statistical Models 7 Multivariate Statistical Models 7.1 Introduction Often we are not interested merely in a single random variable but rather in the joint behavior of several random variables, for example, returns on several

More information

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation

1 Outline. 1. Motivation. 2. SUR model. 3. Simultaneous equations. 4. Estimation 1 Outline. 1. Motivation 2. SUR model 3. Simultaneous equations 4. Estimation 2 Motivation. In this chapter, we will study simultaneous systems of econometric equations. Systems of simultaneous equations

More information

Covariance between relatives

Covariance between relatives UNIVERSIDADE DE SÃO PAULO ESCOLA SUPERIOR DE AGRICULTURA LUIZ DE QUEIROZ DEPARTAMENTO DE GENÉTICA LGN5825 Genética e Melhoramento de Espécies Alógamas Covariance between relatives Prof. Roberto Fritsche-Neto

More information

For a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t,

For a stochastic process {Y t : t = 0, ±1, ±2, ±3, }, the mean function is defined by (2.2.1) ± 2..., γ t, CHAPTER 2 FUNDAMENTAL CONCEPTS This chapter describes the fundamental concepts in the theory of time series models. In particular, we introduce the concepts of stochastic processes, mean and covariance

More information

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin

ACE 562 Fall Lecture 2: Probability, Random Variables and Distributions. by Professor Scott H. Irwin ACE 562 Fall 2005 Lecture 2: Probability, Random Variables and Distributions Required Readings: by Professor Scott H. Irwin Griffiths, Hill and Judge. Some Basic Ideas: Statistical Concepts for Economists,

More information

Gaussian process regression for Sensitivity analysis

Gaussian process regression for Sensitivity analysis Gaussian process regression for Sensitivity analysis GPSS Workshop on UQ, Sheffield, September 2016 Nicolas Durrande, Mines St-Étienne, durrande@emse.fr GPSS workshop on UQ GPs for sensitivity analysis

More information

. a m1 a mn. a 1 a 2 a = a n

. a m1 a mn. a 1 a 2 a = a n Biostat 140655, 2008: Matrix Algebra Review 1 Definition: An m n matrix, A m n, is a rectangular array of real numbers with m rows and n columns Element in the i th row and the j th column is denoted by

More information

Multivariate Distributions CIVL 7012/8012

Multivariate Distributions CIVL 7012/8012 Multivariate Distributions CIVL 7012/8012 Multivariate Distributions Engineers often are interested in more than one measurement from a single item. Multivariate distributions describe the probability

More information