Histogram data analysis based on Wasserstein distance

Size: px
Start display at page:

Download "Histogram data analysis based on Wasserstein distance"

Transcription

1 Histogram data analysis based on Wasserstein distance Rosanna Verde Antonio Irpino Department of European and Mediterranean Studies Second University of Naples Caserta - ITALY

2 Aims Introduce: New distances to compare histogram (distribution) data in the framework of the Symbolic Data Analysis especially the Wasserstein Mallows l distance Show: Some properties of Wasserstein Mallows l distance Present: Methodological approaches based on l distance Basic Statistics for histogram data (including intervals and distributions as particular cases) Clustering methods:dca and hierarchical (Ward criterion) Linear regression model: l as metric for Sum Square Errors in OLS estimation method

3 Main Sources of histogram data Results of summary/clustering procedures From surveys From large databases From sensors 0,6 0,5 0,5 Temperatures Pollutant concentration Network activity 0,4 0,3 0, 0,1 0, 0,1 0, Data streams 0 Description of time window data sequences Image analysis Color bandwidths 0,6 0,5 0,4 0,5 0,4 Confidentiality data Summary data non punctual 0,3 0, 0,1 0 0, 0,1 0, 0, 0,1 0,1

4 Histogram data as a particular case of modal symbolic descriptions [Bock and Diday (000) ] Histogram data is a model for representing the empirical distribution of a continuous variable Y partitioned into a set of contiguous I h intervals (bins) with associated h weights. A histogram is represented by a set of H ordered pairs (I h, h )} such that: Ihi yh; y h ; yh yh ; yh, yh h1,..., H ; max I min y y h h' I I hi h h h1,..., H h1,..., H h h' 0 h h1,..., H h 1 Y(i)=[([0-10],0.);([10-0],0.5);([0-30],0.1); ([30-40],0.)] 0,6 0,5 0,4 0,3 0, 0,1 0 0,5 0, 0, 0,

5 Comparison of two histogram data How do compare two units described by two histograms? A possibility is to use metrics developed for comparing probability distributions 0,5 0,4 0,4 0,45 0,3 0,3 0,3 0, 0,1 0, 0,1 0,15 0,

6 Metrics used for the evaluation of the convergence of two probability measures (Gibbs and Su, 00) Given: - a domain on which is possible to define a Borel -algebra, - two measure, - the density functions f and g - the corresponding distribution functions F and G and - a subdominant measure, like: =(+)/, Gibbs and Su (00) present a review of the most used dissimilarities:

7 A suitable measure to compute the distance between histograms: Wasserstein-Kantorovich metric we propose to use the Wasserstein-Kantorovich metric: W 1 1 1, ( ) ( ) d F t G t dt 0 in particular the derived l Mallow s distance between two quantile functions W, ( ) ( ) d F t G t dt 0 The main difficulties to compute this distance is the analytical definition of the quantile function But in our case we treat especially with histogram data), indeed

8 Histograms are locally uniform Having represented an histogram Y(i) as {(I 1, 1 ),,(I h, h ),, (I m, H )} (where H is the number of intervals of the support), we may define: the empirical density function f(y) as: h f( y) if y [ yh; yh) f y h h yh 0 otherwise h the empirical distribution function w h h k k 1 0 if h 0 if h1,..., m hence the empirical quantile function (The inverse of the distribution function) y h y wh wh 1 F( y) wh y y iff y y y h h y y F( y) h t h h h 1 h1 F t y y h h y w h h t wh 1 wh wh 1 t w ( ) where 0 1 F 1 () t

9 Geometric interpretation The comparison of two quantile functions associated with two histograms requires the partition of the support of the two qf s into m intervals with associated uniform density 1,00 0,95 0,90 0,85 0,9 t=0.90 0,9 0,80 0,75 0,70 0,7 t=0.70 0,65 0,60 0,6 0,55 t=0.60 0,50 0,45 3 0,40 0,35 t=0.30 0,30 0,3 0,5 0,0 0,15 0,10 t=0.15 0,15 0,05 0, I 11 I 1 10 I 31 I0 41 I I I 1 70I I 3 80I 4 I 5 90 I

10 l l Ili yli, y li I y, y lj lj lj d W(I il ;I jl ) l d W(I il ;I jl ) [0; 5] [60,70] [ 5;10] [70;73.3] [10; 16.6] [73.3;80] [16.6;0] [80;83.3] [0;30] [83.3;90] [30;40] [90;100] d Y Y d I I c c r r m m (, ) (, ) ( ) ( ) W i j l li lj l li lj li lj l1 l1 3 With y y y y Ili yli, y li ; cli ; rli li li li li center and radius of I li Verde, Irpino, COMPSTAT006

11 Multivariate distance Assuming independence among p variables we propose the an extension of d W distance to the multivariate case p=

12 Average histogram based on distance d W Wasserstein The barycenter (average) histogram Y(b) of a set of histogram data Y(i) (i=1,, n) can be computed minimizing the Sum of Square n Distances f ( Y( b*)) dw ( Y( i), Y( b)) i1 (like for a cloud of points in classical data analysis) n m 1 f ( Y( b*)) ( c c ) ( r r ) i1 l1 3 l li lb li lb That is minimized when the usual first order conditions are satisfied: n f c c 0 clb f r lb 3 l li lb n n i1 1 1 clb n li ; n c rlb n rli i1 i1 lrli rlb 0 i Y( b) c r ; c r, ;...; c r ; c r, ;...; c r ; c r, b b b b lb lb lb lb l mb mb mb mb m

13 Variance of the set of histograms Y(i) Determined Y(b) through the minimization problem n * arg min W ( ( ), ( )) i1 b d Y i Y b The variance of the set of histogram data is: n m l ( cli clb ) ( rli rlb ) i1 l1 3 1

14 Clustering methods for Histogram data based on Wasserstein distance Dynamic Clustering Algorithm Hierarchical method (Ward criterion)

15 Dynamic Clustering Algorithm general schema of the Nuées Dynamiques (Diday 197) The algorithm aims to obtain a partition P of a set E of symbolic data in k clusters and a set L of k prototypes {G 1,,G i,,g k } that best represent the clusters {C 1,,C i,,c k } of the partition P The algorithm optimizes a criterion D of fitting between prototypes and clusters: * * D ( P, L ) Min D( P, L) P P, L k L k where : P k is the set of all the partitions of E into k clusters and L k is the set of k prototypes. The algorithm executes alternatively: an allocation step and a representation step In this case E is a set of histogram data

16 Wasserstein distance for Clustering data according to Dynamic Clustering algorithm classical schema (a) Initialization k prototypes Y(b 1 ),...,Y(b K ) of L are randomly chosen (b) Allocation step For each histogram Y(i) of E the allocation index l to the clusters is computed and Y(i) is assigned to the cluster C l where: (c) Representation step For each cluster C k is identified the prototype Y(b k ) of L that minimizes D ( C, Y( b )) d ( Y( i), Y( b )) k k W k ic k l arg min d ( Y( i), Y( b )) k1,...,k W k (b) and (c) are repeated until the convergence Wasserstein l distance Histogram prototype/ barycenter of histograms belonging to the cluster C k

17 Property of Wasserstein distance: Inertia decomposition Being the prototype the barycenter and Euclidean distance, then d W is a squared is the inertia for grouped data Then, according to the Huygens theorem TI can be decomposed in Within and Between inertia

18 Some results Application on US monthly temperatures data set We have considered a dataset constituted by the Monthly Average Temperatures recorded in the 48 states of US from 1895 to 004 (Hawaii and Alaska are not present in the dataset). The analysis consists of the following three steps: 1. Representation of the distributions of temperatures of each State for each month by means of histograms;. Computing of the distance matrix using d W; 3. DCA procedure to find the best partition P 4. Calinski Harabaz index is computed to compute the optimal number k of clusters 5. Hierarchical clustering procedure based on the Ward criterion The original dataset is freely available at the National Climatic Data Center website

19 Histogram barycenter (prototype) Montana February Florida Barycenter (prototype)

20 Experimental results Dynamic Clastering algorithm results: The first 5 columns represent the Quality Partition index (min, max, means, median) on 100 iterations the last one the best value of Calinski-Harabaz index

21 Hierarchical clustering according to the Ward criterion - based on Wasserstein distance For example, we can apply the Ward criterion for a hierarchical clustering of the set E of histogram data: the procedure joins those two classes which minimize

22 Hierarchical tree

23 Final results and coloured map

24 Regression model for histogram variables based on Wasserstein distance

25 A Regression model for histogram data Data = Model Fit + Residual Linear regression is a general method for estimating/describing association between a continuous outcome variable (dependent) and one or multiple predictors in one equation. Easy conceptual task with classic data But what does it means when dealing with histogram data? Billard, Diday, IFCS 006 Verde, Irpino, COMPSTAT 010; CLADAG 011 Dias, Brito, ISI 011

26 Regression with histograms variables: a proposal in SDA framework A solution was given by Billard and Diday (006) The model fit a linear regression line throught the mixture of the n bivariate distributions Given a punctual value of X it is possible to predict the punctual value of Y

27 Linear Regression Model for histogram data: our approach involves SDA as well as Functional Data Analysis (Verde, Irpino, 010) Given a histogram variable X, we search for a linear trasformation of X which allows us to predict the histogram variable Y For example: given the histogram of the temperatures observed in a region during a month, is it possible to predict the distribution of the temperature of another month using a linear transformation of the histogram variable? A histogram by a histogram

28 Multiple regression model for quantile functions Our concurrent multiple regression model is: in matrix notation: p y ( t) b b x ( t) e ( t) i 0 j ij i j1 Quantile functions associated to histogram/ distribution data Y( t) X ( t) b e( t) This formulation is analogous to the functional linear model (Ramsay, Silverman, 003) except for the constants b parameters and for the functions y i (t), x ij (t) which are quantile functions while each e i (t) is a residual function (distribution?) for all i=1,, n.

29 Parameters estimation - LS method using Wasserstein distance According to the nature of the variables, for the parameters estimation, we propose to extend the Least Squares principle to the functional case using a typical metric between quantile functions: W 0 i j i j d x,x F ( t ) F ( t ) dt Wasserstein l distance between two quantile functions

30 Interpretative decomposition of the distance in three components related to the location size and shape parameters W ( i, j ) : Fi ( ) Fj ( ) 0 d x x t t dt x x (1 ( x, x )) i j i j i j Location Size Shape i j In general case of distributions If the two distributions have the same shape: d ( x, x ) : x x W i j i j i j Location Size If they have the same size and shape: d ( x, x ) : x x W i j i j Location

31 Notations x t F t 1 i( ) i ( ) quantile function of x i Mean and variance of the quantile function: 1 xi xi() t dt 0 n n x( t) xi( t) t [0,1]; x xi( t) dt x( t) dt n n i1 i x ( ) ( ) i xi t dt xi xi t dt x x i i 0 0 Correlation between quantile functions (x i, x j ) 1 x ( t) x ( t) dt i j i j 1 0 ( xi, x j ) xi ( t) x j ( t) dt ( xi, x j ) xi x x j ix j x x 0 i j x x

32 If data are histograms the mean, variance and correlation QQ plot formulae are: Empirical quantile function Mean and variance of the quantile function: 1 x x ( t) dt x c i i i l il 0 l Correlation between quantile functions (x i, x j ) 1 1 x () i xi t dt xi x i l cil ril x i 3 0 l 1 x ( t) x ( t) dt x x 1 c c r r xx i j i j l il jl il jl i j 0 l 3 ( xi, xj ) ( xi, x j ) x x i j i j

33 Interpretation of the Linear Regression model The regression model is here proposed to find the best linear transformation of the X j (t) s in order to predict Y(t) y i (t) = b 0 + b 1 x 1i (t) + + b p x pi (t) + e i (t) y i (t), x 1i (t),, x pi (t) are quantile functions and the estimated response variable yˆ i () t is again a quantile function = linear combination of quantile functions - according to the aim of symbolic data analysis: Input Symbolic data Numerical/Symbolic method Output Symbolic data (same nature of the input data)

34 Fitting linear regression model Find a linear transformation of the quantile functions of x ij (for j=1,,p) in order to predict the quantile function of y i i.e.: yˆ ( t) b b x ( t) t [0, 1] i 0 j ij j1 p It is worth of noting the linear transformation is unique: the parameters b 0 and b j are estimated for all the x ij and y i distributions A first problem: Only if b j > 0 a quantile function yˆ i () t can be derived. In order to overcome this problem, we propose a solution based on the decomposition of the Wasserstein distance and on the NNLS algorithm.

35 Solution The quantile function can be decomposed as: x c ( t) x x ( t) where ij ij ij c x ( t) x ( t) x is the centered quantile function ij ij ij Then, we propose the following regression model: p p c i b0 b j ij g j ij i j1 j1 y ( t) x x ( t) ( t) 0 t 1 yˆ () t i Using the Wasserstein distance it is possible to set up a OLS method that returns the two sets of coefficients (b 0,b j ; g j ).

36 The error term: a property of the Wasserstein distance decomposition The squared error can be written according to the two components 1 ˆ ˆ i W i, y i i yi 0 e d ( y ) y ( t) ( t) dt ˆ c c y y d ( y, yˆ ) i i W i i

37 Least Squares parameters estimation n n arg min f ( b, ) ( ), ˆ j g j ei () t dw yi t yi ( t) bj, g j i1 i1 n 1 p p c c SSE f ( b j, g j ) yi yi ( t) b0 b jxij g ijxi ( t) dt i1 0 j1 j1 Matrix notation: 1 c c SSE Y Y ( t) X X ( t) dt 0

38 The estimated parameters x y ny x ( x, y ) ˆ b y ˆ b x; ˆ b ; ˆ g n Correlation between quantile functions x i i i i i xi y i1 i1 n 1 n xi nx xi i1 i1 y i n i It is easy to see that: ˆ b, ˆ b and ˆ g

39 Multiple regression estimated parameters According to the properties of the Wasserstein distance between 1 c c quantile functions, the elements of the product matrix X ( t)' X ( t) computed according to our definition of a inner product operator between two quantile functions x ij and x i k : Then, the parameters are estimated under the constraint j (j=1,, p) using the NNLS (Lawson, Hanson, 1974). 1 ˆ X ' X ' X Y ˆ X ( t)' X ( t) dt X ( t)' Y ( t) dt 1 0 x ( t) x ( t) dt ( x, x ) ij i' k ij i' k x x c c c c 0 0 X c otherwise ij X c i ' k 0 if g 0 j ˆ g 0 it is a transformed matrix by NNLS

40 Interpretation of the parameters Regression parameters for the distribution mean locations ˆ b ˆ,..., ˆ 0, b1 b p Shrinking factors for the variability ˆ 1,..., ˆp g g y > 1 (< 1) the ˆi histogram has a greater (smaller) variability than the x ij histogram.

41 Tools for the interpretation The sum of squares of Y is n n 1 ( ) W i ( ), ( ) i ( ) ( ) i1 i1 0 SS Y d y t y t y t y t dt In classical regression model, the SS(Y) is constituted by: SS Error +SS Regression

42 Decomposition of SS(Y) Being: we obtain p p c i b0 b j ij g j ij j1 j1 yˆ ( t) x x ( t) n n 1 ( ) W i ( ), ( ) i( ) i( ) i1 i1 0 ˆ SS Y d y t y t y t y t dt n 1 1 i i1 0 0 i1 Error y( t) yˆ ( t) dt n y( t) e ( t) dt SS Regression n 1 e ( t) ( y ( ) ˆ i t yi( t)) t [0,1] n SS Average error function Bias

43 The bias The bias is due to different shapes of distributions: p c bias n ˆ y g j( x j( t), y( t)) c x y j j1 Correlation between the average quantile functions bias=0 when all the histograms data have the same shape That represents the incapacity of the linear transformation of fitting distributions that are very different in shape

44 A measure of fitting Pseudo R Considering that n 1 1 n c c ˆ ˆ Regression i i i i1 i1 0 0 SS y y y ( t) y ( t) dt y( t) e( t) dt We propose the following pseudo R PseudoR SS min max 0;1 Error ;1. SS( Y)

45 An application on the Ozone concentration in 78 USA sites ( Hourly monitoring of Ozone ground-level concentration and other meteorological variables

46 Forecasting OZONE levels Ozone is a gas that can cause respiratory deseases. In the literature there exists studies that relates the OZONE level to the Temperature, the Wind speed and the Solar radiation. Given the distribution of TEMPERATURE (C ), the distribution of WIND SPEED (meters per second) and the distribution of SOLAR RADIATION (watt per square meter), the main objective is to predict the distribution of OZONE CONCENTRATION (particles per billion) using a linear model. We have chosen as period of observation the Summer seasons of 010 and the central hours of the days (10 a.m. 5 p.m.). The model is: OZONEt ( ) b b TEMP b WSPEED b SORAD g TEMP ( t) g WSPEED ( t) g SORAD ( t) e () t c c c 1 3

47 The estimated model OZONE() t Wasserstein based LS (WASS-LS) (the current proposal) In parentesis the 95% Bootstrap C.I TEMP WSPEED SORAD -1.4; ; ,8;,34 0,049;0,09 c c c TEMP ( t) WSPEED ( t) SORAD ( t) 0,49;1,36 1,08;3,0 0,0118;0,04 Billard (006) regression (SREG) OZONE 6, , 358TEMP 1, 555 WSPEED 0, 0086 SORAD [19.17;34.74] [ 0.03;0.51] [0.78;3.5] [0.005;0.0117]

48 Diagnostics Diagnostics WASS- LS SREG Rsquare NA 0.08 Pseudo Rsquare * (Dias-Brito 011) 0.74 NA RMSE_W ,93* RMSE_L * RMSE. n 1 ˆ L n d Fi i i1 d F ( y) Fˆ ( y) dy i i ( y), F ( y) n 1i n 1i W W d yˆ ( t), Y d y ( t), Y i i n 1 ˆ W W i t i i1 n 1 where Y n yi i1 RMSE n d y ( ), y ( t) * The Billard model does not allow to compute directly a distribution. Given a set of input distributions, a Montecarlo experiment is needed to compute an output distribution.

49 Main references BILLARD, L. and DIDAY, E. (006): Symbolic Data Analysis: Conceptual Statistics and Data Mining. Wiley Series in Computational Statistics. John Wiley & Sons. BOCK, H.H. and DIDAY, E. (000): Analysis of Symbolic Data, Exploratory methods for extracting statistical information from complex data. Studies in Classification, Data Analysis and Knowledge Organisation, Springer-Verlag. CUESTA-ALBERTOS, J.A., MATRAN, C., TUERO-DIAZ, A. (1997): Optimal transportation plans and convergence in distribution. Journ. of Multiv. An., 60, GIBBS, A.L. and SU, F.E. (00): On choosing and bounding probability metrics. Intl. Stat. Rev. 7 (3), IRPINO, A., LECHEVALLIER, Y. and VERDE, R. (006): Dynamic clustering of histograms using Wasserstein metric. In: Rizzi, A., Vichi, M. (eds.) COMPSTAT 006. Physica-Verlag, Berlin, VERDE, R. and IRPINO, A.(008): Comparing Histogram data using a Mahalanobis Wasserstein distance. In: Brito, P. (eds.) COMPSTAT 008. Physica Verlag, Springer, Berlin, VERDE, R. and IRPINO, A.(010): Ordinary Least Squares for Histogram Data based on Wasserstein Distance COMPSTAT 010 DIAS, S. and BRITO P.,(011) A new linear regression model for histogram-valued variables, ISI 011

50 Thank you Rosanna Verde Antonio Irpino

51

52 ( X i, X ) j l c c 1 r r 3 s s xx l il jl il jl i j i j

Histogram data analysis based on Wasserstein distance

Histogram data analysis based on Wasserstein distance Histogram data analysis based on Wasserstein distance Rosanna Verde Antonio Irpino Department of European and Mediterranean Studies Second University of Naples Caserta - ITALY SYMPOSIUM ON LEARNING AND

More information

How to cope with modelling and privacy concerns? A regression model and a visualization tool for aggregated data

How to cope with modelling and privacy concerns? A regression model and a visualization tool for aggregated data for aggregated data Rosanna Verde (rosanna.verde@unina2.it) Antonio Irpino (antonio.irpino@unina2.it) Dominique Desbois (desbois@agroparistech.fr) Second University of Naples Dept. of Political Sciences

More information

Order statistics for histogram data and a box plot visualization tool

Order statistics for histogram data and a box plot visualization tool Order statistics for histogram data and a box plot visualization tool Rosanna Verde, Antonio Balzanella, Antonio Irpino Second University of Naples, Caserta, Italy rosanna.verde@unina.it, antonio.balzanella@unina.it,

More information

A new linear regression model for histogram-valued variables

A new linear regression model for histogram-valued variables Int. Statistical Inst.: Proc. 58th World Statistical Congress, 011, Dublin (Session CPS077) p.5853 A new linear regression model for histogram-valued variables Dias, Sónia Instituto Politécnico Viana do

More information

Distributions are the numbers of today From histogram data to distributional data. Javier Arroyo Gallardo Universidad Complutense de Madrid

Distributions are the numbers of today From histogram data to distributional data. Javier Arroyo Gallardo Universidad Complutense de Madrid Distributions are the numbers of today From histogram data to distributional data Javier Arroyo Gallardo Universidad Complutense de Madrid Introduction 2 Symbolic data Symbolic data was introduced by Edwin

More information

Mallows L 2 Distance in Some Multivariate Methods and its Application to Histogram-Type Data

Mallows L 2 Distance in Some Multivariate Methods and its Application to Histogram-Type Data Metodološki zvezki, Vol. 9, No. 2, 212, 17-118 Mallows L 2 Distance in Some Multivariate Methods and its Application to Histogram-Type Data Katarina Košmelj 1 and Lynne Billard 2 Abstract Mallows L 2 distance

More information

Proximity Measures for Data Described By Histograms Misure di prossimità per dati descritti da istogrammi Antonio Irpino 1, Yves Lechevallier 2

Proximity Measures for Data Described By Histograms Misure di prossimità per dati descritti da istogrammi Antonio Irpino 1, Yves Lechevallier 2 Proximity Measures for Data Described By Histograms Misure di prossimità per dati descritti da istogrammi Antonio Irpino 1, Yves Lechevallier 2 1 Dipartimento di Studi Europei e Mediterranei, Seconda Università

More information

Modelling and Analysing Interval Data

Modelling and Analysing Interval Data Modelling and Analysing Interval Data Paula Brito Faculdade de Economia/NIAAD-LIACC, Universidade do Porto Rua Dr. Roberto Frias, 4200-464 Porto, Portugal mpbrito@fep.up.pt Abstract. In this paper we discuss

More information

New Clustering methods for interval data

New Clustering methods for interval data New Clustering methods for interval data Marie Chavent, Francisco de A.T. De Carvahlo, Yves Lechevallier, Rosanna Verde To cite this version: Marie Chavent, Francisco de A.T. De Carvahlo, Yves Lechevallier,

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

Linear Regression Model with Histogram-Valued Variables

Linear Regression Model with Histogram-Valued Variables Linear Regression Model with Histogram-Valued Variables Sónia Dias 1 and Paula Brito 1 INESC TEC - INESC Technology and Science and ESTG/IPVC - School of Technology and Management, Polytechnic Institute

More information

Discriminant Analysis for Interval Data

Discriminant Analysis for Interval Data Outline Discriminant Analysis for Interval Data Paula Brito Fac. Economia & LIAAD-INESC TEC, Universidade do Porto ECI 2015 - Buenos Aires T3: Symbolic Data Analysis: Taking Variability in Data into Account

More information

A Resampling Approach for Interval-Valued Data Regression

A Resampling Approach for Interval-Valued Data Regression A Resampling Approach for Interval-Valued Data Regression Jeongyoun Ahn, Muliang Peng, Cheolwoo Park Department of Statistics, University of Georgia, Athens, GA, 30602, USA Yongho Jeon Department of Applied

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Forecasting Complex Time Series: Beanplot Time Series

Forecasting Complex Time Series: Beanplot Time Series COMPSTAT 2010 19 International Conference on Computational Statistics Paris-France, August 22-27 Forecasting Complex Time Series: Beanplot Time Series Carlo Drago and Germana Scepi Dipartimento di Matematica

More information

On central tendency and dispersion measures for intervals and hypercubes

On central tendency and dispersion measures for intervals and hypercubes On central tendency and dispersion measures for intervals and hypercubes Marie Chavent, Jérôme Saracco To cite this version: Marie Chavent, Jérôme Saracco. On central tendency and dispersion measures for

More information

Co-clustering algorithms for histogram data

Co-clustering algorithms for histogram data Co-clustering algorithms for histogram data Algoritmi di Co-clustering per dati ad istogramma Francisco de A.T. De Carvalho and Antonio Balzanella and Antonio Irpino and Rosanna Verde Abstract One of the

More information

Generalized Ward and Related Clustering Problems

Generalized Ward and Related Clustering Problems Generalized Ward and Related Clustering Problems Vladimir BATAGELJ Department of mathematics, Edvard Kardelj University, Jadranska 9, 6 000 Ljubljana, Yugoslavia Abstract In the paper an attempt to legalize

More information

Clustering and Model Integration under the Wasserstein Metric. Jia Li Department of Statistics Penn State University

Clustering and Model Integration under the Wasserstein Metric. Jia Li Department of Statistics Penn State University Clustering and Model Integration under the Wasserstein Metric Jia Li Department of Statistics Penn State University Clustering Data represented by vectors or pairwise distances. Methods Top- down approaches

More information

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1

Lecture 2 Simple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: Chapter 1 Lecture Simple Linear Regression STAT 51 Spring 011 Background Reading KNNL: Chapter 1-1 Topic Overview This topic we will cover: Regression Terminology Simple Linear Regression with a single predictor

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

Lasso-based linear regression for interval-valued data

Lasso-based linear regression for interval-valued data Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS065) p.5576 Lasso-based linear regression for interval-valued data Giordani, Paolo Sapienza University of Rome, Department

More information

arxiv: v1 [stat.ml] 23 Jul 2015

arxiv: v1 [stat.ml] 23 Jul 2015 CLUSTERING OF MODAL VALUED SYMBOLIC DATA VLADIMIR BATAGELJ, NATAŠA KEJŽAR, AND SIMONA KORENJAK-ČERNE UNIVERSITY OF LJUBLJANA, SLOVENIA arxiv:157.6683v1 [stat.ml] 23 Jul 215 Abstract. Symbolic Data Analysis

More information

A Nonparametric Kernel Approach to Interval-Valued Data Analysis

A Nonparametric Kernel Approach to Interval-Valued Data Analysis A Nonparametric Kernel Approach to Interval-Valued Data Analysis Yongho Jeon Department of Applied Statistics, Yonsei University, Seoul, 120-749, Korea Jeongyoun Ahn, Cheolwoo Park Department of Statistics,

More information

Regression for Symbolic Data

Regression for Symbolic Data Symbolic Data Analysis: Taking Variability in Data into Account Regression for Symbolic Data Sónia Dias I.P. Viana do Castelo & LIAAD-INESC TEC, Univ. Porto Paula Brito Fac. Economia & LIAAD-INESC TEC,

More information

STAT 540: Data Analysis and Regression

STAT 540: Data Analysis and Regression STAT 540: Data Analysis and Regression Wen Zhou http://www.stat.colostate.edu/~riczw/ Email: riczw@stat.colostate.edu Department of Statistics Colorado State University Fall 205 W. Zhou (Colorado State

More information

Part 8: GLMs and Hierarchical LMs and GLMs

Part 8: GLMs and Hierarchical LMs and GLMs Part 8: GLMs and Hierarchical LMs and GLMs 1 Example: Song sparrow reproductive success Arcese et al., (1992) provide data on a sample from a population of 52 female song sparrows studied over the course

More information

Multiple Linear Regression for the Supervisor Data

Multiple Linear Regression for the Supervisor Data for the Supervisor Data Rating 40 50 60 70 80 90 40 50 60 70 50 60 70 80 90 40 60 80 40 60 80 Complaints Privileges 30 50 70 40 60 Learn Raises 50 70 50 70 90 Critical 40 50 60 70 80 30 40 50 60 70 80

More information

General Linear Model Introduction, Classes of Linear models and Estimation

General Linear Model Introduction, Classes of Linear models and Estimation Stat 740 General Linear Model Introduction, Classes of Linear models and Estimation An aim of scientific enquiry: To describe or to discover relationshis among events (variables) in the controlled (laboratory)

More information

Simple Linear Regression for the Advertising Data

Simple Linear Regression for the Advertising Data Revenue 0 10 20 30 40 50 5 10 15 20 25 Pages of Advertising Simple Linear Regression for the Advertising Data What do we do with the data? y i = Revenue of i th Issue x i = Pages of Advertisement in i

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( )

Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil ( ) Trends in the Relative Distribution of Wages by Gender and Cohorts in Brazil (1981-2005) Ana Maria Hermeto Camilo de Oliveira Affiliation: CEDEPLAR/UFMG Address: Av. Antônio Carlos, 6627 FACE/UFMG Belo

More information

Finding Multiple Outliers from Multidimensional Data using Multiple Regression

Finding Multiple Outliers from Multidimensional Data using Multiple Regression Finding Multiple Outliers from Multidimensional Data using Multiple Regression L. Sunitha 1*, Dr M. Bal Raju 2 1* CSE Research Scholar, J N T U Hyderabad, Telangana (India) 2 Professor and principal, K

More information

Probability Models for Bayesian Recognition

Probability Models for Bayesian Recognition Intelligent Systems: Reasoning and Recognition James L. Crowley ENSIAG / osig Second Semester 06/07 Lesson 9 0 arch 07 Probability odels for Bayesian Recognition Notation... Supervised Learning for Bayesian

More information

Formal Statement of Simple Linear Regression Model

Formal Statement of Simple Linear Regression Model Formal Statement of Simple Linear Regression Model Y i = β 0 + β 1 X i + ɛ i Y i value of the response variable in the i th trial β 0 and β 1 are parameters X i is a known constant, the value of the predictor

More information

Part 6: Multivariate Normal and Linear Models

Part 6: Multivariate Normal and Linear Models Part 6: Multivariate Normal and Linear Models 1 Multiple measurements Up until now all of our statistical models have been univariate models models for a single measurement on each member of a sample of

More information

Remedial Measures for Multiple Linear Regression Models

Remedial Measures for Multiple Linear Regression Models Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline

More information

Robust model selection criteria for robust S and LT S estimators

Robust model selection criteria for robust S and LT S estimators Hacettepe Journal of Mathematics and Statistics Volume 45 (1) (2016), 153 164 Robust model selection criteria for robust S and LT S estimators Meral Çetin Abstract Outliers and multi-collinearity often

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

Autocorrelation function of the daily histogram time series of SP500 intradaily returns

Autocorrelation function of the daily histogram time series of SP500 intradaily returns Autocorrelation function of the daily histogram time series of SP5 intradaily returns Gloria González-Rivera University of California, Riverside Department of Economics Riverside, CA 9252 Javier Arroyo

More information

STAT Chapter 11: Regression

STAT Chapter 11: Regression STAT 515 -- Chapter 11: Regression Mostly we have studied the behavior of a single random variable. Often, however, we gather data on two random variables. We wish to determine: Is there a relationship

More information

Lecture 9 SLR in Matrix Form

Lecture 9 SLR in Matrix Form Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.

More information

OPTIMAL TRANSPORTATION PLANS AND CONVERGENCE IN DISTRIBUTION

OPTIMAL TRANSPORTATION PLANS AND CONVERGENCE IN DISTRIBUTION OPTIMAL TRANSPORTATION PLANS AND CONVERGENCE IN DISTRIBUTION J.A. Cuesta-Albertos 1, C. Matrán 2 and A. Tuero-Díaz 1 1 Departamento de Matemáticas, Estadística y Computación. Universidad de Cantabria.

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

Accounting for measurement uncertainties in industrial data analysis

Accounting for measurement uncertainties in industrial data analysis Accounting for measurement uncertainties in industrial data analysis Marco S. Reis * ; Pedro M. Saraiva GEPSI-PSE Group, Department of Chemical Engineering, University of Coimbra Pólo II Pinhal de Marrocos,

More information

Verification of Continuous Forecasts

Verification of Continuous Forecasts Verification of Continuous Forecasts Presented by Barbara Brown Including contributions by Tressa Fowler, Barbara Casati, Laurence Wilson, and others Exploratory methods Scatter plots Discrimination plots

More information

Simple Linear Regression for the Climate Data

Simple Linear Regression for the Climate Data Prediction Prediction Interval Temperature 0.2 0.0 0.2 0.4 0.6 0.8 320 340 360 380 CO 2 Simple Linear Regression for the Climate Data What do we do with the data? y i = Temperature of i th Year x i =CO

More information

Importance of Input Data and Uncertainty Associated with Tuning Satellite to Ground Solar Irradiation

Importance of Input Data and Uncertainty Associated with Tuning Satellite to Ground Solar Irradiation Importance of Input Data and Uncertainty Associated with Tuning Satellite to Ground Solar Irradiation James Alfi 1, Alex Kubiniec 2, Ganesh Mani 1, James Christopherson 1, Yiping He 1, Juan Bosch 3 1 EDF

More information

Module Master Recherche Apprentissage et Fouille

Module Master Recherche Apprentissage et Fouille Module Master Recherche Apprentissage et Fouille Michele Sebag Balazs Kegl Antoine Cornuéjols http://tao.lri.fr 19 novembre 2008 Unsupervised Learning Clustering Data Streaming Application: Clustering

More information

Forecasting Wind Ramps

Forecasting Wind Ramps Forecasting Wind Ramps Erin Summers and Anand Subramanian Jan 5, 20 Introduction The recent increase in the number of wind power producers has necessitated changes in the methods power system operators

More information

Math 423/533: The Main Theoretical Topics

Math 423/533: The Main Theoretical Topics Math 423/533: The Main Theoretical Topics Notation sample size n, data index i number of predictors, p (p = 2 for simple linear regression) y i : response for individual i x i = (x i1,..., x ip ) (1 p)

More information

Principal Component Analysis for Interval Data

Principal Component Analysis for Interval Data Outline Paula Brito Fac. Economia & LIAAD-INESC TEC, Universidade do Porto ECI 2015 - Buenos Aires T3: Symbolic Data Analysis: Taking Variability in Data into Account Outline Outline 1 Introduction to

More information

Lecture 10 Multiple Linear Regression

Lecture 10 Multiple Linear Regression Lecture 10 Multiple Linear Regression STAT 512 Spring 2011 Background Reading KNNL: 6.1-6.5 10-1 Topic Overview Multiple Linear Regression Model 10-2 Data for Multiple Regression Y i is the response variable

More information

Gov 2002: 5. Matching

Gov 2002: 5. Matching Gov 2002: 5. Matching Matthew Blackwell October 1, 2015 Where are we? Where are we going? Discussed randomized experiments, started talking about observational data. Last week: no unmeasured confounders

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Support Vector Method for Multivariate Density Estimation

Support Vector Method for Multivariate Density Estimation Support Vector Method for Multivariate Density Estimation Vladimir N. Vapnik Royal Halloway College and AT &T Labs, 100 Schultz Dr. Red Bank, NJ 07701 vlad@research.att.com Sayan Mukherjee CBCL, MIT E25-201

More information

STAT 4385 Topic 03: Simple Linear Regression

STAT 4385 Topic 03: Simple Linear Regression STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis

More information

Matrix Approach to Simple Linear Regression: An Overview

Matrix Approach to Simple Linear Regression: An Overview Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix

More information

New models for symbolic data analysis

New models for symbolic data analysis New models for symbolic data analysis B. Beranger, H. Lin and S. A. Sisson arxiv:1809.03659v1 [stat.co] 11 Sep 2018 Abstract Symbolic data analysis (SDA) is an emerging area of statistics based on aggregating

More information

Gaussian Mixture Model

Gaussian Mixture Model Case Study : Document Retrieval MAP EM, Latent Dirichlet Allocation, Gibbs Sampling Machine Learning/Statistics for Big Data CSE599C/STAT59, University of Washington Emily Fox 0 Emily Fox February 5 th,

More information

Investigation of Monthly Pan Evaporation in Turkey with Geostatistical Technique

Investigation of Monthly Pan Evaporation in Turkey with Geostatistical Technique Investigation of Monthly Pan Evaporation in Turkey with Geostatistical Technique Hatice Çitakoğlu 1, Murat Çobaner 1, Tefaruk Haktanir 1, 1 Department of Civil Engineering, Erciyes University, Kayseri,

More information

Bayesian spatial quantile regression

Bayesian spatial quantile regression Brian J. Reich and Montserrat Fuentes North Carolina State University and David B. Dunson Duke University E-mail:reich@stat.ncsu.edu Tropospheric ozone Tropospheric ozone has been linked with several adverse

More information

Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD

Least Squares. Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 175A Winter UCSD Least Squares Ken Kreutz-Delgado (Nuno Vasconcelos) ECE 75A Winter 0 - UCSD (Unweighted) Least Squares Assume linearity in the unnown, deterministic model parameters Scalar, additive noise model: y f (

More information

Probabilistic forecasting of solar radiation

Probabilistic forecasting of solar radiation Probabilistic forecasting of solar radiation Dr Adrian Grantham School of Information Technology and Mathematical Sciences School of Engineering 7 September 2017 Acknowledgements Funding: Collaborators:

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 22 April, SoSe 2015 Goals Feature Selection rather than Feature

More information

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves

Chapter 3. Riemannian Manifolds - I. The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves Chapter 3 Riemannian Manifolds - I The subject of this thesis is to extend the combinatorial curve reconstruction approach to curves embedded in Riemannian manifolds. A Riemannian manifold is an abstraction

More information

Correlation and regression

Correlation and regression 1 Correlation and regression Yongjua Laosiritaworn Introductory on Field Epidemiology 6 July 2015, Thailand Data 2 Illustrative data (Doll, 1955) 3 Scatter plot 4 Doll, 1955 5 6 Correlation coefficient,

More information

Mathematics for Economics MA course

Mathematics for Economics MA course Mathematics for Economics MA course Simple Linear Regression Dr. Seetha Bandara Simple Regression Simple linear regression is a statistical method that allows us to summarize and study relationships between

More information

STAT 350: Geometry of Least Squares

STAT 350: Geometry of Least Squares The Geometry of Least Squares Mathematical Basics Inner / dot product: a and b column vectors a b = a T b = a i b i a b a T b = 0 Matrix Product: A is r s B is s t (AB) rt = s A rs B st Partitioned Matrices

More information

Table of Contents. Multivariate methods. Introduction II. Introduction I

Table of Contents. Multivariate methods. Introduction II. Introduction I Table of Contents Introduction Antti Penttilä Department of Physics University of Helsinki Exactum summer school, 04 Construction of multinormal distribution Test of multinormality with 3 Interpretation

More information

Regression Models - Introduction

Regression Models - Introduction Regression Models - Introduction In regression models there are two types of variables that are studied: A dependent variable, Y, also called response variable. It is modeled as random. An independent

More information

Data Analytics for Solar Energy Management

Data Analytics for Solar Energy Management Data Analytics for Solar Energy Management Lipyeow Lim1, Duane Stevens2, Sen Chiao3, Christopher Foo1, Anthony Chang2, Todd Taomae1, Carlos Andrade1, Neha Gupta1, Gabriella Santillan2, Michael Gonzalves2,

More information

REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b)

REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b) REVIEW (MULTIVARIATE LINEAR REGRESSION) Explain/Obtain the LS estimator () of the vector of coe cients (b) Explain/Obtain the variance-covariance matrix of Both in the bivariate case (two regressors) and

More information

of the 7 stations. In case the number of daily ozone maxima in a month is less than 15, the corresponding monthly mean was not computed, being treated

of the 7 stations. In case the number of daily ozone maxima in a month is less than 15, the corresponding monthly mean was not computed, being treated Spatial Trends and Spatial Extremes in South Korean Ozone Seokhoon Yun University of Suwon, Department of Applied Statistics Suwon, Kyonggi-do 445-74 South Korea syun@mail.suwon.ac.kr Richard L. Smith

More information

Analytics for an Online Retailer: Demand Forecasting and Price Optimization

Analytics for an Online Retailer: Demand Forecasting and Price Optimization Analytics for an Online Retailer: Demand Forecasting and Price Optimization Kris Johnson Ferreira Technology and Operations Management Unit, Harvard Business School, kferreira@hbs.edu Bin Hong Alex Lee

More information

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors:

statistical sense, from the distributions of the xs. The model may now be generalized to the case of k regressors: Wooldridge, Introductory Econometrics, d ed. Chapter 3: Multiple regression analysis: Estimation In multiple regression analysis, we extend the simple (two-variable) regression model to consider the possibility

More information

12 Discriminant Analysis

12 Discriminant Analysis 12 Discriminant Analysis Discriminant analysis is used in situations where the clusters are known a priori. The aim of discriminant analysis is to classify an observation, or several observations, into

More information

ECE 661: Homework 10 Fall 2014

ECE 661: Homework 10 Fall 2014 ECE 661: Homework 10 Fall 2014 This homework consists of the following two parts: (1) Face recognition with PCA and LDA for dimensionality reduction and the nearest-neighborhood rule for classification;

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Pattern Recognition of Multivariate Time Series using Wavelet

Pattern Recognition of Multivariate Time Series using Wavelet Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS070) p.6862 Pattern Recognition of Multivariate Time Series using Wavelet Features Maharaj, Elizabeth Ann Monash

More information

Stat 602 Exam 1 Spring 2017 (corrected version)

Stat 602 Exam 1 Spring 2017 (corrected version) Stat 602 Exam Spring 207 (corrected version) I have neither given nor received unauthorized assistance on this exam. Name Signed Date Name Printed This is a very long Exam. You surely won't be able to

More information

Homework 2: Simple Linear Regression

Homework 2: Simple Linear Regression STAT 4385 Applied Regression Analysis Homework : Simple Linear Regression (Simple Linear Regression) Thirty (n = 30) College graduates who have recently entered the job market. For each student, the CGPA

More information

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands

Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Asymptotic Multivariate Kriging Using Estimated Parameters with Bayesian Prediction Methods for Non-linear Predictands Elizabeth C. Mannshardt-Shamseldin Advisor: Richard L. Smith Duke University Department

More information

the logic of parametric tests

the logic of parametric tests the logic of parametric tests define the test statistic (e.g. mean) compare the observed test statistic to a distribution calculated for random samples that are drawn from a single (normal) distribution.

More information

Chapter 11 Building the Regression Model II:

Chapter 11 Building the Regression Model II: Chapter 11 Building the Regression Model II: Remedial Measures( 補救措施 ) 許湘伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 11 1 / 48 11.1 WLS remedial measures may

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

Chapter 11 Estimation of Absolute Performance

Chapter 11 Estimation of Absolute Performance Chapter 11 Estimation of Absolute Performance Purpose Objective: Estimate system performance via simulation If q is the system performance, the precision of the estimator qˆ can be measured by: The standard

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Andrew O. Finley 1 and Sudipto Banerjee 2 1 Department of Forestry & Department of Geography, Michigan

More information

Descriptive Statistics for Symbolic Data

Descriptive Statistics for Symbolic Data Outline Descriptive Statistics for Symbolic Data Paula Brito Fac. Economia & LIAAD-INESC TEC, Universidade do Porto ECI 2015 - Buenos Aires T3: Symbolic Data Analysis: Taking Variability in Data into Account

More information

NATCOR Regression Modelling for Time Series

NATCOR Regression Modelling for Time Series Universität Hamburg Institut für Wirtschaftsinformatik Prof. Dr. D.B. Preßmar Professor Robert Fildes NATCOR Regression Modelling for Time Series The material presented has been developed with the substantial

More information

Inference for the Regression Coefficient

Inference for the Regression Coefficient Inference for the Regression Coefficient Recall, b 0 and b 1 are the estimates of the slope β 1 and intercept β 0 of population regression line. We can shows that b 0 and b 1 are the unbiased estimates

More information

Generalization of the Principal Components Analysis to Histogram Data

Generalization of the Principal Components Analysis to Histogram Data Generalization of the Principal Components Analysis to Histogram Data Oldemar Rodríguez 1, Edwin Diday 1, and Suzanne Winsberg 2 1 University Paris 9 Dauphine, Ceremade Pl Du Ml de L de Tassigny 75016

More information

Stat 5101 Lecture Notes

Stat 5101 Lecture Notes Stat 5101 Lecture Notes Charles J. Geyer Copyright 1998, 1999, 2000, 2001 by Charles J. Geyer May 7, 2001 ii Stat 5101 (Geyer) Course Notes Contents 1 Random Variables and Change of Variables 1 1.1 Random

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Alan Gelfand 1 and Andrew O. Finley 2 1 Department of Statistical Science, Duke University, Durham, North

More information

More on Unsupervised Learning

More on Unsupervised Learning More on Unsupervised Learning Two types of problems are to find association rules for occurrences in common in observations (market basket analysis), and finding the groups of values of observational data

More information

Hierarchical Modelling for Multivariate Spatial Data

Hierarchical Modelling for Multivariate Spatial Data Hierarchical Modelling for Multivariate Spatial Data Geography 890, Hierarchical Bayesian Models for Environmental Spatial Data Analysis February 15, 2011 1 Point-referenced spatial data often come as

More information

Time Series Modeling of Histogram-valued Data The Daily Histogram Time Series of SP500 Intradaily Returns

Time Series Modeling of Histogram-valued Data The Daily Histogram Time Series of SP500 Intradaily Returns Time Series Modeling of Histogram-valued Data The Daily Histogram Time Series of SP5 Intradaily Returns Gloria González-Rivera University of California, Riverside Department of Economics Riverside, CA

More information

Spatial Statistics with Image Analysis. Lecture L11. Home assignment 3. Lecture 11. Johan Lindström. December 5, 2016.

Spatial Statistics with Image Analysis. Lecture L11. Home assignment 3. Lecture 11. Johan Lindström. December 5, 2016. HA3 MRF:s Simulation Estimation Spatial Statistics with Image Analysis Lecture 11 Johan Lindström December 5, 2016 Lecture L11 Johan Lindström - johanl@maths.lth.se FMSN20/MASM25 L11 1/22 HA3 MRF:s Simulation

More information

TIME SERIES DATA PREDICTION OF NATURAL GAS CONSUMPTION USING ARIMA MODEL

TIME SERIES DATA PREDICTION OF NATURAL GAS CONSUMPTION USING ARIMA MODEL International Journal of Information Technology & Management Information System (IJITMIS) Volume 7, Issue 3, Sep-Dec-2016, pp. 01 07, Article ID: IJITMIS_07_03_001 Available online at http://www.iaeme.com/ijitmis/issues.asp?jtype=ijitmis&vtype=7&itype=3

More information

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes

Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Bayesian dynamic modeling for large space-time weather datasets using Gaussian predictive processes Sudipto Banerjee 1 and Andrew O. Finley 2 1 Biostatistics, School of Public Health, University of Minnesota,

More information