Dependencies in Interval-valued Symbolic Data. Lynne Billard, University of Georgia


1 Dependencies in Interval-valued Symbolic Data. Lynne Billard, University of Georgia. Tribute to Professor Edwin Diday: Paris, France; 5 September 2007

2 Naturally occurring Symbolic Data -- Mushrooms

3 Patient Records -- Single Hospital, Cardiology

Patient     Hospital    Age   Smoker
Patient 1   Fontaines   74    heavy
Patient 2   Fontaines   78    light
Patient 3   Beaune      69    no
Patient 4   Beaune      73    heavy
Patient 5   Beaune      80    light
Patient 6   Fontaines   70    heavy
Patient 7   Fontaines   82    heavy
...

4 Patient Records by Hospital -- aggregate over patients. Result: Symbolic Data

Patient     Hospital    Age   Smoker
Patient 1   Fontaines   74    heavy
Patient 2   Fontaines   78    light
Patient 3   Beaune      69    no
Patient 4   Beaune      73    heavy
Patient 5   Beaune      80    light
Patient 6   Fontaines   70    heavy
Patient 7   Fontaines   82    heavy
...

Hospital    Age        Smoker
Fontaines   [70, 82]   {light 1/4, heavy 3/4}
Beaune      [69, 80]   {no, light, heavy}
...
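The aggregation step above can be sketched in code; a minimal illustration (not from the talk), building one interval-valued and one modal-valued description per hospital:

```python
# Aggregate classical patient records into symbolic descriptions:
# interval-valued Age = [min, max]; modal-valued Smoker = relative frequencies.
from collections import defaultdict

records = [
    ("Fontaines", 74, "heavy"), ("Fontaines", 78, "light"),
    ("Beaune", 69, "no"), ("Beaune", 73, "heavy"), ("Beaune", 80, "light"),
    ("Fontaines", 70, "heavy"), ("Fontaines", 82, "heavy"),
]

by_hospital = defaultdict(list)
for hospital, age, smoker in records:
    by_hospital[hospital].append((age, smoker))

symbolic = {}
for hospital, rows in by_hospital.items():
    ages = [age for age, _ in rows]
    smokers = [s for _, s in rows]
    interval = (min(ages), max(ages))                       # interval-valued Age
    freqs = {s: smokers.count(s) / len(smokers) for s in set(smokers)}
    symbolic[hospital] = (interval, freqs)

print(symbolic["Fontaines"])
print(symbolic["Beaune"])
```

This reproduces the table: Fontaines gets Age [70, 82] with {light 1/4, heavy 3/4}, Beaune gets Age [69, 80].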

5 Histogram-valued Data -- Weight by Age Distribution:

6 Logical dependency rule

E.g., Y1 = age, Y2 = # children.
Classical: Y_a = (10, 0), Y_b = (20, 2), Y_c = (18, 1).
Aggregation -- Symbolic: ξ = ([10, 20], {0, 1, 2}).
I.e., ξ implies the classical point Y_d = (10, 2) is possible.
Need rule ν: {If Y1 < 15, then Y2 = 0}.
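A small sketch (illustrative, not from the talk) of how rule ν trims the virtual description space implied by ξ:

```python
# Rule nu: if age < 15 then the number of children must be 0.
def satisfies_rule(age, children):
    return children == 0 if age < 15 else True

# Without nu, the classical point (10, 2) lies inside xi = ([10,20], {0,1,2});
# nu excludes it, while the genuine observations remain admissible.
assert not satisfies_rule(10, 2)   # Y_d, excluded by nu
assert satisfies_rule(10, 0)       # Y_a
assert satisfies_rule(20, 2)       # Y_b
assert satisfies_rule(18, 1)       # Y_c
```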

7 Interval-valued data -- # At-Bats (Y1) and # Hits (Y2) by team

Team  Y1 # At-Bats  Y2 # Hits     Team  Y1 # At-Bats  Y2 # Hits
 1    (289, 538)    (75, 162)     11    (212, 492)    (57, 151)
 2    ( 88, 422)    (49, 149)     12    (177, 245)    (189, 238)
 3    (189, 223)    (201, 254)    13    (342, 614)    (121, 206)
 4    (184, 476)    (46, 148)     14    (120, 439)    (35, 102)
 5    (283, 447)    (86, 115)     15    ( 80, 468)    (55, 115)
 6    ( 24, 26)     (133, 141)    16    ( 75, 110)    (75, 110)
 7    (168, 445)    (37, 135)     17    (116, 557)    (95, 163)
 8    (123, 148)    (137, 148)    18    (197, 507)    (52, 53)
 9    (256, 510)    (78, 124)     19    (167, 203)    (48, 232)
10    (101, 126)    (101, 132)

ξ(2): Y2 = 149 is not possible when Y1 < 149 (a team cannot have more hits than at-bats).

8 Observation ξ(2) [Figure: the rectangle for ξ(2) in the (Y1, Y2) plane, cut by the line Y2 = αY1 into regions R1-R4]

9

10 Dependencies between Variables -- Interval-valued Variables

E.g., Regression Analysis.
Dependent variable: Y = (Y1, ..., Yq), e.g., q = 1.
Predictor/regressor variables: X = (X1, ..., Xp).
Multiple regression model: Y = β0 + β1 X1 + ... + βp Xp + e.
Error e: E(e) = 0, Var(e) = σ², Cov(e_i, e_k) = 0, i ≠ k.

11 Multiple Regression Model: Y = β0 + β1 X1 + ... + βp Xp + e

In vector terms, Y = Xβ + e, with
Observation vector: Y' = (Y1, ..., Yn)
Design matrix:
  X = [ 1  X11 ... X1p
        ...
        1  Xn1 ... Xnp ]
Regression coefficient vector: β' = (β0, β1, ..., βp)
Error vector: e' = (e1, ..., en)

12 Model: Y = Xβ + e

Least squares estimator of β: β̂ = (X'X)⁻¹ X'Y.
When p = 1,
  β̂1 = Σ_{i=1}^n (Xi − X̄)(Yi − Ȳ) / Σ_{i=1}^n (Xi − X̄)² = Cov(X, Y)/Var(X),
  β̂0 = Ȳ − β̂1 X̄,
where Ȳ = (1/n) Σ_{i=1}^n Yi, X̄ = (1/n) Σ_{i=1}^n Xi.
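A quick numerical check (illustrative data, not from the talk) that the matrix estimator β̂ = (X'X)⁻¹X'Y and the p = 1 closed form agree:

```python
# Compare the matrix least-squares solution with the p = 1 closed form.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=20)
y = 2.0 + 3.0 * x + rng.normal(scale=0.1, size=20)   # true beta0=2, beta1=3

X = np.column_stack([np.ones_like(x), x])    # design matrix with intercept
beta = np.linalg.solve(X.T @ X, X.T @ y)     # (X'X)^{-1} X'Y

# Closed form: beta1 = Cov(X, Y)/Var(X), beta0 = Ybar - beta1 * Xbar
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
assert np.allclose(beta, [b0, b1])
print(beta)
```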

13 Model: Y = β0 + β1 X1 + ... + βp Xp + e

Or, write as Y − Ȳ = β1(X1 − X̄1) + ... + βp(Xp − X̄p) + e,
where X̄j = (1/n) Σ_{i=1}^n Xij, j = 1, ..., p,
and β0 = Ȳ − (β1 X̄1 + ... + βp X̄p).

14 Y − Ȳ = β1(X1 − X̄1) + ... + βp(Xp − X̄p) + e

Least squares estimator of β: β̂ = [(X − X̄)'(X − X̄)]⁻¹ (X − X̄)'(Y − Ȳ),
where (X − X̄)'(X − X̄) has (j1, j2) entry
  Σ_i (X_{i,j1} − X̄_{j1})(X_{i,j2} − X̄_{j2}),  j1, j2 = 1, ..., p
(diagonal entries Σ_i (X_{ij} − X̄_j)²), and (X − X̄)'(Y − Ȳ) has j-th entry
  Σ_i (X_{ij} − X̄_j)(Yi − Ȳ),  j = 1, ..., p.

15 Interval-valued data: Y_uj = [a_uj, b_uj], j = 1, ..., p, u ∈ E = {w1, ..., wu, ..., wm}

Bertrand and Goupil (2000): Symbolic sample mean is
  Ȳj = (1/2m) Σ_{u∈E} (b_uj + a_uj);
symbolic sample variance is
  Sj² = (1/3m) Σ_{u∈E} (b_uj² + b_uj a_uj + a_uj²) − (1/4m²) [Σ_{u∈E} (a_uj + b_uj)]².

Notice, e.g., with m = 1 and Y = Weight:
  Y1 = [132, 138]: Ȳ = 135, S² = 3;
  Y2 = [129, 141]: Ȳ = 135, S² = 12.
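The Bertrand-Goupil formulas are easy to verify numerically; a minimal sketch (not from the talk) checking the Weight example, where two intervals with the same midpoint give the same symbolic mean but different variances:

```python
# Bertrand-Goupil (2000) symbolic sample mean and variance for
# interval-valued data [a_u, b_u], u = 1, ..., m.
import numpy as np

def symbolic_mean(a, b):
    return np.sum(a + b) / (2 * len(a))

def symbolic_var(a, b):
    m = len(a)
    return (np.sum(b**2 + a * b + a**2) / (3 * m)
            - (np.sum(a + b)) ** 2 / (4 * m**2))

# Single-interval samples (m = 1) with equal means, unequal variances
a1, b1 = np.array([132.0]), np.array([138.0])
a2, b2 = np.array([129.0]), np.array([141.0])
print(symbolic_mean(a1, b1), symbolic_var(a1, b1))   # 135.0 3.0
print(symbolic_mean(a2, b2), symbolic_var(a2, b2))   # 135.0 12.0
```

The wider interval [129, 141] carries four times the internal variance of [132, 138], exactly the (b − a)²/12 behaviour of a Uniform distribution on the interval.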

16 Can rewrite
  Sj² = (1/3m) Σ_{u∈E} [(a_uj − Ȳj)² + (a_uj − Ȳj)(b_uj − Ȳj) + (b_uj − Ȳj)²].

Then, by analogy, for interval-valued variables Y1 and Y2 (j = 1, 2), the empirical covariance function Cov(Y1, Y2) is
  Cov(Y1, Y2) = (1/3m) Σ_{u∈E} G1 G2 [Q1 Q2]^{1/2},
where
  Qj = (a_uj − Ȳj)² + (a_uj − Ȳj)(b_uj − Ȳj) + (b_uj − Ȳj)²,
  Gj = −1 if Ȳ_uj ≤ Ȳj, +1 if Ȳ_uj > Ȳj,
  Ȳ_uj = (a_uj + b_uj)/2.

Notice, special cases:
(i) Cov(Yj, Yj) = Sj²;
(ii) if a_uj = b_uj = y_uj for all u, i.e., classical data,
  Cov(Y1, Y2) = (1/m) Σ_u (y_u1 − Ȳ1)(y_u2 − Ȳ2).
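A sketch (illustrative, not from the talk) of this empirical covariance, checking special case (ii) that degenerate intervals recover the classical covariance:

```python
# Empirical covariance for interval variables with the sign factors G_j.
import numpy as np

def interval_cov(a1, b1, a2, b2):
    m = len(a1)
    Y1bar = np.sum(a1 + b1) / (2 * m)
    Y2bar = np.sum(a2 + b2) / (2 * m)
    Q1 = (a1 - Y1bar)**2 + (a1 - Y1bar) * (b1 - Y1bar) + (b1 - Y1bar)**2
    Q2 = (a2 - Y2bar)**2 + (a2 - Y2bar) * (b2 - Y2bar) + (b2 - Y2bar)**2
    G1 = np.where((a1 + b1) / 2 <= Y1bar, -1.0, 1.0)   # sign of midpoint deviation
    G2 = np.where((a2 + b2) / 2 <= Y2bar, -1.0, 1.0)
    return np.sum(G1 * G2 * np.sqrt(Q1 * Q2)) / (3 * m)

# Degenerate intervals (a = b): recovers (1/m) * sum (y1 - Y1bar)(y2 - Y2bar)
y1 = np.array([1.0, 2.0, 3.0, 4.0])
y2 = np.array([2.0, 1.0, 5.0, 4.0])
classical = np.mean((y1 - y1.mean()) * (y2 - y2.mean()))
assert np.isclose(interval_cov(y1, y1, y2, y2), classical)
print(classical)
```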

17 Back to Bertrand and Goupil (2000)

Sample variance is
  Sj² = (1/3m) Σ_{u∈E} (b_uj² + b_uj a_uj + a_uj²) − (1/4m²) [Σ_{u∈E} (a_uj + b_uj)]².
This is a total variance. Take Total Sum of Squares = Total SSj = m Sj².
Then, we can show
  Total SSj = Within Objects SSj + Between Objects SSj.

18 Within Objects SSj = (1/3) Σ_{u∈E} [(a_uj − Ȳ_uj)² + (a_uj − Ȳ_uj)(b_uj − Ȳ_uj) + (b_uj − Ȳ_uj)²]

Between Objects SSj = Σ_{u∈E} [(a_uj + b_uj)/2 − Ȳj]²

with Ȳ_uj = (a_uj + b_uj)/2 and Ȳj = (1/2m) Σ_{u∈E} (a_uj + b_uj).

Classical data: a_uj = b_uj = Y_uj, so Within Objects SSj = 0.
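The decomposition can be checked numerically; a minimal sketch (illustrative intervals, not from the talk):

```python
# Check Total SS_j = Within SS_j + Between SS_j for interval data [a_u, b_u].
import numpy as np

a = np.array([44.0, 60.0, 56.0, 70.0, 54.0])
b = np.array([68.0, 72.0, 90.0, 112.0, 72.0])
m = len(a)

Ybar = np.sum(a + b) / (2 * m)     # symbolic mean
mid = (a + b) / 2                  # interval midpoints

total_ss = np.sum((a - Ybar)**2 + (a - Ybar) * (b - Ybar) + (b - Ybar)**2) / 3
within_ss = np.sum((a - mid)**2 + (a - mid) * (b - mid) + (b - mid)**2) / 3
between_ss = np.sum((mid - Ybar)**2)

assert np.isclose(total_ss, within_ss + between_ss)
print(total_ss, within_ss, between_ss)
```

Note that the Within term reduces to Σ_u (b_u − a_u)²/12, the internal (Uniform) variance of the intervals, while the Between term is the classical sum of squares of the midpoints.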

19 So, for Yj, we have the Sum of Squares (SS) decomposition
  Total SSj = Within Objects SSj + Between Objects SSj.
Likewise, for (Yi, Yj), we have a Sum of Products (SP) decomposition
  Total SPij = Within Objects SPij + Between Objects SPij.

20 Can rewrite
  Sj² = (1/3m) Σ_{u∈E} [(a_uj − Ȳj)² + (a_uj − Ȳj)(b_uj − Ȳj) + (b_uj − Ȳj)²].

Then, by analogy, for interval-valued variables Y1 and Y2 (j = 1, 2), the empirical covariance function Cov(Y1, Y2) is
  Cov(Y1, Y2) = (1/3m) Σ_{u∈E} G1 G2 [Q1 Q2]^{1/2},
where
  Qj = (a_uj − Ȳj)² + (a_uj − Ȳj)(b_uj − Ȳj) + (b_uj − Ȳj)²,
  Gj = −1 if Ȳ_uj ≤ Ȳj, +1 if Ȳ_uj > Ȳj,
  Ȳ_uj = (a_uj + b_uj)/2.

21 Can rewrite
  Sj² = (1/3m) Σ_{u∈E} [(a_uj − Ȳj)² + (a_uj − Ȳj)(b_uj − Ȳj) + (b_uj − Ȳj)²].

Then, by analogy, for interval-valued variables Y1 and Y2 (j = 1, 2), the empirical covariance function Cov(Y1, Y2) is
  Cov(Y1, Y2) = (1/3m) Σ_{u∈E} G1 G2 [Q1 Q2]^{1/2},
where
  Qj = (a_uj − Ȳj)² + (a_uj − Ȳj)(b_uj − Ȳj) + (b_uj − Ȳj)²,
  Gj = −1 if Ȳ_uj ≤ Ȳj, +1 if Ȳ_uj > Ȳj,
  Ȳ_uj = (a_uj + b_uj)/2.

The (Total) SP part can be replaced by
  Total SP = (1/6) Σ_u [2(a_u − Ȳ1)(c_u − Ȳ2) + (a_u − Ȳ1)(d_u − Ȳ2)
                        + (b_u − Ȳ1)(c_u − Ȳ2) + 2(b_u − Ȳ1)(d_u − Ȳ2)].

22 How is this obtained? Recall that for a Uniform distribution, Y ~ U(a, b),
  Var(Y) = (b − a)²/12.
By analogy, we can show, for u = 1, ..., m observations with Y_u1 = [a_u, b_u] and Y_u2 = [c_u, d_u],
  Within SP = (1/12) Σ_{u=1}^m (b_u − a_u)(d_u − c_u),
  Between SP = Σ_{u=1}^m [(a_u + b_u)/2 − Ȳ1][(c_u + d_u)/2 − Ȳ2],
where
  Ȳ1 = (1/m) Σ_{u=1}^m (a_u + b_u)/2,  Ȳ2 = (1/m) Σ_{u=1}^m (c_u + d_u)/2.

23 Within SP = (1/12) Σ_{u=1}^m (b_u − a_u)(d_u − c_u),
  Between SP = Σ_{u=1}^m [(a_u + b_u)/2 − Ȳ1][(c_u + d_u)/2 − Ȳ2].

Hence,
  Total SP = Within SP + Between SP
           = (1/6) Σ_{u=1}^m [2(a_u − Ȳ1)(c_u − Ȳ2) + (a_u − Ȳ1)(d_u − Ȳ2)
                              + (b_u − Ȳ1)(c_u − Ȳ2) + 2(b_u − Ȳ1)(d_u − Ȳ2)].
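The SP identity can also be checked numerically; a minimal sketch (illustrative intervals, not from the talk):

```python
# Check Total SP = Within SP + Between SP for two interval variables
# Y_u1 = [a_u, b_u] and Y_u2 = [c_u, d_u].
import numpy as np

a = np.array([44.0, 60.0, 56.0, 70.0])
b = np.array([68.0, 72.0, 90.0, 112.0])
c = np.array([90.0, 90.0, 140.0, 110.0])
d = np.array([110.0, 130.0, 180.0, 142.0])
m = len(a)

Y1bar = np.sum(a + b) / (2 * m)
Y2bar = np.sum(c + d) / (2 * m)

within_sp = np.sum((b - a) * (d - c)) / 12
between_sp = np.sum(((a + b) / 2 - Y1bar) * ((c + d) / 2 - Y2bar))
total_sp = np.sum(2 * (a - Y1bar) * (c - Y2bar) + (a - Y1bar) * (d - Y2bar)
                  + (b - Y1bar) * (c - Y2bar) + 2 * (b - Y1bar) * (d - Y2bar)) / 6

assert np.isclose(total_sp, within_sp + between_sp)
print(total_sp, within_sp, between_sp)
```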

24
u    Y = Pulse Rate  X1 = Systolic Pressure  X2 = Diastolic Pressure
1    [44, 68]        [90, 110]               [50, 70]
2    [60, 72]        [90, 130]               [70, 90]
3    [56, 90]        [140, 180]              [90, 100]
4    [70, 112]       [110, 142]              [80, 108]
5    [54, 72]        [90, 100]               [50, 70]
6    [70, 100]       [134, 142]              [80, 110]
7    [72, 100]       [130, 160]              [76, 90]
8    [76, 98]        [110, 190]              [70, 110]
9    [86, 96]        [138, 180]              [90, 110]
10   [86, 100]       [110, 150]              [78, 100]
11   [63, 75]        [60, 100]               [140, 150]

Rule: X2 = Diastolic Pressure < Systolic Pressure = X1.

25

26 The regression equation becomes, for Y = Pulse Rate, X1 = Systolic Pressure:
  Y = … + … X1
  Ȳ = …, X̄ = …
  Std Devn(Y) = …, Std Devn(X1) = …, Cov(Y, X1) = …
  rho(Y, X1) = 0.725

27 Y = Pulse Rate, X1 = Systolic Pressure: Y = … + … X1.
Prediction with Ŷ_u = [â_u1, b̂_u1], applying the fitted line to the interval endpoints:
  â_u1 = β̂0 + β̂1 a_u2,  b̂_u1 = β̂0 + β̂1 b_u2.
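A sketch of this endpoint-wise prediction (the talk's fitted coefficients are not legible in this transcript, so beta0 and beta1 below are illustrative placeholders):

```python
# Given a fitted line y = beta0 + beta1 * x, predict an interval response
# by mapping the predictor interval [a, b] endpoint-wise.
def predict_interval(beta0, beta1, a, b):
    lo, hi = beta0 + beta1 * a, beta0 + beta1 * b
    return (min(lo, hi), max(lo, hi))   # reorder endpoints if beta1 < 0

print(predict_interval(25.0, 0.5, 90, 110))  # (70.0, 80.0)
```

A negative slope flips the endpoints, which is why the result is reordered; with beta1 > 0 the predicted interval is simply [β̂0 + β̂1 a, β̂0 + β̂1 b].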

28 Symbolic Prediction Equation

29 Symbolic Prediction Intervals

30 Symbolic Prediction Intervals and Equation

31 Original Intervals vs. Prediction Intervals

32 Data Intervals vs. Prediction Intervals

33 Predicted Pulse Rates and Residuals

u    Observed Pulse Rate  Systolic     Predicted Y [â_u, b̂_u]  Residuals [Res_a, Res_b]
1    [44, 68]             [90, 100]    [62.099, …]              […, 1.805]
2    [60, 72]             [90, 130]    [62.099, …]              [-2.099, …]
3    [56, 90]             [140, 180]   […, …]                   […, …]
4    [70, 112]            [110, 142]   [70.292, …]              [-0.292, …]
5    [54, 72]             [90, 100]    [62.099, …]              [-8.099, 5.805]
6    [70, 100]            [130, 160]   [78.486, …]              [-8.486, …]
7    [72, 100]            [130, 160]   [78.486, …]              [-6.486, …]
8    [76, 98]             [110, 190]   [70.292, …]              [5.708, …]
9    [86, 96]             [138, 180]   [81.763, …]              [4.237, …]
10   [86, 100]            [110, 150]   [70.292, …]              [15.708, …]

Y_u = Pulse Rate = [a_u, b_u]; Ŷ_u = Predicted Pulse Rate = [â_u, b̂_u]; Residual = [Res_a, Res_b].

34 Sum of Residuals for Symbolic Fit:
  Sum of Min Residuals Σ_u Res_au = …
  Sum of Max Residuals Σ_u Res_bu = …
Sum of Squared Residuals for Symbolic Fit:
  Sum of Min Squared Residuals = …
  Sum of Max Squared Residuals = …

35 Classical Regression on Midpoints

Y^c_u = (a_u1 + b_u1)/2, X^c_ju = (a_uj + b_uj)/2, j = 1, 2.
Fitted: Y^c = … + … X1.
Prediction intervals Ŷ^c = [â^c, b̂^c], applying the fitted line to the endpoints:
  â^c_u = β̂0 + β̂1 a_u2,  b̂^c_u = β̂0 + β̂1 b_u2.
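The midpoint fit can be sketched directly (using the first five pulse/systolic intervals from slide 24; the coefficients this produces are illustrative, not the talk's full-data fit):

```python
# Classical OLS on interval midpoints Y^c_u = (a+b)/2, X^c_u = (a+b)/2.
import numpy as np

Y = np.array([[44, 68], [60, 72], [56, 90], [70, 112], [54, 72]], dtype=float)
X = np.array([[90, 110], [90, 130], [140, 180], [110, 142], [90, 100]], dtype=float)

yc = Y.mean(axis=1)    # midpoints of the response intervals
xc = X.mean(axis=1)    # midpoints of the predictor intervals
b1 = np.sum((xc - xc.mean()) * (yc - yc.mean())) / np.sum((xc - xc.mean()) ** 2)
b0 = yc.mean() - b1 * xc.mean()
print(b0, b1)
```

Note this fit uses only the interval centres; unlike the symbolic fit it ignores the internal variation of the intervals entirely.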

36 Classical Regression through Midpoints

37 Symbolic Regression vs. Classical Regression

38 Comparison of Regression Fits

Sum of Residuals for Symbolic Fit: Min = …, Max = …
Sum of Squared Residuals for Symbolic Fit: Min = …, Max = …
Sum of Residuals for Classical Fit: Min = …, Max = …
Sum of Squared Residuals for Classical Fit: Min = …, Max = …

39 Centers and Range Regression
De Carvalho, Lima Neto, Tenorio, Freire, ... (2004, 2005, ...)

Midpoint: Y^c = (a + b)/2, X^c = (c + d)/2.
Range: Y^r = (b − a)/2, X^r = (d − c)/2.
Fit: Ŷ^c = … + … X^c,  Ŷ^r = … + … X^r.

40 Centers and Range Regression
De Carvalho, Lima Neto, Tenorio, Freire, ... (2004, 2005, ...)

Midpoint: Y^c = (a + b)/2, X^c = (c + d)/2.
Range: Y^r = (b − a)/2, X^r = (d − c)/2.
Single: Ŷ^c = … + … X^c,  Ŷ^r = … + … X^r.
Multiple: Ŷ^c = … + … X1^c + … X1^r,  Ŷ^r = … + … X1^c + … X1^r.
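A sketch of the centre-and-range idea (illustrative data from slide 24, single-predictor case only): fit one regression to the midpoints, one to the half-ranges, then reassemble the predicted intervals as Ŷ^c ∓ Ŷ^r.

```python
# Centre-and-range regression sketch: separate OLS fits for centres and
# half-ranges, then reconstruct interval predictions.
import numpy as np

def ols1(x, y):
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    return y.mean() - b1 * x.mean(), b1

Y = np.array([[44, 68], [60, 72], [56, 90], [70, 112], [54, 72]], dtype=float)
X = np.array([[90, 110], [90, 130], [140, 180], [110, 142], [90, 100]], dtype=float)

yc, yr = Y.mean(axis=1), (Y[:, 1] - Y[:, 0]) / 2    # centre, half-range of Y
xc, xr = X.mean(axis=1), (X[:, 1] - X[:, 0]) / 2
(c0, c1), (r0, r1) = ols1(xc, yc), ols1(xr, yr)

yc_hat = c0 + c1 * xc
yr_hat = np.maximum(r0 + r1 * xr, 0.0)              # half-ranges must stay nonnegative
pred = np.column_stack([yc_hat - yr_hat, yc_hat + yr_hat])
print(pred)
```

Clipping the predicted half-range at zero is one simple way to guarantee well-formed intervals; the constrained variants in the Lima Neto-De Carvalho literature handle this more formally.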

41 Centers and Range Regression -- Predictions

                     Single              Multiple
Obs  Y               [Ŷ_a, Ŷ_b]          [Ŷ_a, Ŷ_b]
1    [44, 68]        [52.572, 77.439]    [53.195, 75.299]
2    [60, 72]        [59.230, 82.365]    [63.089, 81.937]
3    [56, 90]        [78.537, …]         [75.334, …]
4    [70, 112]       [65.178, 88.774]    [65.349, 88.470]
5    [54, 72]        [52.572, 77.439]    [53.195, 75.299]
6    [70, 100]       [72.457, 96.168]    [69.587, 96.331]
7    [72, 100]       [72.457, 96.168]    [69.587, 96.331]
8    [76, 98]        [75.831, 96.655]    [81.180, 99.092]
9    [86, 96]        [78.209, …]         [75.504, …]
10   [86, 100]       [66.953, 90.087]    [67.987, 90.241]

42 Symbolic Principal Components -- BATS
Y1 = Head, Y2 = Tail, Y3 = Height, Y4 = Forearm

Obs  [Y1a, Y1b]  [Y2a, Y2b]  [Y3a, Y3b]  [Y4a, Y4b]
1    [33, 52]    [26, 33]    [4, 7]      [27, 32]
2    [38, 50]    [30, 40]    [7, 8]      [32, 37]
3    [43, 48]    [34, 39]    [6, 7]      [31, 38]
4    [44, 48]    [34, 44]    [7, 8]      [31, 36]
5    [41, 51]    [30, 39]    [8, 11]     [33, 41]
6    [40, 45]    [39, 44]    [9, 9]      [36, 42]
7    [45, 53]    [35, 38]    [10, 12]    [39, 44]
8    [44, 58]    [41, 54]    [6, 8]      [35, 41]
9    [47, 53]    [43, 53]    [7, 9]      [37, 41]
10   [50, 69]    [30, 43]    [11, 13]    [51, 61]
11   [65, 80]    [48, 60]    [12, 16]    [55, 68]
12   [82, 87]    [46, 57]    [11, 12]    [58, 63]
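One common way to compute interval-valued principal components is the "centres" method: classical PCA on the interval midpoints, then project each observation's hyper-rectangle vertices to obtain interval scores. A minimal sketch (using only the first three bats; this is one standard approach, not necessarily the exact method of the talk):

```python
# Centres-method symbolic PCA sketch: PCA axes from interval midpoints,
# then interval PC scores from the 2^4 vertices of each observation.
import numpy as np
from itertools import product

data = np.array([  # bats 1-3: [Y1a,Y1b], [Y2a,Y2b], [Y3a,Y3b], [Y4a,Y4b]
    [[33, 52], [26, 33], [4, 7], [27, 32]],
    [[38, 50], [30, 40], [7, 8], [32, 37]],
    [[43, 48], [34, 39], [6, 7], [31, 38]],
], dtype=float)

centres = data.mean(axis=2)                       # midpoints, shape (n, 4)
grand_mean = centres.mean(axis=0)
_, _, Vt = np.linalg.svd(centres - grand_mean, full_matrices=False)

pc_intervals = []
for obs in data:
    # scores on PC1 for all 2^4 vertices of the observation's rectangle
    scores = [float((np.array(v) - grand_mean) @ Vt[0]) for v in product(*obs)]
    pc_intervals.append((min(scores), max(scores)))
print(pc_intervals)
```

The min/max over vertices gives the interval [PC1a, PC1b] for each bat; repeating with Vt[1], Vt[2] fills the PC2 and PC3 columns of the table on the next slide.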

43 Symbolic Principal Components -- BATS
Y1 = Head, Y2 = Tail, Y3 = Height, Y4 = Forearm

Obs  PC1a  PC1b  PC2a  PC2b  PC3a  PC3b

44 Symbolic Principal Component Analysis -- BATS


More information

Lecture 9 SLR in Matrix Form

Lecture 9 SLR in Matrix Form Lecture 9 SLR in Matrix Form STAT 51 Spring 011 Background Reading KNNL: Chapter 5 9-1 Topic Overview Matrix Equations for SLR Don t focus so much on the matrix arithmetic as on the form of the equations.

More information

18.440: Lecture 28 Lectures Review

18.440: Lecture 28 Lectures Review 18.440: Lecture 28 Lectures 18-27 Review Scott Sheffield MIT Outline Outline It s the coins, stupid Much of what we have done in this course can be motivated by the i.i.d. sequence X i where each X i is

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

Answers to Problem Set #4

Answers to Problem Set #4 Answers to Problem Set #4 Problems. Suppose that, from a sample of 63 observations, the least squares estimates and the corresponding estimated variance covariance matrix are given by: bβ bβ 2 bβ 3 = 2

More information

Regression Analysis Chapter 2 Simple Linear Regression

Regression Analysis Chapter 2 Simple Linear Regression Regression Analysis Chapter 2 Simple Linear Regression Dr. Bisher Mamoun Iqelan biqelan@iugaza.edu.ps Department of Mathematics The Islamic University of Gaza 2010-2011, Semester 2 Dr. Bisher M. Iqelan

More information

Key Algebraic Results in Linear Regression

Key Algebraic Results in Linear Regression Key Algebraic Results in Linear Regression James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 30 Key Algebraic Results in

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

Principal Components Analysis

Principal Components Analysis Principal Components Analysis Nathaniel E. Helwig Assistant Professor of Psychology and Statistics University of Minnesota (Twin Cities) Updated 16-Mar-2017 Nathaniel E. Helwig (U of Minnesota) Principal

More information

Stat 579: Generalized Linear Models and Extensions

Stat 579: Generalized Linear Models and Extensions Stat 579: Generalized Linear Models and Extensions Linear Mixed Models for Longitudinal Data Yan Lu April, 2018, week 12 1 / 34 Correlated data multivariate observations clustered data repeated measurement

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Seven Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Seven Notes Spring 2011 1 / 42 Outline

More information

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin

Regression Review. Statistics 149. Spring Copyright c 2006 by Mark E. Irwin Regression Review Statistics 149 Spring 2006 Copyright c 2006 by Mark E. Irwin Matrix Approach to Regression Linear Model: Y i = β 0 + β 1 X i1 +... + β p X ip + ɛ i ; ɛ i iid N(0, σ 2 ), i = 1,..., n

More information

Chapter 4: Models for Stationary Time Series

Chapter 4: Models for Stationary Time Series Chapter 4: Models for Stationary Time Series Now we will introduce some useful parametric models for time series that are stationary processes. We begin by defining the General Linear Process. Let {Y t

More information

TMA4255 Applied Statistics V2016 (5)

TMA4255 Applied Statistics V2016 (5) TMA4255 Applied Statistics V2016 (5) Part 2: Regression Simple linear regression [11.1-11.4] Sum of squares [11.5] Anna Marie Holand To be lectured: January 26, 2016 wiki.math.ntnu.no/tma4255/2016v/start

More information

sociology 362 regression

sociology 362 regression sociology 36 regression Regression is a means of modeling how the conditional distribution of a response variable (say, Y) varies for different values of one or more independent explanatory variables (say,

More information

Analysis of Covariance

Analysis of Covariance Analysis of Covariance (ANCOVA) Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 10 1 When to Use ANCOVA In experiment, there is a nuisance factor x that is 1 Correlated with y 2

More information

R 2 and F -Tests and ANOVA

R 2 and F -Tests and ANOVA R 2 and F -Tests and ANOVA December 6, 2018 1 Partition of Sums of Squares The distance from any point y i in a collection of data, to the mean of the data ȳ, is the deviation, written as y i ȳ. Definition.

More information

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington

Analysis of Longitudinal Data. Patrick J. Heagerty PhD Department of Biostatistics University of Washington Analsis of Longitudinal Data Patrick J. Heagert PhD Department of Biostatistics Universit of Washington 1 Auckland 2008 Session Three Outline Role of correlation Impact proper standard errors Used to weight

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

Topic 16 Interval Estimation

Topic 16 Interval Estimation Topic 16 Interval Estimation Additional Topics 1 / 9 Outline Linear Regression Interpretation of the Confidence Interval 2 / 9 Linear Regression For ordinary linear regression, we have given least squares

More information

Scatter plot of data from the study. Linear Regression

Scatter plot of data from the study. Linear Regression 1 2 Linear Regression Scatter plot of data from the study. Consider a study to relate birthweight to the estriol level of pregnant women. The data is below. i Weight (g / 100) i Weight (g / 100) 1 7 25

More information

Statistical Analysis in Modelling: MCMC methods. Heikki Haario Lappeenranta University of Technology

Statistical Analysis in Modelling: MCMC methods. Heikki Haario Lappeenranta University of Technology Statistical Analysis in Modelling: methods Heikki Haario Lappeenranta University of Technology heikki.haario@lut.fi Contents Introduction Linear Models Regression Analysis Covariance, classical error bounds

More information

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections

Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections Applied Regression Modeling: A Business Approach Chapter 3: Multiple Linear Regression Sections 3.4 3.6 by Iain Pardoe 3.4 Model assumptions 2 Regression model assumptions.............................................

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Random Vectors, Random Matrices, and Matrix Expected Value

Random Vectors, Random Matrices, and Matrix Expected Value Random Vectors, Random Matrices, and Matrix Expected Value James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 16 Random Vectors,

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

1/15. Over or under dispersion Problem

1/15. Over or under dispersion Problem 1/15 Over or under dispersion Problem 2/15 Example 1: dogs and owners data set In the dogs and owners example, we had some concerns about the dependence among the measurements from each individual. Let

More information

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A =

Matrices and vectors A matrix is a rectangular array of numbers. Here s an example: A = Matrices and vectors A matrix is a rectangular array of numbers Here s an example: 23 14 17 A = 225 0 2 This matrix has dimensions 2 3 The number of rows is first, then the number of columns We can write

More information

Multivariate Linear Models

Multivariate Linear Models Multivariate Linear Models Stanley Sawyer Washington University November 7, 2001 1. Introduction. Suppose that we have n observations, each of which has d components. For example, we may have d measurements

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Correlation and the Analysis of Variance Approach to Simple Linear Regression Correlation and the Analysis of Variance Approach to Simple Linear Regression Biometry 755 Spring 2009 Correlation and the Analysis of Variance Approach to Simple Linear Regression p. 1/35 Correlation

More information

Lecture 4: Proofs for Expectation, Variance, and Covariance Formula

Lecture 4: Proofs for Expectation, Variance, and Covariance Formula Lecture 4: Proofs for Expectation, Variance, and Covariance Formula by Hiro Kasahara Vancouver School of Economics University of British Columbia Hiro Kasahara (UBC) Econ 325 1 / 28 Discrete Random Variables:

More information