DISCUSSION PAPER


INSTITUT DE STATISTIQUE, BIOSTATISTIQUE ET SCIENCES ACTUARIELLES (ISBA)
UNIVERSITÉ CATHOLIQUE DE LOUVAIN

DISCUSSION PAPER 2012/39

Hyperbolic confidence bands of errors-in-variables regression lines applied to method comparison studies

FRANCQ, B. and GOVAERTS, B.

Submission: Journal de la Société Française de Statistique

Hyperbolic confidence bands of errors-in-variables regression lines applied to method comparison studies

Bernard G. Francq 1 and Bernadette B. Govaerts 2

Abstract: This paper focuses on the confidence bands of errors-in-variables regression lines applied in the context of the comparison of measurement methods, with or without replicated data. In method comparison studies, the goal is to prove that two measurement methods are equivalent. In that case, there is no analytical bias between them and they must provide the same results on average, notwithstanding the measurement errors. Usually, there is no gold standard and each measurement method is compared to the other one, and vice versa, with a linear regression analysis. If there is no bias, the results should not be far from the identity line Y = X, with a slope (β) equal to 1 and an intercept (α) equal to 0. A joint-CI is ideally used to test this joint hypothesis, and the measurement errors on both axes must be taken into account in the regression analysis by applying a linear errors-in-variables regression. As the very well-known OLS regression assumes that there is no error on the X-axis, it provides a biased estimated line. The DR (Deming Regression) and the BLS (Bivariate Least Square regression) are more general regressions which take into account the precision ratio (λ) of the two measurement methods. DR and BLS provide consistent estimated lines, which coincide in the case of homoscedasticity. Joint-CIs with the shape of an ellipse in the (β, α) plane computed with such regressions (DR or BLS) already exist in the literature, but we propose to transform these joint-CIs into hyperbolic confidence bands for the line in the (X, Y) plane, which are easier to display and interpret. Four methodologies are proposed based on previous papers and their properties and advantages are discussed. The proposed confidence bands are mathematically equivalent to the ellipses, but a detailed comparison of the coverage probabilities is provided with simulations and real data. When the error variances are known, the obtained coverage probabilities are very close to each other, but the joint-CIs computed with maximum likelihood (ML) or the method of moments provide slightly better coverage probabilities. Under unknown and heteroscedastic error variances, the ML coverage probabilities drop drastically while the BLS provides better coverage probabilities.

Keywords: confidence bands, errors-in-variables regression, method comparison study, equivalence of measurement methods, Deming regression, bivariate least squares

AMS 2000 subject classifications: , ,

1. Introduction

The need of industries to quickly assess the quality of products leads to the development and improvement of new measurement methods, sometimes faster, easier to handle, less expensive or more accurate than the reference method. These alternative methods should ideally lead to results comparable to those obtained by a standard method [35], in such a way that there is no bias between the two methods or that these measurement methods can be interchangeable.

1 Université Catholique de Louvain, Institut de Statistique, Biostatistique et Sciences Actuarielles, Belgium. bernard.g.francq@uclouvain.be
2 Université Catholique de Louvain, Institut de Statistique, Biostatistique et Sciences Actuarielles, Belgium. bernadette.govaerts@uclouvain.be

There are different approaches proposed in the literature to deal with method comparison studies. Firstly, the best known and most widely used is certainly the approach proposed by Bland and Altman, which focuses directly on the differences between the results of the two measurement methods [1, 4, 5]. Secondly, the approach based on regression analysis (or linear functional relationship [18]) is also widely applied and focuses on the regression parameter estimates and their confidence intervals (CI) [28]. Each approach has its own advantages and disadvantages, the first being more intuitive for the user and the second providing a stronger statistical basis. This paper focuses on the regression approach. Practically, to test statistical equivalence between two measurement methods, a certain characteristic of a series of samples (or subjects) is measured by the two methods over the experimental domain of interest. The pairs of measures taken by the reference method and the alternative one are modelled by a regression line and the parameter estimates are used to test the equivalence. Indeed, an intercept significantly different from zero indicates a systematic analytical bias between the methods, and a slope significantly different from one indicates a proportional bias [28]. To achieve this correctly, it is essential to take into account the errors on both axes and the heteroscedasticity if necessary [28]. Various types of regressions exist to deal with this problem [31]. In this paper, we base the equivalence test on the joint-CI and propose to transform the classical ellipse computed with the bivariate distribution of the parameters into hyperbolic confidence bands for the regression line, with replicated or unreplicated data. We don't consider confidence bands of other shapes [25, 24], like two straight lines around the estimated line, as the hyperbolic curve is the only correct way to fit the distribution of estimated lines. We'll first review, explain and compare the joint-CI and the confidence bands of the very well-known OLS regression. Then, we'll review four different approaches to compute a joint-CI with an ellipse and we'll transform these joint-CIs into confidence bands. Non-parametric and bootstrap confidence bands for errors-in-variables regressions are available in the literature [6] but, to our knowledge, the confidence bands given in this paper are novel. The systolic blood pressure data published by Bland and Altman [5] are used to illustrate these techniques.

2. The model and the goal of method equivalence testing

2.1. What is method equivalence and which data are needed to test it?

In the systolic blood pressure data [5], simultaneous measurements were performed using a sphygmomanometer and a semi-automatic blood pressure monitor in order to study whether these two devices are equivalent or not. The literature does not provide a clear and unique definition of the equivalence concept. The statistical equivalence approach will, as a priority, test whether the two devices are equivalent notwithstanding the measurement errors, or whether there is a bias between the two devices. Additionally, a statistical test can be performed to compare the accuracies of the two methods. The practical equivalence approach will not focus on statistical parameters (bias and variance), but will consider two methods equivalent when one device can be substituted for the other without affecting the decision taken from the measurement result.
In this paper, we'll focus mainly on the statistical question and test whether there is a bias between the two devices. The most classical design in method comparison studies consists of measuring each sample (or subject) once by both devices. Unfortunately, with unreplicated data, it is not possible to estimate

the variances of the measurement errors if needed, and a design with replicated data is more suitable. If the accuracy of a reference measurement method (gold standard) is known, each sample (or subject) can, more simply, be measured only once by this method and several times by the new method. In the systolic blood pressure data, the design is quite simple: three sets of readings were made in quick succession by both devices on each patient.

2.2. The general model

To compare two measurement methods, we suppose that a parameter of interest is measured on $N$ samples ($i = 1, 2, \dots, N$) or subjects by both methods [3, 11, 26]:

$$X_{ij} = \xi_i + \tau_{ij}, \qquad Y_{ij} = \eta_i + \nu_{ij} \qquad (1)$$

The $X_{ij}$ ($j = 1, 2, \dots, n_{X_i}$) and $Y_{ij}$ ($j = 1, 2, \dots, n_{Y_i}$) are the repeated measures for sample $i$ by methods X and Y respectively; $n_{X_i}$ and $n_{Y_i}$ are the numbers of repeated measures of sample $i$ by each method. $\xi_i$ and $\eta_i$ are the true but unobservable values of the parameter of interest for both methods, which we'll assume to be linked by a linear relation [3, 11, 26]:

$$\eta_i = \alpha + \beta \xi_i \qquad (2)$$

Let's define $\bar X_i$ and $\bar Y_i$, the means of the repeated measures for a given sample:

$$\bar X_i = \frac{1}{n_{X_i}} \sum_{j=1}^{n_{X_i}} X_{ij} \quad \text{and} \quad \bar Y_i = \frac{1}{n_{Y_i}} \sum_{j=1}^{n_{Y_i}} Y_{ij} \qquad (3)$$

$\tau_{ij}$ and $\nu_{ij}$ are the measurement errors, which are assumed to be independent and normally distributed (with constant variances in the case of homoscedasticity):

$$\begin{pmatrix} \tau_{ij} \\ \nu_{ij} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_{\tau_i} & 0 \\ 0 & \sigma^2_{\nu_i} \end{pmatrix} \right) \qquad (4)$$

So the means of the repeated measures are also normally distributed:

$$\begin{pmatrix} \bar X_i \\ \bar Y_i \end{pmatrix} \sim N\left( \begin{pmatrix} \xi_i \\ \eta_i \end{pmatrix}, \begin{pmatrix} \sigma^2_{\tau_i}/n_{X_i} & 0 \\ 0 & \sigma^2_{\nu_i}/n_{Y_i} \end{pmatrix} \right) \qquad (5)$$

When the variances $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ are unknown, they can be estimated, if repeated measures are available, by $S^2_{\tau_i}$ and $S^2_{\nu_i}$ respectively:

$$S^2_{\tau_i} = \frac{1}{n_{X_i}-1} \sum_{j=1}^{n_{X_i}} (X_{ij} - \bar X_i)^2 \quad \text{and} \quad S^2_{\nu_i} = \frac{1}{n_{Y_i}-1} \sum_{j=1}^{n_{Y_i}} (Y_{ij} - \bar Y_i)^2 \qquad (6)$$

In this paper, we'll also use the notations:

$$\bar X = \frac{1}{N}\sum_{i=1}^{N} \bar X_i, \quad \bar Y = \frac{1}{N}\sum_{i=1}^{N} \bar Y_i, \quad S_{xx} = \sum_{i=1}^{N} (\bar X_i - \bar X)^2, \quad S_{yy} = \sum_{i=1}^{N} (\bar Y_i - \bar Y)^2 \quad \text{and} \quad S_{xy} = \sum_{i=1}^{N} (\bar X_i - \bar X)(\bar Y_i - \bar Y).$$
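As an aside for readers who wish to reproduce these quantities numerically, the short Python sketch below computes the per-sample means (3), the within-sample variances (6) and the sums $S_{xx}$, $S_{yy}$ and $S_{xy}$ from replicated data. The function name and the data layout (one array of replicates per sample) are our own illustrative choices, not notation from the paper.

```python
import numpy as np

def summary_stats(X, Y):
    """Summary statistics of Section 2.2 for replicated data.

    X, Y : lists of 1-D arrays; X[i] holds the n_Xi replicates of sample i
    by method X, and Y[i] the n_Yi replicates by method Y.
    """
    Xbar = np.array([x.mean() for x in X])           # per-sample means (3)
    Ybar = np.array([y.mean() for y in Y])
    S2_tau = np.array([x.var(ddof=1) for x in X])    # within-sample variances (6)
    S2_nu = np.array([y.var(ddof=1) for y in Y])
    Sxx = np.sum((Xbar - Xbar.mean()) ** 2)
    Syy = np.sum((Ybar - Ybar.mean()) ** 2)
    Sxy = np.sum((Xbar - Xbar.mean()) * (Ybar - Ybar.mean()))
    return Xbar, Ybar, S2_tau, S2_nu, Sxx, Syy, Sxy
```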

2.3. The homoscedastic model

In the homoscedastic model, we consider that both measurement methods have a constant accuracy over the domain of interest, i.e., $\sigma^2_{\tau_i} = \sigma^2_\tau$ and $\sigma^2_{\nu_i} = \sigma^2_\nu$ $\forall i$. Moreover, as we'll regress the mean measures $\bar Y_i$ on $\bar X_i$, we also assume the numbers of replicates to be constant ($n_{X_i} = n_X$ and $n_{Y_i} = n_Y$ $\forall i$), to prevent the model from becoming heteroscedastic even when the accuracies of the measurement methods are constant. In this case of homoscedasticity, the variances $S^2_{\tau_i}$ and $S^2_{\nu_i}$ are estimates of $\sigma^2_\tau$ and $\sigma^2_\nu$, and we can define global estimates of $\sigma^2_\tau$ and $\sigma^2_\nu$, respectively $S^2_\tau$ and $S^2_\nu$:

$$S^2_\tau = \frac{\sum_{i=1}^{N} (n_{X_i}-1) S^2_{\tau_i}}{\sum_{i=1}^{N} n_{X_i} - N} \quad \text{and} \quad S^2_\nu = \frac{\sum_{i=1}^{N} (n_{Y_i}-1) S^2_{\nu_i}}{\sum_{i=1}^{N} n_{Y_i} - N}$$

Hence, with a constant number of repeated measures ($n_{X_i} = n_X$ and $n_{Y_i} = n_Y$ $\forall i$):

$$S^2_\tau = \frac{1}{N}\sum_{i=1}^{N} S^2_{\tau_i} \quad \text{and} \quad S^2_\nu = \frac{1}{N}\sum_{i=1}^{N} S^2_{\nu_i} \qquad (7)$$

2.4. How to test the equivalence?

If the two measurement methods are equivalent, they should give the same results for a given sample, notwithstanding the measurement errors. In the model notations, method equivalence then means $\xi_i = \eta_i$ $\forall i$ [28, 33]. In practice, due to the measurement errors, these parameters are unobservable and the equivalence test will be based on the following regression model:

$$\bar Y_i = \alpha + \beta \bar X_i + \varepsilon_i \quad \text{with} \quad \varepsilon_i \sim N(0, \sigma^2_{\varepsilon_i}) \quad \text{and} \quad \sigma^2_{\varepsilon_i} = \frac{\sigma^2_{\nu_i}}{n_{Y_i}} + \beta^2 \frac{\sigma^2_{\tau_i}}{n_{X_i}}, \qquad (8)$$

where $\alpha$, the intercept, and $\beta$, the slope, are estimated respectively by $\hat\alpha$ and $\hat\beta$. This regression model is applied to the averages of the repeated measures because individual measures cannot be paired. Many practitioners wonder which measurement method to put on the X-axis and which on the Y-axis. Actually, the variables X and Y play similar roles [2] and, if the regression line is estimated adequately (by taking the errors on both axes into account), the coordinate system should not matter [34]. The estimated parameters $\hat\alpha$ and $\hat\beta$ provide the information needed to assess equivalence. Indeed, an intercept significantly different from 0 means that there is a constant bias between the two measurement methods, and a slope significantly different from 1 means that there is a proportional bias between them [28]. So, the following two-sided hypotheses will be used to test method equivalence:

$$H_0^\alpha: \alpha = 0, \ H_1^\alpha: \alpha \neq 0 \quad \text{and} \quad H_0^\beta: \beta = 1, \ H_1^\beta: \beta \neq 1 \qquad (9)$$

The null hypothesis $H_0^\alpha: \alpha = 0$ is rejected if 0 isn't included inside a confidence interval (CI) for $\alpha$, and the null hypothesis $H_0^\beta: \beta = 1$ is rejected if 1 isn't included inside a CI for $\beta$. The two measurement methods can be called average equivalent if the hypotheses $H_0^\beta: \beta = 1$ and $H_0^\alpha:$

$\alpha = 0$ are not rejected [33]. However, these tests can also be applied jointly by checking whether the point (1, 0) is included or not in a joint-CI for the regression coefficients $\theta = (\alpha, \beta)'$, which is, classically, a confidence ellipse. This paper focuses on this joint hypothesis:

$$H_0: \theta = (0, 1)' \quad \text{and} \quad H_1: \theta \neq (0, 1)' \qquad (10)$$

and proposes simultaneous confidence bands (CB) for the regression line $Y = \alpha + \beta X = x'\theta$ over $X \in (-\infty, \infty)$, where $x = (1, X)'$, which are equivalent to the confidence ellipses. The hypothesis $H_0: \theta = (0, 1)'$ is rejected if the line $Y = X$ intercepts the CB, and not rejected if the line lies entirely inside the CB.

3. The OLS regression

This section reviews results on the classical estimation of a regression line by OLS when X is observed without errors. These will be crucial for the developments in further sections on errors-in-variables models. In particular, the not so well-known concept of confidence bands (CB) around the regression line is reviewed. It is shown that it is equivalent to the joint confidence interval on the regression parameters, and that both can be used for equivalence testing.

3.1. Ordinary Least Squares (OLS) regression estimators

The easiest way to estimate the parameters $\alpha$ and $\beta$ of model (8) in the case of homoscedasticity is to use the very well-known technique of Ordinary Least Squares [13, 17]: OLS. The OLS regression minimizes the criterion $C_{OLS}$, the sum of squared vertical distances (residuals) between each point and the line, as shown in Figure 1:

$$C_{OLS} = \sum_{i=1}^{N} (\bar Y_i - \hat\alpha - \hat\beta \bar X_i)^2,$$

and the corresponding parameter estimators are given by the following formulas:

$$\hat\beta_{OLS} = \frac{S_{xy}}{S_{xx}} \quad \text{and} \quad \hat\alpha_{OLS} = \bar Y - \hat\beta_{OLS} \bar X \qquad (11)$$

Unfortunately, the OLS minimization criterion doesn't take the errors in the independent variable into account [7]. In other words, OLS assumes that there is no error given by the measurement method on the X-axis, i.e., the $\tau_{ij}$ are supposed to be zero or negligible in (1). The corresponding estimates are therefore biased [7] and the bias is related to the variance ratio $\sigma^2_\tau / \sigma^2_\xi$ [10]. For large sample sizes, the estimated parameters provided by OLS converge to [10, 16]:

$$\hat\beta_{OLS} \to \frac{1}{1 + (\sigma^2_\tau/n_X)/\sigma^2_\xi}\, \beta, \qquad \hat\alpha_{OLS} \to \alpha + \frac{(\sigma^2_\tau/n_X)/\sigma^2_\xi}{1 + (\sigma^2_\tau/n_X)/\sigma^2_\xi}\, \bar X \beta.$$
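The attenuation of the OLS slope is easy to visualise by simulation. The following sketch (our own illustration, with arbitrarily chosen variances and $n_X = n_Y = 1$) regresses Y on an error-contaminated X and compares the average fitted slope with the limit above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, alpha, beta = 50, 0.0, 1.0
sigma2_tau, sigma2_nu = 2.0, 1.0        # arbitrary illustration values

slopes = []
for _ in range(5000):
    xi = rng.uniform(10, 20, N)                          # true values
    X = xi + rng.normal(0.0, np.sqrt(sigma2_tau), N)     # X observed with error
    Y = alpha + beta * xi + rng.normal(0.0, np.sqrt(sigma2_nu), N)
    slopes.append(np.polyfit(X, Y, 1)[0])                # OLS slope of Y on X

sigma2_xi = (20 - 10) ** 2 / 12                          # variance of U(10, 20)
print(np.mean(slopes))                                   # close to the attenuated value
print(beta / (1 + sigma2_tau / sigma2_xi))               # theoretical limit (n_X = 1)
```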

FIGURE 1. Illustration of the minimization criteria of the OLS and DR-BLS regressions

3.2. Confidence intervals computed from OLS estimators

Supposing that $\sigma^2_\tau = 0$, separate $100(1-\gamma)\%$ CIs for $\beta$ and $\alpha$ are symmetric around $\hat\beta_{OLS}$ and $\hat\alpha_{OLS}$ respectively and are computed as [8]:

$$CI(\beta_{OLS}): \hat\beta_{OLS} \pm t_{1-\gamma/2,\, N-2}\, S_{\hat\beta_{OLS}} \quad \text{with} \quad S^2_{\hat\beta_{OLS}} = \frac{S^2_{OLS}}{S_{xx}} \qquad (12)$$

where

$$S^2_{OLS} = \frac{1}{N-2} \sum_{i=1}^{N} (\bar Y_i - \hat\alpha_{OLS} - \hat\beta_{OLS} \bar X_i)^2,$$

$$CI(\alpha_{OLS}): \hat\alpha_{OLS} \pm t_{1-\gamma/2,\, N-2}\, S_{\hat\alpha_{OLS}} \quad \text{with} \quad S^2_{\hat\alpha_{OLS}} = S^2_{OLS} \left( \frac{1}{N} + \frac{\bar X^2}{S_{xx}} \right), \qquad (13)$$

where $t_{1-\gamma/2,\, N-2}$ is the $100(1-\gamma/2)\%$ percentile of a t-distribution with $N-2$ df. Additionally, the covariance between the slope and the intercept, $S_{\hat\alpha\hat\beta_{OLS}}$, is given by:

$$S_{\hat\alpha\hat\beta_{OLS}} = -\bar X \frac{S^2_{OLS}}{S_{xx}} = -\bar X S^2_{\hat\beta_{OLS}}$$

The joint-CI for $\alpha$ and $\beta$ is given by:

$$\begin{pmatrix} \hat\alpha_{OLS} - \alpha & \hat\beta_{OLS} - \beta \end{pmatrix} \begin{pmatrix} S^2_{\hat\alpha_{OLS}} & S_{\hat\alpha\hat\beta_{OLS}} \\ S_{\hat\alpha\hat\beta_{OLS}} & S^2_{\hat\beta_{OLS}} \end{pmatrix}^{-1} \begin{pmatrix} \hat\alpha_{OLS} - \alpha \\ \hat\beta_{OLS} - \beta \end{pmatrix} \leq 2 F_{1-\gamma,\, 2,\, N-2}, \qquad (14)$$

where $F_{1-\gamma,\, 2,\, N-2}$ is the $100(1-\gamma)\%$ percentile of the F-distribution with 2 and $N-2$ degrees of freedom. This joint-CI is an ellipse centered on the estimated parameters $\hat\theta = (\hat\alpha_{OLS}, \hat\beta_{OLS})$. When $|S_{\hat\alpha\hat\beta_{OLS}}|$ increases, the ellipse becomes narrower and collapses to a line; on the other hand, when $S_{\hat\alpha\hat\beta_{OLS}} \to 0$, the ellipse becomes wider until its major and minor axes become parallel to the

$\beta$-axis and $\alpha$-axis. These CIs (separate and joint) are exact under the assumptions of OLS, i.e., no errors on the X-axis ($\sigma^2_\tau = 0$) and normality of the $\varepsilon_i$. Finally, the equivalence between two measurement methods is rejected if:

$$\begin{pmatrix} \hat\alpha_{OLS} - 0 & \hat\beta_{OLS} - 1 \end{pmatrix} \begin{pmatrix} S^2_{\hat\alpha_{OLS}} & S_{\hat\alpha\hat\beta_{OLS}} \\ S_{\hat\alpha\hat\beta_{OLS}} & S^2_{\hat\beta_{OLS}} \end{pmatrix}^{-1} \begin{pmatrix} \hat\alpha_{OLS} - 0 \\ \hat\beta_{OLS} - 1 \end{pmatrix} > 2 F_{1-\gamma,\, 2,\, N-2}.$$

3.3. The confidence bands for the OLS line

The confidence band for the OLS line is a concept not very well known by non-statisticians, despite the many papers published on it in the past. Indeed, the confidence band for a line is often not available in classical software. The confidence bands are the CI for the regression line and should not be confused with the CI for the conditional mean $E(\hat Y_0 \mid X_0)$, given by the following well-known formula [8]:

$$\hat Y_0 \pm t_{1-\gamma/2,\, N-2}\, S_{\hat Y_0}, \quad \text{where} \quad \hat Y_0 = \hat\alpha_{OLS} + \hat\beta_{OLS} X_0$$

$$\text{and} \quad S^2_{\hat Y_0} = S^2_{\hat\alpha_{OLS}} + X_0^2 S^2_{\hat\beta_{OLS}} + 2 X_0 S_{\hat\alpha\hat\beta_{OLS}} = S^2_{OLS} \left( \frac{1}{N} + \frac{(X_0 - \bar X)^2}{S_{xx}} \right). \qquad (15)$$

This formula is computed point by point but can, of course, be displayed on a graph for all possible values of $X_0$, and it has a hyperbolic shape. The CI for the line $Y = \alpha + \beta X$ over $X \in (-\infty, \infty)$ (and not for a given $X_0$) relies on the bivariate dimension of a line, as we have to take the uncertainties of both parameters ($\alpha$ and $\beta$) into account jointly. The confidence band (CB) of the line $Y = \alpha + \beta X$ under the OLS assumptions is given by the following formula [22]:

$$(\hat\alpha_{OLS} + \hat\beta_{OLS} X) \pm \sqrt{2 F_{1-\gamma,\, 2,\, N-2}}\, \sqrt{S^2_{OLS} \left( \frac{1}{N} + \frac{(X - \bar X)^2}{S_{xx}} \right)}. \qquad (16)$$

Both formulas are similar, but an F-distribution is used for the confidence bands, based on the work of Working and Hotelling [36].

3.4. Comparison of the joint-CI and the confidence bands

Actually, the joint-CI on the regression coefficients and the confidence bands of the regression line are mathematically equivalent. In other words, to test the equivalence between two measurement methods, we can check whether the point $(\beta = 1, \alpha = 0)$ lies inside the ellipse, or check whether the identity line ($Y = X$) lies inside the confidence bands. Figure 2 displays a simulated data set ($N = 10$, $X_1 = 1, \dots, X_{10} = 10$, $Y_i = X_i + \varepsilon_i$ with $\varepsilon_i \sim N(0, 1)$) on the right, with the OLS line and the 95% CB. We can notice that the identity line (the dashed line) lies inside the CB, which means that the equivalence isn't rejected. On the left, the 95% joint-CI on $(\beta, \alpha)$ is displayed as an ellipse and the equivalence point ($\beta = 1$, $\alpha = 0$) lies inside the ellipse. More generally, if a given point $(\beta, \alpha)$ lies inside the ellipse, the corresponding line ($Y = \alpha + \beta X$) lies inside the CB and

FIGURE 2. Mathematical equivalence between the joint-CI on $(\beta, \alpha)$ (ellipse on the left) and the CB (hyperbolic band on the right) computed around the line

vice-versa. On the other hand, if a given point $(\beta, \alpha)$ lies outside the ellipse, the corresponding line intercepts the CB. For a given point $(\beta, \alpha)$ on the edge of the ellipse, the corresponding line is tangent to the CB. If we consider all the lines which lie inside the CB, it's easy to understand that the two extreme slopes correspond to the oblique asymptotes of the hyperbolic CB (red lines in Figure 2, right). These two extreme slopes correspond to the domain of the ellipse (red arrow in Figure 2, left) and it's easy to check that these slopes are:

$$\hat\beta_{OLS} \pm \sqrt{2 F_{1-\gamma,\, 2,\, N-2}}\, S_{\hat\beta_{OLS}}. \qquad (17)$$

From the CB, the slopes of the oblique asymptotes can be easily computed:

$$\lim_{X \to \pm\infty} \frac{(\hat\alpha_{OLS} + \hat\beta_{OLS} X) \pm \sqrt{2 F_{1-\gamma,\, 2,\, N-2}}\, \sqrt{S^2_{OLS} \left( \dfrac{1}{N} + \dfrac{(X - \bar X)^2}{S_{xx}} \right)}}{X} = \hat\beta_{OLS} \pm \sqrt{2 F_{1-\gamma,\, 2,\, N-2}}\, S_{\hat\beta_{OLS}}.$$

By analogy, it's easy to check that the image of the ellipse gives the two extreme intercepts (green arrow in Figure 2, left):

$$\hat\alpha_{OLS} \pm \sqrt{2 F_{1-\gamma,\, 2,\, N-2}}\, S_{\hat\alpha_{OLS}}, \qquad (18)$$

which correspond to the two intercepts of the CB (when $X = 0$). Finally, for any slope $\beta_0$ in the domain of the ellipse (17), there exist two lines tangent to the hyperbolic CB with slope $\beta_0$ and intercepts $\alpha_0^1$ and $\alpha_0^2$, which correspond on the ellipse to the two points $\alpha_0^1$ and $\alpha_0^2$ where the vertical line $\beta = \beta_0$ intercepts the ellipse (details not given). This justifies that the joint-CI (ellipse) and the CB (hyperbola) are mathematically equivalent.
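As a numerical check of this equivalence, the sketch below (our own code, not from the paper) applies the joint-CI test (14) and the CB test (16) to simulated data like those of Figure 2; the band is scanned on a wide but finite grid, so the second test is approximate.

```python
import numpy as np
from scipy.stats import f

def ols_equivalence_tests(X, Y, gamma=0.05):
    """Joint-CI test (14) and CB test (16) of (alpha, beta) = (0, 1)."""
    N = len(X)
    Sxx = np.sum((X - X.mean()) ** 2)
    beta = np.sum((X - X.mean()) * (Y - Y.mean())) / Sxx
    alpha = Y.mean() - beta * X.mean()
    s2 = np.sum((Y - alpha - beta * X) ** 2) / (N - 2)
    # Covariance matrix of (alpha_hat, beta_hat)
    Sigma = s2 * np.array([[1 / N + X.mean() ** 2 / Sxx, -X.mean() / Sxx],
                           [-X.mean() / Sxx, 1 / Sxx]])
    c = 2 * f.ppf(1 - gamma, 2, N - 2)
    d = np.array([alpha - 0.0, beta - 1.0])
    ellipse_reject = d @ np.linalg.solve(Sigma, d) > c
    # Does the identity line Y = X leave the hyperbolic band anywhere?
    xg = np.linspace(X.min() - 100, X.max() + 100, 4001)
    half = np.sqrt(c * s2 * (1 / N + (xg - X.mean()) ** 2 / Sxx))
    band_reject = np.any(np.abs((alpha + beta * xg) - xg) > half)
    return ellipse_reject, band_reject

rng = np.random.default_rng(1)
X = np.arange(1.0, 11.0)
print(ols_equivalence_tests(X, X + rng.normal(0, 1, 10)))  # both results agree
```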

4. The errors-in-variables regressions under homoscedasticity

4.1. The estimators of the errors-in-variables regressions

This section presents the formulas of the regression estimators of the model $\bar Y_i = \alpha + \beta \bar X_i + \varepsilon_i$ when $\bar X_i$ is observed with error and the errors are assumed to be homoscedastic. We consider four methodologies published previously in the literature. The formulas are written with the notations presented in Section 2. Under homoscedasticity, these four methodologies provide identical estimates of the regression line, but the associated variance-covariance matrices of the parameters are different and lead to different CIs.

4.1.1. The Deming regression (DR) estimators

To take the errors on both axes into account, let's first define the ratio between the two error variances:

$$\lambda_{XY} = \frac{\sigma^2_\nu / n_Y}{\sigma^2_\tau / n_X} \qquad (19)$$

$\lambda_{XY}$ is the ratio of the error variance on the Y-axis to the error variance on the X-axis. Usually, the problem of replicated data isn't discussed in detail in the literature and a precision ratio $\lambda$ is defined by $\lambda = \sigma^2_\nu / \sigma^2_\tau$ [33] (or inversely [19, 20, 21, 22]). The Deming regression (DR) is the Maximum Likelihood (ML) solution of model (1) when $\lambda_{XY}$ is known [11]. In practice, if $\lambda_{XY}$ is unknown because $\sigma^2_\tau$ or/and $\sigma^2_\nu$ is/are unknown, it can be estimated with replicated data by replacing $\sigma^2_\tau$ and $\sigma^2_\nu$ with $S^2_\tau$ and $S^2_\nu$ respectively. We'll denote the estimator of $\lambda$ by $\hat\lambda = S^2_\nu / S^2_\tau$ (and that of $\lambda_{XY}$ by $\hat\lambda_{XY}$). The DR minimizes the criterion $C_{DR}$, the sum of squared oblique distances between each point and the line [22, 33], as shown in Figure 1; the direction is related to $\lambda_{XY}$, its slope being $-\lambda_{XY}/\hat\beta$ [33]. Specifically,

$$C_{DR} = \sum_{i=1}^{N} \left[ \left( \bar X_i - \frac{\bar Y_i + \lambda_{XY} \bar X_i / \hat\beta - \hat\alpha}{\hat\beta + \lambda_{XY}/\hat\beta} \right)^2 + \left( \bar Y_i - \hat\alpha - \hat\beta\, \frac{\bar Y_i + \lambda_{XY} \bar X_i / \hat\beta - \hat\alpha}{\hat\beta + \lambda_{XY}/\hat\beta} \right)^2 \right]$$

The ML estimators are given by:

$$\hat\beta_{DR} = \frac{S_{yy} - \lambda_{XY} S_{xx} + \sqrt{(S_{yy} - \lambda_{XY} S_{xx})^2 + 4 \lambda_{XY} S_{xy}^2}}{2 S_{xy}} \quad \text{and} \quad \hat\alpha_{DR} = \bar Y - \hat\beta_{DR} \bar X \qquad (20)$$

The assumption of the DR is the constancy of $\lambda_{XY}$. This assumption is fulfilled when we suppose homoscedasticity and a balanced design (i.e., $n_{X_i}$ and $n_{Y_i}$ are constant).

4.1.2. The Galea-Rojas et al. procedure

Galea-Rojas et al. [12] propose a regression model based on a paper previously published by Ripley and Thompson [30], where maximum likelihood is applied to take into account the errors

and heteroscedasticity on both axes. We present in this section the formulas in the case of homoscedasticity and with replicated data. The estimators of the parameters are:

$$\hat\beta_{GR} = \frac{\sum_{i=1}^{N} W_{GR}\, \hat x_i (\bar Y_i - \bar Y)}{\sum_{i=1}^{N} W_{GR}\, \hat x_i (\bar X_i - \bar X)} = \frac{\sum_{i=1}^{N} \hat x_i (\bar Y_i - \bar Y)}{\sum_{i=1}^{N} \hat x_i (\bar X_i - \bar X)} \quad \text{and} \quad \hat\alpha_{GR} = \bar Y - \hat\beta_{GR} \bar X \qquad (21)$$

where

$$W_{GR} = \frac{1}{\dfrac{\sigma^2_\nu}{n_Y} + \hat\beta_{GR}^2 \dfrac{\sigma^2_\tau}{n_X}} \quad \text{and} \quad \hat x_i = \frac{\dfrac{\sigma^2_\nu}{n_Y} \bar X_i + \hat\beta_{GR} \dfrac{\sigma^2_\tau}{n_X} (\bar Y_i - \hat\alpha_{GR})}{\dfrac{\sigma^2_\nu}{n_Y} + \hat\beta_{GR}^2 \dfrac{\sigma^2_\tau}{n_X}}$$

$\hat x_i$ is the abscissa of the projection of the $i$th point onto the line in the oblique direction defined in the previous section. It can be proven that $\hat\beta_{GR} = \hat\beta_{DR}$ and $\hat\alpha_{GR} = \hat\alpha_{DR}$ (as the GR procedure is more general than the DR). In practice, if $\sigma^2_\tau$ and $\sigma^2_\nu$ are unknown, they can be estimated with replicated data and replaced by $S^2_\tau$ and $S^2_\nu$ respectively.

4.1.3. The Bivariate Least Square regression: BLS

Bivariate Least Square regression, BLS, is a generic name, but this article refers to the papers published first by Lisý et al. [23] and later by other authors [28, 29, 9, 32]. BLS can take into account errors and heteroscedasticity on both axes and is usually written in matrix notation [23, 28, 29, 9, 32]. We present here its formulas in the case of homoscedasticity and with replicated data. The BLS minimizes the criterion $C_{BLS}$:

$$C_{BLS} = \frac{1}{W_{BLS}} \sum_{i=1}^{N} (\bar Y_i - \hat\alpha - \hat\beta \bar X_i)^2 = (N - 2)\, s^2_{BLS} \quad \text{with} \quad W_{BLS} = \sigma^2_\varepsilon = \frac{\sigma^2_\nu}{n_Y} + \hat\beta^2 \frac{\sigma^2_\tau}{n_X}$$

Practically, the estimates of the parameters (the vector $b$) are computed by iterating the following formula:

$$R b = g \qquad (22)$$

where

$$R = \frac{1}{W_{BLS}} \begin{pmatrix} N & \sum_{i=1}^{N} \bar X_i \\ \sum_{i=1}^{N} \bar X_i & \sum_{i=1}^{N} \bar X_i^2 \end{pmatrix}, \quad b = \begin{pmatrix} \hat\alpha_{BLS} \\ \hat\beta_{BLS} \end{pmatrix}$$

and

$$g = \frac{1}{W_{BLS}} \begin{pmatrix} \sum_{i=1}^{N} \bar Y_i \\ \sum_{i=1}^{N} \left[ \bar X_i \bar Y_i + \hat\beta_{BLS} \dfrac{\sigma^2_\tau}{n_X} \dfrac{(\bar Y_i - \hat\alpha_{BLS} - \hat\beta_{BLS} \bar X_i)^2}{W_{BLS}} \right] \end{pmatrix} \qquad (23)$$

If $\sigma^2_\tau$ and $\sigma^2_\nu$ are unknown, they can be estimated with replicated data and replaced by $S^2_\tau$ and $S^2_\nu$. Moreover, even if the parameters $\sigma^2_\tau$ and $\sigma^2_\nu$ appear separately in the formula, the solution $b$ only depends on the ratio $\lambda_{XY}$, and it can be proven (after some algebraic manipulations) that under homoscedasticity $\hat\beta_{BLS} = \hat\beta_{DR} = \hat\beta_{GR}$ and $\hat\alpha_{BLS} = \hat\alpha_{DR} = \hat\alpha_{GR}$.
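The closed-form estimator (20) is a one-liner in code. The sketch below assumes $\lambda_{XY}$ is supplied (known, or estimated by $\hat\lambda_{XY}$); the function name is ours.

```python
import numpy as np

def deming(Xbar, Ybar, lam_xy):
    """Deming regression intercept and slope, equation (20).

    Xbar, Ybar : per-sample means; lam_xy : (sigma2_nu/n_Y) / (sigma2_tau/n_X).
    """
    Sxx = np.sum((Xbar - Xbar.mean()) ** 2)
    Syy = np.sum((Ybar - Ybar.mean()) ** 2)
    Sxy = np.sum((Xbar - Xbar.mean()) * (Ybar - Ybar.mean()))
    beta = (Syy - lam_xy * Sxx
            + np.sqrt((Syy - lam_xy * Sxx) ** 2 + 4 * lam_xy * Sxy ** 2)) / (2 * Sxy)
    alpha = Ybar.mean() - beta * Xbar.mean()
    return alpha, beta
```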

4.1.4. The Mandel procedure

Mandel developed a regression in the context of inter-laboratory studies [27] that can take into account the correlation between the error terms (as he regressed the results of a given laboratory on the averages of the results of all laboratories). In the context of measurement methods, the errors $\tau_{ij}$ and $\nu_{ij}$ are uncorrelated, and it can be shown that, if the correlation term is set to zero, Mandel's regression is exactly equivalent to the DR. Mandel's procedure consists in transforming the $(X, Y)$ data into $(U_i = \bar X_i + k \bar Y_i,\ V_i = \bar Y_i - \hat\beta_{Mandel} \bar X_i)$ data such that $U$ has a very small error. The OLS regression is then applied to the $(U, V)$ data and transformed back into the $(X, Y)$ axes to finally get a regression line which takes into account the errors on both axes and their correlation. By using $\lambda_{XY}$ instead of $\lambda$ in Mandel's procedure (with uncorrelated errors), the formulas are:

$$\hat\beta_{Mandel} = \frac{S_{xy} + k S_{yy}}{S_{xx} + k S_{xy}} \quad \text{and} \quad \hat\alpha_{Mandel} = \bar Y - \hat\beta_{Mandel} \bar X \qquad (24)$$

$$\text{with} \quad k = \frac{\hat\beta_{Mandel}}{\lambda_{XY}}$$

$\hat\beta_{Mandel}$ can be computed by iterations or by solving a second-degree equation, and it is easy to check that $\hat\beta_{Mandel} = \hat\beta_{DR} = \hat\beta_{GR} = \hat\beta_{BLS}$ and $\hat\alpha_{Mandel} = \hat\alpha_{DR} = \hat\alpha_{GR} = \hat\alpha_{BLS}$. By analogy, we'll also use the notations:

$$\bar U = \frac{1}{N}\sum_{i=1}^{N} U_i, \quad \bar V = \frac{1}{N}\sum_{i=1}^{N} V_i, \quad S_{uu} = \sum_{i=1}^{N} (U_i - \bar U)^2, \quad S_{vv} = \sum_{i=1}^{N} (V_i - \bar V)^2 \quad \text{and} \quad S_{uv} = \sum_{i=1}^{N} (U_i - \bar U)(V_i - \bar V).$$

4.2. The joint-CI and confidence bands of the errors-in-variables regressions

This section considers four methodologies to compute the approximate covariance matrix $\hat\Sigma$ of the estimates $\hat\theta = (\hat\alpha, \hat\beta)'$ of the errors-in-variables regression coefficients $\theta$ presented in Section 4.1, in order to compute the joint-CI for $\theta$ or the CB for the regression line. The joint-CIs are all confidence ellipses centered on the same point (as the estimators given in the previous section are identical) but with different shapes, and they are all approximate. The confidence ellipse for $\theta$ can be computed with the following formula:

$$(\hat\theta - \theta)'\, \hat\Sigma^{-1}\, (\hat\theta - \theta) < c \qquad (25)$$

where the critical constant $c$ is chosen suitably, depending on whether $\chi^2_{1-\gamma,\, 2}$ (the $100(1-\gamma)\%$ percentile of a $\chi^2$ distribution with 2 df) or $2 F_{1-\gamma,\, 2,\, N-2}$ is used as the approximate distribution. This confidence ellipse can be represented equivalently as a CB:

$$x'\theta \in x'\hat\theta \pm \sqrt{c\, x' \hat\Sigma x} \qquad (26)$$
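Formulas (25) and (26) translate directly into a generic helper: whichever of the four variance-covariance matrices of the next subsections is plugged in, the same code produces the hyperbolic band and the equivalence test. This is our own sketch; the grid-based test of the identity line is approximate.

```python
import numpy as np

def confidence_band(theta_hat, Sigma_hat, c, xgrid):
    """Hyperbolic CB (26): x'theta_hat +/- sqrt(c * x' Sigma_hat x), x = (1, X)'."""
    alpha_hat, beta_hat = theta_hat
    half = np.sqrt(c * (Sigma_hat[0, 0]
                        + 2 * xgrid * Sigma_hat[0, 1]
                        + xgrid ** 2 * Sigma_hat[1, 1]))
    centre = alpha_hat + beta_hat * xgrid
    return centre - half, centre + half

def equivalence_not_rejected(theta_hat, Sigma_hat, c, xgrid):
    """H0: theta = (0, 1)' is not rejected if Y = X stays inside the CB."""
    lower, upper = confidence_band(theta_hat, Sigma_hat, c, xgrid)
    return bool(np.all((xgrid >= lower) & (xgrid <= upper)))
```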

4.2.1. The variance-covariance matrix of the estimators provided by DR

Gillard and Iles [14, 15] propose to compute the variance-covariance matrix of the estimators by the method of moments. When the ratio of error variances $\lambda_{XY}$ is assumed to be known, the variances and covariance of the estimators can be computed with the following formulas (which have been modified to take the replicated data into account):

$$S^2_{\hat\beta_{DR}} = \frac{S_{xx} S_{yy} - S_{xy}^2}{N S_{xy}^2}, \quad S^2_{\hat\alpha_{DR}} = \bar X^2 S^2_{\hat\beta_{DR}} + \frac{\hat\beta_{DR}^2\, \sigma^2_\tau / n_X + \sigma^2_\nu / n_Y}{N} \quad \text{and} \quad S_{\hat\alpha\hat\beta_{DR}} = -\bar X S^2_{\hat\beta_{DR}} \qquad (27)$$

If needed, the variances $\sigma^2_\tau$ and $\sigma^2_\nu$ can be replaced by $S^2_\tau$ and $S^2_\nu$. Based on the asymptotic normal distribution of the parameters $\hat\beta_{DR}$ and $\hat\alpha_{DR}$ (obtained by ML), we propose to use an F-distribution with $c = 2 F_{1-\gamma,\, 2,\, N-2}$ to prevent the joint-CI or CB from being too narrow for small sample sizes.

4.2.2. The variance-covariance matrix of the estimators provided by Galea-Rojas et al.

Galea-Rojas et al. [12] provide the asymptotic variance-covariance matrix of the parameters derived by ML. In the case of homoscedasticity and with replicated data, the asymptotic variance-covariance matrix of the parameters is given by:

$$\Sigma_{GR} = W_n^{-1} V_n W_n^{-1} / N, \qquad (28)$$

where

$$W_n = \frac{W_{GR}}{N} \begin{pmatrix} N & \sum_{i=1}^{N} \xi_i \\ \sum_{i=1}^{N} \xi_i & \sum_{i=1}^{N} \xi_i^2 \end{pmatrix} \quad \text{and} \quad V_n = W_n + \begin{pmatrix} 0 & 0 \\ 0 & k_{GR} \end{pmatrix}$$

with

$$k_{GR} = W_{GR} - C \quad \text{and} \quad C = \frac{1}{\sigma^2_\tau / n_X + \beta_{GR}^2\, \sigma^2_\nu / n_Y}$$

In practice, to get a consistent estimator of the variance-covariance matrix, $\beta_{GR}^2$ can be replaced by $\hat\beta_{GR}^2$, $\xi_i$ by $\hat x_i$ and $\xi_i^2$ by $\hat x_i^2 - 1/C$ [12]. From the Wald statistic given in the literature [12], the critical constant is $c = \chi^2_{1-\gamma,\, 2}$. The variances of the parameters can also be computed with the following equivalent formulas [12]:

$$\sigma^2_{\hat\beta_{GR}} = \frac{1}{SS_W} \left( 1 + \frac{N k_{GR}}{SS_W} \right), \quad \sigma^2_{\hat\alpha_{GR}} = \frac{1}{N W_{GR}} + \bar\xi^2\, \sigma^2_{\hat\beta_{GR}} \quad \text{and} \quad \sigma_{\hat\alpha\hat\beta_{GR}} = -\bar\xi\, \sigma^2_{\hat\beta_{GR}} \qquad (29)$$

with $SS_W = W_{GR} \sum_{i=1}^{N} (\xi_i - \bar\xi)^2$ and $\bar\xi = \frac{1}{N}\sum_{i=1}^{N} \xi_i$. In practice, $\sigma^2_{\hat\beta_{GR}}$, $\sigma^2_{\hat\alpha_{GR}}$ and $\sigma_{\hat\alpha\hat\beta_{GR}}$ can be estimated by replacing $\bar\xi$ by $\bar X$, and $SS_W$ by $W_{GR} \sum_{i=1}^{N} (\hat x_i^2 - C^{-1} - 2 \hat x_i \bar X + \bar X^2)$.
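A minimal sketch of formulas (27) follows, reusing the deming helper above; the sign of the covariance term is our assumption by analogy with the OLS case, so treat it as such.

```python
import numpy as np
from scipy.stats import f

def dr_covariance(Xbar, Ybar, beta_dr, s2_tau, s2_nu, n_x, n_y, gamma=0.05):
    """Variance-covariance matrix of (alpha_DR, beta_DR), formulas (27),
    and the critical constant c = 2 F(1-gamma; 2, N-2)."""
    N = len(Xbar)
    Sxx = np.sum((Xbar - Xbar.mean()) ** 2)
    Syy = np.sum((Ybar - Ybar.mean()) ** 2)
    Sxy = np.sum((Xbar - Xbar.mean()) * (Ybar - Ybar.mean()))
    var_b = (Sxx * Syy - Sxy ** 2) / (N * Sxy ** 2)
    var_a = Xbar.mean() ** 2 * var_b + (beta_dr ** 2 * s2_tau / n_x + s2_nu / n_y) / N
    cov_ab = -Xbar.mean() * var_b           # sign assumed by analogy with OLS
    Sigma = np.array([[var_a, cov_ab], [cov_ab, var_b]])
    return Sigma, 2 * f.ppf(1 - gamma, 2, N - 2)
```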

4.2.3. The variance-covariance matrix of the estimators provided by BLS

Riu and Rius [32] propose the following variance-covariance matrix for the BLS parameters:

$$\hat\Sigma_{BLS} = s^2_{BLS}\, R^{-1} \qquad (30)$$

or, equivalently:

$$S^2_{\hat\beta_{BLS}} = \frac{N W_{BLS}\, s^2_{BLS}}{N \sum_{i=1}^{N} \bar X_i^2 - \left( \sum_{i=1}^{N} \bar X_i \right)^2}, \quad S^2_{\hat\alpha_{BLS}} = \frac{W_{BLS}\, s^2_{BLS} \sum_{i=1}^{N} \bar X_i^2}{N \sum_{i=1}^{N} \bar X_i^2 - \left( \sum_{i=1}^{N} \bar X_i \right)^2} \quad \text{and} \quad S_{\hat\alpha\hat\beta_{BLS}} = -\bar X S^2_{\hat\beta_{BLS}} \qquad (31)$$

The critical constant $c$ is given by $2 F_{1-\gamma,\, 2,\, N-2}$ [12]. The disadvantages of this approximate ellipse (the theoretical background of this joint-CI isn't rigorous) can be found in the literature [12].

4.2.4. The variance-covariance matrix of the estimators provided by Mandel

With Mandel's procedure, the OLS technique is applied in the $(U, V)$ axes [27] and the variances of the parameters are computed with the formulas given in Section 3.2. The variances of the parameters $\hat\beta_{Mandel}$ and $\hat\alpha_{Mandel}$ are derived by the general formula for the propagation of errors from the reconversion to the $(X, Y)$ scales:

$$S^2_{\hat\beta_{Mandel}} = (1 + k \hat\beta_{Mandel})^2\, \frac{S^2_{e_{Mandel}}}{S_{uu}} \quad \text{and} \quad S^2_{\hat\alpha_{Mandel}} = \left( \frac{1}{N} + \frac{\bar X^2 (1 + k \hat\beta_{Mandel})^2}{S_{uu}} \right) S^2_{e_{Mandel}} \qquad (32)$$

$$\text{with} \quad S^2_{e_{Mandel}} = \frac{S_{vv}}{N - 2}$$

The covariance and the joint-CI are not provided by Mandel, but the covariance can also easily be computed by the formula for the propagation of errors:

$$S_{\hat\alpha\hat\beta_{Mandel}} = -\bar X S^2_{\hat\beta_{Mandel}} \qquad (33)$$

We propose to use an F-distribution with $c = 2 F_{1-\gamma,\, 2,\, N-2}$, as Mandel's procedure is based on the OLS technique.

4.3. Coverage probabilities of the joint confidence intervals or confidence bands

In order to compare the coverage probabilities of the joint-CIs or CBs provided by the four methodologies presented in the previous sections, we simulated samples with $N = 10, 20, 50$, with unreplicated data ($n_X = n_Y = 1$, $\lambda = \lambda_{XY}$ known) and under equivalence ($\alpha = 0$, $\beta = 1$, $\eta_i = \xi_i$) for the values of $\lambda_{XY}$ given in Table 1 (with $\sigma^2_\nu$ from 0.1 to 2 and $\sigma^2_\tau$ inversely from 2 to 0.1, providing 13 values of $\lambda$ from 0.05 to 20). Different values of $\eta_i$ were drawn randomly from a uniform distribution U(10, 20) for each simulated sample. The corresponding joint-CIs or CBs were computed for each simulated sample and the coverage probabilities (at a nominal level of 95%) computed per value of $\lambda_{XY}$. We also simulated replicated data to allow the estimation of $\sigma^2_\nu$ and $\sigma^2_\tau$: with an equal number of replicates ($n_X = n_Y = 2$ and $\lambda = \lambda_{XY}$, or $n_X = n_Y = 4$ and

TABLE 1. Values of $\sigma^2_\nu$ and $\sigma^2_\tau$ for the simulations with $n_X = n_Y = 1$ and the corresponding values of $\lambda$ and $\lambda_{XY}$

$\lambda = \lambda_{XY}$) and with an unequal number of replicates ($n_X = 4$, $n_Y = 2$ and $\lambda_{XY} = 2\lambda$), as shown in Table 2 (Appendix), in such a way that the values of $\lambda_{XY}$ are identical to those of the unreplicated case. To study the effect of the sample size in more detail, we also ran simulations with $\lambda_{XY} = 0.33, 1, 3$, and $N$ from 10 to 100 (10, 12, 14, 16, 20, 30, 50, 75, 100), with or without replicated data, as shown in Table 3 (Appendix). The coverage probabilities are displayed for $\lambda_{XY}$ known in Figure 3, with respect to $\lambda_{XY}$ (left, on a logarithmic scale) and to $N$ (right). Figures 4, 5 and 13 display the coverage probabilities corresponding to replicated data and estimated $\lambda_{XY}$. When the variances are known, all the coverage probabilities are between 93% and 96%, and closer to 95% when $\lambda_{XY} > 1$ for the GR, BLS and Mandel procedures. That's the reason why we recommend choosing the coordinate system such that $\lambda_{XY}$ is higher than 1 in practice. When $N$ increases, the GR and DR get closer to 95%, while the BLS and Mandel procedures provide slightly lower coverage probabilities. When the variances are estimated with replicated data, the coverage probabilities provided by the DR are slightly lower but are the closest to 95%. On the other hand, the coverage probabilities provided by the GR procedure drop because of the uncertainties on the estimated variances, but these coverage probabilities increase with $N$. For instance, when $n_X = n_Y = 2$ with $\lambda_{XY} = 1$, the GR outperforms the BLS for $N > 30$ (approximately). The BLS and Mandel methodologies provide similar coverage probabilities with known or unknown variances. In practice, the accuracy of one measurement method is rarely three times higher than that of the other, such that $\lambda_{XY}$ often lies between about 0.33 and 3. In this interval, the DR and GR are always slightly more suitable with known variances, while the DR is always more suitable with unknown variances.
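The simulation loop itself is short. The sketch below (our own simplified code, for the DR joint-CI only, with unreplicated data and $\lambda_{XY}$ known) reuses the deming and dr_covariance helpers from the previous sections; one run reproduces a single cell of the study.

```python
import numpy as np

rng = np.random.default_rng(2)
N, s2_tau, s2_nu = 20, 1.0, 1.0          # one illustrative cell (lambda_XY = 1)
lam_xy = (s2_nu / 1) / (s2_tau / 1)      # n_X = n_Y = 1

hits, n_sim = 0, 10000
for _ in range(n_sim):
    eta = rng.uniform(10, 20, N)                      # true values, alpha = 0, beta = 1
    X = eta + rng.normal(0, np.sqrt(s2_tau), N)
    Y = eta + rng.normal(0, np.sqrt(s2_nu), N)
    a, b = deming(X, Y, lam_xy)                       # sketch from Section 4.1.1
    Sigma, c = dr_covariance(X, Y, b, s2_tau, s2_nu, 1, 1)   # sketch from Section 4.2.1
    d = np.array([a - 0.0, b - 1.0])
    hits += d @ np.linalg.solve(Sigma, d) <= c        # is (0, 1)' inside the ellipse?

print(hits / n_sim)   # empirical coverage, to compare with the nominal 0.95
```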

FIGURE 3. Coverage probabilities of the joint-CI or CB related to $\lambda_{XY}$ on a logarithmic scale with $N$ = 10, 20, 50 (left) and related to $N$ for $\lambda_{XY}$ = 0.33, 1, 3 (right), ($n_X = n_Y = 1$)

FIGURE 4. Coverage probabilities of the joint-CI or CB related to $\lambda_{XY}$ on a logarithmic scale with $N$ = 10, 20, 50 (left) and related to $N$ for $\lambda_{XY}$ = 0.33, 1, 3 (right), ($n_X = n_Y = 2$)

FIGURE 5. Coverage probabilities of the joint-CI or CB related to $\lambda_{XY}$ on a logarithmic scale with $N$ = 10, 20, 50 (left) and related to $N$ for $\lambda_{XY}$ = 0.33, 1, 3 (right), ($n_X = n_Y = 4$)

5. The errors-in-variables regressions under heteroscedasticity

This section presents the formulas of the regression estimators of model (8) under heteroscedasticity, when the variances $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ are known or estimated with replicates. Mandel's regression and the DR are no longer considered, as they are not suitable in the case of heteroscedasticity. The GR procedure and the BLS can take the heteroscedasticity into account and are identical for estimating the regression line, but the variance-covariance matrices of the parameters are computed differently. The covariance matrix $\hat\Sigma$ of the estimates $\hat\theta = (\hat\alpha, \hat\beta)'$ given in this section can be plugged into formulas (25) and (26) to get, respectively, the corresponding joint-CI (confidence ellipse) or CB (with the same critical constant $c$ given in Section 4.2).

5.1. The Galea-Rojas et al. procedure

Based on a paper published previously by Ripley and Thompson [30], the ML parameter estimators are:

$$\hat\beta_{GR} = \frac{\sum_{i=1}^{N} W_{i_{GR}}\, \hat x_i (\bar Y_i - \bar Y_W)}{\sum_{i=1}^{N} W_{i_{GR}}\, \hat x_i (\bar X_i - \bar X_W)} \quad \text{and} \quad \hat\alpha_{GR} = \bar Y_W - \hat\beta_{GR} \bar X_W, \qquad (34)$$

where

$$W_{i_{GR}} = \frac{1}{\dfrac{\sigma^2_{\nu_i}}{n_{Y_i}} + \hat\beta_{GR}^2 \dfrac{\sigma^2_{\tau_i}}{n_{X_i}}}, \quad \hat x_i = \frac{\dfrac{\sigma^2_{\nu_i}}{n_{Y_i}} \bar X_i + \hat\beta_{GR} \dfrac{\sigma^2_{\tau_i}}{n_{X_i}} (\bar Y_i - \hat\alpha_{GR})}{\dfrac{\sigma^2_{\nu_i}}{n_{Y_i}} + \hat\beta_{GR}^2 \dfrac{\sigma^2_{\tau_i}}{n_{X_i}}},$$

$$\bar X_W = \frac{\sum_{i=1}^{N} W_{i_{GR}} \bar X_i}{\sum_{i=1}^{N} W_{i_{GR}}} \quad \text{and} \quad \bar Y_W = \frac{\sum_{i=1}^{N} W_{i_{GR}} \bar Y_i}{\sum_{i=1}^{N} W_{i_{GR}}}$$

In practice, if $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ are unknown, they can be estimated with replicated data and replaced by $S^2_{\tau_i}$ and $S^2_{\nu_i}$. The asymptotic variance-covariance matrix of the parameters derived by ML is given by Galea-Rojas et al. [12]:

$$\Sigma_{GR} = W_n^{-1} V_n W_n^{-1} / N, \qquad (35)$$

where

$$W_n = \frac{1}{N} \begin{pmatrix} \sum_{i=1}^{N} W_{i_{GR}} & \sum_{i=1}^{N} W_{i_{GR}} \xi_i \\ \sum_{i=1}^{N} W_{i_{GR}} \xi_i & \sum_{i=1}^{N} W_{i_{GR}} \xi_i^2 \end{pmatrix} \quad \text{and} \quad V_n = W_n + \begin{pmatrix} 0 & 0 \\ 0 & k_{GR} \end{pmatrix}$$

with

$$k_{GR} = \frac{1}{N} \sum_{i=1}^{N} \left( W_{i_{GR}} - C_i \right) \quad \text{and} \quad C_i = \frac{1}{\sigma^2_{\tau_i}/n_{X_i} + \beta_{GR}^2\, \sigma^2_{\nu_i}/n_{Y_i}}$$

In practice, to get a consistent estimator of the variance-covariance matrix, $\beta_{GR}^2$ can be replaced by $\hat\beta_{GR}^2$, $\xi_i$ by $\hat x_i$ and $\xi_i^2$ by $\hat x_i^2 - 1/C_i$ [12]. The variances of the parameters can also be computed with the following equivalent formulas [12]:

$$\sigma^2_{\hat\beta_{GR}} = \frac{1}{SS_W} \left( 1 + \frac{N k_{GR}}{SS_W} \right), \quad \sigma^2_{\hat\alpha_{GR}} = \frac{1}{\sum_{i=1}^{N} W_{i_{GR}}} + \bar\xi_W^2\, \sigma^2_{\hat\beta_{GR}} \quad \text{and} \quad \sigma_{\hat\alpha\hat\beta_{GR}} = -\bar\xi_W\, \sigma^2_{\hat\beta_{GR}} \qquad (36)$$

with $SS_W = \sum_{i=1}^{N} W_{i_{GR}} (\xi_i - \bar\xi_W)^2$ and $\bar\xi_W = \frac{\sum_{i=1}^{N} W_{i_{GR}} \xi_i}{\sum_{i=1}^{N} W_{i_{GR}}}$. In practice, $\sigma^2_{\hat\beta_{GR}}$, $\sigma^2_{\hat\alpha_{GR}}$ and $\sigma_{\hat\alpha\hat\beta_{GR}}$ can be estimated by replacing $\bar\xi_W$ by $\bar X_W$, and $SS_W$ by $\sum_{i=1}^{N} W_{i_{GR}} (\hat x_i^2 - C_i^{-1} - 2 \hat x_i \bar X_W + \bar X_W^2)$.

5.2. The Bivariate Least Square regression: BLS

The BLS regression line minimizes the criterion $C_{BLS}$ [29, 9, 32]:

$$C_{BLS} = \sum_{i=1}^{N} \frac{1}{W_{i_{BLS}}} (\bar Y_i - \hat\alpha - \hat\beta \bar X_i)^2 = (N - 2)\, s^2_{BLS} \quad \text{with} \quad W_{i_{BLS}} = \sigma^2_{\varepsilon_i} = \frac{\sigma^2_{\nu_i}}{n_{Y_i}} + \hat\beta^2 \frac{\sigma^2_{\tau_i}}{n_{X_i}}$$

The estimates of the parameters are computed by iterating the following formula:

$$R b = g \qquad (37)$$

$$\begin{pmatrix} \sum_{i=1}^{N} \dfrac{1}{W_{i_{BLS}}} & \sum_{i=1}^{N} \dfrac{\bar X_i}{W_{i_{BLS}}} \\ \sum_{i=1}^{N} \dfrac{\bar X_i}{W_{i_{BLS}}} & \sum_{i=1}^{N} \dfrac{\bar X_i^2}{W_{i_{BLS}}} \end{pmatrix} \begin{pmatrix} \hat\alpha_{BLS} \\ \hat\beta_{BLS} \end{pmatrix} = \begin{pmatrix} \sum_{i=1}^{N} \dfrac{\bar Y_i}{W_{i_{BLS}}} \\ \sum_{i=1}^{N} \left[ \dfrac{\bar X_i \bar Y_i}{W_{i_{BLS}}} + \hat\beta_{BLS} \dfrac{\sigma^2_{\tau_i}}{n_{X_i}} \dfrac{(\bar Y_i - \hat\alpha_{BLS} - \hat\beta_{BLS} \bar X_i)^2}{W_{i_{BLS}}^2} \right] \end{pmatrix} \qquad (38)$$
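A minimal iterative sketch of (37)-(38) is given below; the OLS starting values, the convergence tolerance and the function name are our own illustrative choices.

```python
import numpy as np

def bls(Xbar, Ybar, s2_tau_i, s2_nu_i, n_x, n_y, tol=1e-10, max_iter=200):
    """Heteroscedastic BLS estimates of (alpha, beta) by iterating R b = g."""
    b, a = np.polyfit(Xbar, Ybar, 1)                 # OLS slope and intercept as start
    for _ in range(max_iter):
        W = s2_nu_i / n_y + b ** 2 * s2_tau_i / n_x  # weights W_iBLS
        r = Ybar - a - b * Xbar                      # current residuals
        R = np.array([[np.sum(1 / W), np.sum(Xbar / W)],
                      [np.sum(Xbar / W), np.sum(Xbar ** 2 / W)]])
        g = np.array([np.sum(Ybar / W),
                      np.sum(Xbar * Ybar / W
                             + b * (s2_tau_i / n_x) * r ** 2 / W ** 2)])
        a_new, b_new = np.linalg.solve(R, g)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b
```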

In practice, if $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ are unknown, they can be estimated with replicated data and replaced by $S^2_{\tau_i}$ and $S^2_{\nu_i}$. It can be proven (after some algebraic manipulations) that $\hat\beta_{BLS} = \hat\beta_{GR}$ and $\hat\alpha_{BLS} = \hat\alpha_{GR}$ under heteroscedasticity. The variance-covariance matrix of the parameters provided by Riu and Rius [32] is given by:

$$\hat\Sigma_{BLS} = s^2_{BLS}\, R^{-1} \qquad (39)$$

or can be computed with the following formulas:

$$S^2_{\hat\beta_{BLS}} = \frac{s^2_{BLS} \sum_{i=1}^{N} \dfrac{1}{W_{i_{BLS}}}}{D}, \quad S^2_{\hat\alpha_{BLS}} = \frac{s^2_{BLS} \sum_{i=1}^{N} \dfrac{\bar X_i^2}{W_{i_{BLS}}}}{D} \quad \text{and} \quad S_{\hat\alpha\hat\beta_{BLS}} = -\frac{s^2_{BLS} \sum_{i=1}^{N} \dfrac{\bar X_i}{W_{i_{BLS}}}}{D} \qquad (40)$$

$$\text{with} \quad D = \sum_{i=1}^{N} \frac{1}{W_{i_{BLS}}} \sum_{i=1}^{N} \frac{\bar X_i^2}{W_{i_{BLS}}} - \left( \sum_{i=1}^{N} \frac{\bar X_i}{W_{i_{BLS}}} \right)^2$$

5.3. Coverage probabilities of the joint confidence intervals or confidence bands

The coverage probabilities of the joint-CIs provided by GR and BLS have already been compared in the literature, but only in the case of known variances [12]. Since, in practice, the variances are never known but always estimated, we propose to study the coverage probabilities in more detail with known and unknown variances. We simulated samples with $N$ = 10, 12, 15, 20, 30, 50, 75, 100, with replicated data ($n_X = n_Y = 5$) under equivalence ($\alpha = 0$, $\beta = 1$, $\eta_i = \xi_i$). The values of $\xi_i$ were drawn randomly from a uniform distribution U(10, 20) for each simulated sample and the coverage probabilities were computed at a nominal level of 95%. We first chose the variances according to the following functions: $\sigma^2_{\tau_i} = 0.45 \xi_i - 4$ and $\sigma^2_{\nu_i} = 0.45 \xi_i - 4$, where the variances $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ increase with $\xi_i$ and are equal for a given $\xi_i$ (it is common in practice that the precision of a measurement method decreases when $\xi_i$ increases), and such that $\sigma^2_{\tau_i} = \sigma^2_{\nu_i} = 0.5$ when $\xi_i = 10$ and $\sigma^2_{\tau_i} = \sigma^2_{\nu_i} = 5$ when $\xi_i = 20$. Secondly, the variances were chosen independently and randomly ("random heteroscedasticity") from a uniform distribution U(0.5, 5). To study the effect of the replicates in more detail when the variances are estimated, simulations were also launched with $N = 20$ and $n_X = n_Y$ = 2, 4, 6, 8, 10, 12, 15, 20. Figure 6 displays the coverage probabilities, which are close to 95% when the variances are known, but closer to 95% for GR. The coverage probabilities collapse drastically when the variances have to be estimated (the number of replicates, $n_X = n_Y = 5$, is too low to estimate a variance) and drop more for GR, especially when $N$ increases or with random heteroscedasticity. Obviously, when the number of replicates increases, the coverage probabilities increase, but BLS still outperforms GR even for $n_X = n_Y = 20$. Actually, the uncertainties on the estimated variances are not taken into account by the BLS or GR methodologies, and work is in progress to estimate such regression lines while additionally taking the variance uncertainties into account.

6. Applications

6.1. The systolic blood pressure data under homoscedasticity

In the Bland and Altman systolic blood pressure data [5], simultaneous measurements were made by two observers (denoted J and R) using a sphygmomanometer and by a semi-automatic blood

FIGURE 6. Coverage probabilities of the joint-CI or CB related to $N$ on the X-axis (top and middle) or related to $n_X = n_Y$ (bottom) with heteroscedasticity and variances known (top) or unknown and estimated (middle and bottom), simulated with a function (left) or randomly (right)

pressure monitor (denoted S). We'll consider only the measurements made by observer J with the manual device and the measurements made by the semi-automatic monitor S. In other words, the systolic blood pressure was measured three times per patient (85 patients) by the semi-automatic device S and three times per patient by observer J with the manual device. If we decide to put the mean measures given by S on the Y-axis and those given by J on the X-axis (for reasons explained below), we have: $\bar X_1 = \dots$, ..., $\bar X_{85} = \dots$ and $\bar Y_1 = \dots$, ..., $\bar Y_{85} = 124$. The variances $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ are unknown but can be estimated with $S^2_{\tau_i}$ and $S^2_{\nu_i}$: $S^2_{\tau_1} = \dots$, ..., $S^2_{\tau_{85}} = \dots$ and $S^2_{\nu_1} = 9.333$, ..., $S^2_{\nu_{85}} = 13$. If we calculate the minimum and the maximum of the estimated variances $S^2_{\tau_i}$ and $S^2_{\nu_i}$, we get: $\mathrm{Min}(S^2_{\tau_i}) = 1.333$, $\mathrm{Max}(S^2_{\tau_i}) = \dots$ and $\mathrm{Min}(S^2_{\nu_i}) = 1.333$, $\mathrm{Max}(S^2_{\nu_i}) = \dots$. It is clear that the variances are significantly different from one patient to another. Nevertheless, as this section supposes homoscedasticity, and for didactic purposes, we compute the global estimates $S^2_\tau$ and $S^2_\nu$: $S^2_\tau = \dots$ and $S^2_\nu = \dots$. $\lambda$ ($= \lambda_{XY}$ as $n_X = n_Y$) is estimated by $\hat\lambda = \dots$. We can observe that $\hat\lambda$ is higher than 1. Actually, that's the reason why we chose to put the mean measures given by S on the Y-axis and those given by J on the X-axis. Indeed, we explained in Section 4.3 that the coverage probabilities are closer to the nominal level when $\lambda_{XY} > 1$. We also have: $\bar X = \dots$, $\bar Y = \dots$, $S_{xx} = \dots$, $S_{yy} = \dots$, $S_{xy} = \dots$, $R^2 = 0.67$. For the different regression lines, we get: $\hat\beta_{DR} = \hat\beta_{GR} = \hat\beta_{BLS} = \hat\beta_{Mandel} = \dots$ and $\hat\alpha_{DR} = \hat\alpha_{GR} = \hat\alpha_{BLS} = \hat\alpha_{Mandel} = \dots$. Figure 7 displays the scatterplot of the data with standard errors of the mean ($S_\tau / \sqrt{n_X}$ and $S_\nu / \sqrt{n_Y}$). It's obvious that some points are outliers, but they will not be removed, for didactic purposes. The 95% CB are displayed for the four methodologies (Figure 8, right) and one can see that the equivalence line ($Y = X$) isn't inside the CB, which means that the manual device and the semi-automatic one are not equivalent: the null hypothesis is rejected; the estimated line is significantly different from the equivalence line. The CB provided by Mandel and BLS are very close to each other (nearly superimposed on the graph) and close to the DR one when we move away from $\bar X$, while the GR one is the narrowest. In fact, there are outliers with respect to the bivariate distribution of the points $(\bar X_i, \bar Y_i)$. When the distance from an outlier to the line increases, $S_{xy}$ increases and the variance-covariance matrix computed by the DR is modified, as it is related to $S_{xy}$ in formulas (27). The variance-covariance matrix computed by the BLS is related to the sum of the weighted residuals (31), and obviously this sum increases when a point moves away from the line. On the other hand, the variance-covariance matrix computed by the GR is not related to $S_{xy}$ or the residuals, but to $W_{GR}$ and the $\hat x_i$, which are not modified by such outliers. This can also be noticed in Figure 8 (left), where the equivalence point ($\beta = 1$ and $\alpha = 0$) is outside all the ellipses and the one provided by GR is the smallest. The minor axes of the GR and DR ellipses are equal and, by analogy, the widths of their hyperbolic CB are equal at $(\bar X, \bar Y)$.
The Mandel and BLS ellipses are nearly 9 times larger than the GR ellipse, and the DR ellipse nearly 3 times larger than the GR ellipse.

6.2. The systolic blood pressure data under heteroscedasticity

If we suppose heteroscedasticity in the systolic blood pressure data, the variances $\sigma^2_{\tau_i}$ and $\sigma^2_{\nu_i}$ are unknown but can be estimated with $S^2_{\tau_i}$ and $S^2_{\nu_i}$ from the replicates for each patient and both devices

FIGURE 7. Scatterplot of the systolic blood pressure data under homoscedasticity with the regression line and standard errors of the mean

FIGURE 8. Confidence ellipses (left) computed by the four methodologies and the corresponding CB (right)

FIGURE 9. Scatterplot of the systolic blood pressure data with the regression lines and standard errors of the means under heteroscedasticity

($n_{X_i} = n_X = n_{Y_i} = n_Y = 3$ $\forall i$). The estimated GR and BLS regression lines are: $\hat\beta_{GR} = \hat\beta_{BLS} = \dots$ and $\hat\alpha_{GR} = \hat\alpha_{BLS} = \dots$. Figure 9 displays the scatterplot of the data with local standard errors of the means. The estimated line is close to the one estimated under the homoscedasticity assumption (the parameters are very close to each other). The 95% CB provided by the BLS and GR under heteroscedasticity are displayed in Figure 10 (right), together with the ones under homoscedasticity (for comparison). It can be noticed that the CB are narrower under heteroscedasticity. Actually, some of the outliers have lower weights and are less taken into account in the heteroscedastic analysis, which leads to lower variances of the parameters. As in the previous section, the equivalence line ($Y = X$) isn't inside the CB: the equivalence between the two devices is rejected. This is confirmed in Figure 10 (left), where the equivalence point ($\beta = 1$ and $\alpha = 0$) is outside all the ellipses and the ones provided by the GR are the narrowest (under homoscedasticity and heteroscedasticity).

6.3. The arsenate ion in natural river water under heteroscedasticity

In the arsenate ion in natural river water data [30], 30 pairs of measures are provided by 2 methods: firstly, continuous selective reduction and atomic absorption spectrometry; secondly, non-selective reduction, cold trapping and atomic emission spectrometry. The mean measures $(\bar X_i, \bar Y_i)$ with their standard errors of the mean are given and analysed in the literature [12, 30]. Figure 11 shows the scatterplot of the data and Figure 12 shows the ellipses of the parameters (left) and the CB (right). We can observe that lower concentrations are frequent and that higher concentrations are rarer (logarithmic scales could be tested). The errors increase with the concentration for both devices (as in our simulations in Section 5.3). The estimated line is close to the equivalence line (Figure 11) and lies inside the hyperbolic CB (Figure 12, right) provided by BLS or GR. Obviously, the equivalence point is inside both ellipses (Figure 12, left). Both CB and both ellipses are very

FIGURE 10. Systolic blood pressure data: confidence ellipses provided by BLS and GR with or without heteroscedasticity (left) and the corresponding CB under heteroscedasticity (right)

close to each other, contrary to the systolic blood pressure example. This is because there are no outliers in this example (in the bivariate distribution of the $\bar X_i$ and $\bar Y_i$). Ideally, the data should be log-transformed, but we didn't find the detailed data in the literature. Moreover, these data were analysed in the literature without such a transformation [12, 30].

7. Conclusions

Under homoscedasticity, four methodologies (DR, GR, BLS and Mandel) have been compared to estimate a regression line which takes the errors on both axes into account. They are applied to assess the equivalence between two measurement methods on the basis of a confidence ellipse for the parameter estimates or, equivalently, a confidence band for the regression line. These four methodologies give identical estimates of the regression line. However, the variance-covariance matrices of the parameter estimates are computed differently. The confidence ellipses and the CB are therefore dissimilar. The coverage probabilities are close to each other and close to the nominal level when the error variances are known, especially for $\lambda_{XY} > 1$. However, the variance-covariance matrices of the parameter estimates derived by the method of moments for the DR and by maximum likelihood for the GR provide slightly better coverage probabilities. Unfortunately, when the error variances are unknown, the coverage probabilities of the GR drop, while they are similar for the other procedures. To tackle this problem, the sample size or the number of replicates has to be increased to recover the coverage probabilities of the GR. Moreover, the confidence ellipse or CB computed by the GR is the smallest or narrowest in the presence of outliers; work is still in progress to assess the influence of outliers on the covariance matrix of the parameters. Under heteroscedasticity, the GR and BLS provide the same estimated line. The coverage probabilities of the joint-CI or CB are very close to each other, even if the GR slightly outperforms the BLS. On the other hand, when the variances are unknown and estimated with replicates, the coverage

FIGURE 11. Scatterplot of the arsenate ion in natural river water data with the regression line and standard errors of the means

FIGURE 12. Arsenate ion data: confidence ellipses provided by BLS and GR with heteroscedasticity and the corresponding CB

probabilities of the GR collapse and the BLS outperforms the GR. Actually, the uncertainties on the estimated variances are not taken into account, and work is still in progress to tackle this problem. To summarize, we recommend the DR or GR under homoscedasticity (with a high number of replicates or a large sample size for the GR when the error variances have to be estimated) and the BLS or GR under heteroscedasticity, while being careful to assess the impact of outliers if needed. Finally, the CB presented in this paper are easier to compute and easier to interpret than an ellipse, and they can be displayed directly in the $(X, Y)$ space, while an ellipse is displayed in the $(\beta, \alpha)$ space.

Appendix A: Tables

TABLE 2. Values of $\sigma^2_\nu$ and $\sigma^2_\tau$ for the simulations with replicated data ($n_X = n_Y = 2$; $n_X = n_Y = 4$; $n_X = 4$, $n_Y = 2$) and the corresponding values of $\lambda$ and $\lambda_{XY}$

TABLE 3. Values of $\sigma^2_\nu$ and $\sigma^2_\tau$ for the simulations with or without replicated data ($n_X = n_Y = 1$; $n_X = n_Y = 2$; $n_X = 4$, $n_Y = 2$; $n_X = n_Y = 4$) and the corresponding values of $\lambda$ and $\lambda_{XY}$

Thanks

Support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) is gratefully acknowledged.


More information

Econometrics Summary Algebraic and Statistical Preliminaries

Econometrics Summary Algebraic and Statistical Preliminaries Econometrics Summary Algebraic and Statistical Preliminaries Elasticity: The point elasticity of Y with respect to L is given by α = ( Y/ L)/(Y/L). The arc elasticity is given by ( Y/ L)/(Y/L), when L

More information

Applied Statistics and Econometrics

Applied Statistics and Econometrics Applied Statistics and Econometrics Lecture 6 Saul Lach September 2017 Saul Lach () Applied Statistics and Econometrics September 2017 1 / 53 Outline of Lecture 6 1 Omitted variable bias (SW 6.1) 2 Multiple

More information

9 Correlation and Regression

9 Correlation and Regression 9 Correlation and Regression SW, Chapter 12. Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then retakes the

More information

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility The Slow Convergence of OLS Estimators of α, β and Portfolio Weights under Long Memory Stochastic Volatility New York University Stern School of Business June 21, 2018 Introduction Bivariate long memory

More information

MS&E 226: Small Data

MS&E 226: Small Data MS&E 226: Small Data Lecture 15: Examples of hypothesis tests (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 32 The recipe 2 / 32 The hypothesis testing recipe In this lecture we repeatedly apply the

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression September 24, 2008 Reading HH 8, GIll 4 Simple Linear Regression p.1/20 Problem Data: Observe pairs (Y i,x i ),i = 1,...n Response or dependent variable Y Predictor or independent

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Review of Econometrics

Review of Econometrics Review of Econometrics Zheng Tian June 5th, 2017 1 The Essence of the OLS Estimation Multiple regression model involves the models as follows Y i = β 0 + β 1 X 1i + β 2 X 2i + + β k X ki + u i, i = 1,...,

More information

Density Temp vs Ratio. temp

Density Temp vs Ratio. temp Temp Ratio Density 0.00 0.02 0.04 0.06 0.08 0.10 0.12 Density 0.0 0.2 0.4 0.6 0.8 1.0 1. (a) 170 175 180 185 temp 1.0 1.5 2.0 2.5 3.0 ratio The histogram shows that the temperature measures have two peaks,

More information

An overview of applied econometrics

An overview of applied econometrics An overview of applied econometrics Jo Thori Lind September 4, 2011 1 Introduction This note is intended as a brief overview of what is necessary to read and understand journal articles with empirical

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Chapter 27 Summary Inferences for Regression

Chapter 27 Summary Inferences for Regression Chapter 7 Summary Inferences for Regression What have we learned? We have now applied inference to regression models. Like in all inference situations, there are conditions that we must check. We can test

More information

LECTURE 2 LINEAR REGRESSION MODEL AND OLS

LECTURE 2 LINEAR REGRESSION MODEL AND OLS SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com

Simple Linear Regression. Material from Devore s book (Ed 8), and Cengagebrain.com 12 Simple Linear Regression Material from Devore s book (Ed 8), and Cengagebrain.com The Simple Linear Regression Model The simplest deterministic mathematical relationship between two variables x and

More information

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania

Chapter 10. Regression. Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Chapter 10 Regression Understandable Statistics Ninth Edition By Brase and Brase Prepared by Yixun Shi Bloomsburg University of Pennsylvania Scatter Diagrams A graph in which pairs of points, (x, y), are

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 370 Regression models are used to study the relationship of a response variable and one or more predictors. The response is also called the dependent variable, and the predictors

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 6 Jakub Mućk Econometrics of Panel Data Meeting # 6 1 / 36 Outline 1 The First-Difference (FD) estimator 2 Dynamic panel data models 3 The Anderson and Hsiao

More information

Statistical inference

Statistical inference Statistical inference Contents 1. Main definitions 2. Estimation 3. Testing L. Trapani MSc Induction - Statistical inference 1 1 Introduction: definition and preliminary theory In this chapter, we shall

More information

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable,

PART I. (a) Describe all the assumptions for a normal error regression model with one predictor variable, Concordia University Department of Mathematics and Statistics Course Number Section Statistics 360/2 01 Examination Date Time Pages Final December 2002 3 hours 6 Instructors Course Examiner Marks Y.P.

More information

STAT 4385 Topic 03: Simple Linear Regression

STAT 4385 Topic 03: Simple Linear Regression STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

Lecture 19 Multiple (Linear) Regression

Lecture 19 Multiple (Linear) Regression Lecture 19 Multiple (Linear) Regression Thais Paiva STA 111 - Summer 2013 Term II August 1, 2013 1 / 30 Thais Paiva STA 111 - Summer 2013 Term II Lecture 19, 08/01/2013 Lecture Plan 1 Multiple regression

More information

9. Linear Regression and Correlation

9. Linear Regression and Correlation 9. Linear Regression and Correlation Data: y a quantitative response variable x a quantitative explanatory variable (Chap. 8: Recall that both variables were categorical) For example, y = annual income,

More information

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley

Review of Classical Least Squares. James L. Powell Department of Economics University of California, Berkeley Review of Classical Least Squares James L. Powell Department of Economics University of California, Berkeley The Classical Linear Model The object of least squares regression methods is to model and estimate

More information

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)

EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) 1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For

More information

Estadística II Chapter 4: Simple linear regression

Estadística II Chapter 4: Simple linear regression Estadística II Chapter 4: Simple linear regression Chapter 4. Simple linear regression Contents Objectives of the analysis. Model specification. Least Square Estimators (LSE): construction and properties

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

Advanced Econometrics I

Advanced Econometrics I Lecture Notes Autumn 2010 Dr. Getinet Haile, University of Mannheim 1. Introduction Introduction & CLRM, Autumn Term 2010 1 What is econometrics? Econometrics = economic statistics economic theory mathematics

More information

CHAPTER EIGHT Linear Regression

CHAPTER EIGHT Linear Regression 7 CHAPTER EIGHT Linear Regression 8. Scatter Diagram Example 8. A chemical engineer is investigating the effect of process operating temperature ( x ) on product yield ( y ). The study results in the following

More information

ISQS 5349 Final Exam, Spring 2017.

ISQS 5349 Final Exam, Spring 2017. ISQS 5349 Final Exam, Spring 7. Instructions: Put all answers on paper other than this exam. If you do not have paper, some will be provided to you. The exam is OPEN BOOKS, OPEN NOTES, but NO ELECTRONIC

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

AMS 7 Correlation and Regression Lecture 8

AMS 7 Correlation and Regression Lecture 8 AMS 7 Correlation and Regression Lecture 8 Department of Applied Mathematics and Statistics, University of California, Santa Cruz Suumer 2014 1 / 18 Correlation pairs of continuous observations. Correlation

More information

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators

MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators MFin Econometrics I Session 4: t-distribution, Simple Linear Regression, OLS assumptions and properties of OLS estimators Thilo Klein University of Cambridge Judge Business School Session 4: Linear regression,

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

Chapter 9. Correlation and Regression

Chapter 9. Correlation and Regression Chapter 9 Correlation and Regression Lesson 9-1/9-2, Part 1 Correlation Registered Florida Pleasure Crafts and Watercraft Related Manatee Deaths 100 80 60 40 20 0 1991 1993 1995 1997 1999 Year Boats in

More information

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University

The Bootstrap: Theory and Applications. Biing-Shen Kuo National Chengchi University The Bootstrap: Theory and Applications Biing-Shen Kuo National Chengchi University Motivation: Poor Asymptotic Approximation Most of statistical inference relies on asymptotic theory. Motivation: Poor

More information

Inference about the Indirect Effect: a Likelihood Approach

Inference about the Indirect Effect: a Likelihood Approach Discussion Paper: 2014/10 Inference about the Indirect Effect: a Likelihood Approach Noud P.A. van Giersbergen www.ase.uva.nl/uva-econometrics Amsterdam School of Economics Department of Economics & Econometrics

More information

You can compute the maximum likelihood estimate for the correlation

You can compute the maximum likelihood estimate for the correlation Stat 50 Solutions Comments on Assignment Spring 005. (a) _ 37.6 X = 6.5 5.8 97.84 Σ = 9.70 4.9 9.70 75.05 7.80 4.9 7.80 4.96 (b) 08.7 0 S = Σ = 03 9 6.58 03 305.6 30.89 6.58 30.89 5.5 (c) You can compute

More information

The Simple Regression Model. Part II. The Simple Regression Model

The Simple Regression Model. Part II. The Simple Regression Model Part II The Simple Regression Model As of Sep 22, 2015 Definition 1 The Simple Regression Model Definition Estimation of the model, OLS OLS Statistics Algebraic properties Goodness-of-Fit, the R-square

More information

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern.

STAT 501 Assignment 2 NAME Spring Chapter 5, and Sections in Johnson & Wichern. STAT 01 Assignment NAME Spring 00 Reading Assignment: Written Assignment: Chapter, and Sections 6.1-6.3 in Johnson & Wichern. Due Monday, February 1, in class. You should be able to do the first four problems

More information

appstats27.notebook April 06, 2017

appstats27.notebook April 06, 2017 Chapter 27 Objective Students will conduct inference on regression and analyze data to write a conclusion. Inferences for Regression An Example: Body Fat and Waist Size pg 634 Our chapter example revolves

More information

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017

UNIVERSITY OF MASSACHUSETTS. Department of Mathematics and Statistics. Basic Exam - Applied Statistics. Tuesday, January 17, 2017 UNIVERSITY OF MASSACHUSETTS Department of Mathematics and Statistics Basic Exam - Applied Statistics Tuesday, January 17, 2017 Work all problems 60 points are needed to pass at the Masters Level and 75

More information

Simple Linear Regression Analysis

Simple Linear Regression Analysis LINEAR REGRESSION ANALYSIS MODULE II Lecture - 6 Simple Linear Regression Analysis Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Prediction of values of study

More information

Correlation. Bivariate normal densities with ρ 0. Two-dimensional / bivariate normal density with correlation 0

Correlation. Bivariate normal densities with ρ 0. Two-dimensional / bivariate normal density with correlation 0 Correlation Bivariate normal densities with ρ 0 Example: Obesity index and blood pressure of n people randomly chosen from a population Two-dimensional / bivariate normal density with correlation 0 Correlation?

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Kerby Shedden, Ph.D., 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Share Alike 3.0 License: http://creativecommons.org/licenses/by-sa/3.0/

More information

Important note: Transcripts are not substitutes for textbook assignments. 1

Important note: Transcripts are not substitutes for textbook assignments. 1 In this lesson we will cover correlation and regression, two really common statistical analyses for quantitative (or continuous) data. Specially we will review how to organize the data, the importance

More information

Linear Models and Estimation by Least Squares

Linear Models and Estimation by Least Squares Linear Models and Estimation by Least Squares Jin-Lung Lin 1 Introduction Causal relation investigation lies in the heart of economics. Effect (Dependent variable) cause (Independent variable) Example:

More information

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014

Regression. ECO 312 Fall 2013 Chris Sims. January 12, 2014 ECO 312 Fall 2013 Chris Sims Regression January 12, 2014 c 2014 by Christopher A. Sims. This document is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License What

More information

Econometrics of Panel Data

Econometrics of Panel Data Econometrics of Panel Data Jakub Mućk Meeting # 2 Jakub Mućk Econometrics of Panel Data Meeting # 2 1 / 26 Outline 1 Fixed effects model The Least Squares Dummy Variable Estimator The Fixed Effect (Within

More information

The Simple Linear Regression Model

The Simple Linear Regression Model The Simple Linear Regression Model Lesson 3 Ryan Safner 1 1 Department of Economics Hood College ECON 480 - Econometrics Fall 2017 Ryan Safner (Hood College) ECON 480 - Lesson 3 Fall 2017 1 / 77 Bivariate

More information

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises

LINEAR REGRESSION ANALYSIS. MODULE XVI Lecture Exercises LINEAR REGRESSION ANALYSIS MODULE XVI Lecture - 44 Exercises Dr. Shalabh Department of Mathematics and Statistics Indian Institute of Technology Kanpur Exercise 1 The following data has been obtained on

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

Correlation and Linear Regression

Correlation and Linear Regression Correlation and Linear Regression Correlation: Relationships between Variables So far, nearly all of our discussion of inferential statistics has focused on testing for differences between group means

More information

Linear Regression with Multiple Regressors

Linear Regression with Multiple Regressors Linear Regression with Multiple Regressors (SW Chapter 6) Outline 1. Omitted variable bias 2. Causality and regression analysis 3. Multiple regression and OLS 4. Measures of fit 5. Sampling distribution

More information

Statistics - Lecture One. Outline. Charlotte Wickham 1. Basic ideas about estimation

Statistics - Lecture One. Outline. Charlotte Wickham  1. Basic ideas about estimation Statistics - Lecture One Charlotte Wickham wickham@stat.berkeley.edu http://www.stat.berkeley.edu/~wickham/ Outline 1. Basic ideas about estimation 2. Method of Moments 3. Maximum Likelihood 4. Confidence

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions

Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error

More information

Linear regression and correlation

Linear regression and correlation Faculty of Health Sciences Linear regression and correlation Statistics for experimental medical researchers 2018 Julie Forman, Christian Pipper & Claus Ekstrøm Department of Biostatistics, University

More information

Advanced Quantitative Methods: maximum likelihood

Advanced Quantitative Methods: maximum likelihood Advanced Quantitative Methods: Maximum Likelihood University College Dublin 4 March 2014 1 2 3 4 5 6 Outline 1 2 3 4 5 6 of straight lines y = 1 2 x + 2 dy dx = 1 2 of curves y = x 2 4x + 5 of curves y

More information

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y

Regression and correlation. Correlation & Regression, I. Regression & correlation. Regression vs. correlation. Involve bivariate, paired data, X & Y Regression and correlation Correlation & Regression, I 9.07 4/1/004 Involve bivariate, paired data, X & Y Height & weight measured for the same individual IQ & exam scores for each individual Height of

More information

review session gov 2000 gov 2000 () review session 1 / 38

review session gov 2000 gov 2000 () review session 1 / 38 review session gov 2000 gov 2000 () review session 1 / 38 Overview Random Variables and Probability Univariate Statistics Bivariate Statistics Multivariate Statistics Causal Inference gov 2000 () review

More information

Specification Errors, Measurement Errors, Confounding

Specification Errors, Measurement Errors, Confounding Specification Errors, Measurement Errors, Confounding Kerby Shedden Department of Statistics, University of Michigan October 10, 2018 1 / 32 An unobserved covariate Suppose we have a data generating model

More information

Ma 3/103: Lecture 24 Linear Regression I: Estimation

Ma 3/103: Lecture 24 Linear Regression I: Estimation Ma 3/103: Lecture 24 Linear Regression I: Estimation March 3, 2017 KC Border Linear Regression I March 3, 2017 1 / 32 Regression analysis Regression analysis Estimate and test E(Y X) = f (X). f is the

More information

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models

Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Selection Criteria Based on Monte Carlo Simulation and Cross Validation in Mixed Models Junfeng Shang Bowling Green State University, USA Abstract In the mixed modeling framework, Monte Carlo simulation

More information

STAT5044: Regression and Anova. Inyoung Kim

STAT5044: Regression and Anova. Inyoung Kim STAT5044: Regression and Anova Inyoung Kim 2 / 47 Outline 1 Regression 2 Simple Linear regression 3 Basic concepts in regression 4 How to estimate unknown parameters 5 Properties of Least Squares Estimators:

More information