Correlation analysis


Contents

1 Correlation analysis
  1.1 Distribution function and independence of random variables
  1.2 Measures of statistical links between two random variables
  1.3 Variance and correlation matrix
  1.4 Verification of independence
  1.5 Sample variance matrix
2 Exercises
3 Sample correlation coefficient
  3.1 The mean and variance of the sample correlation coefficient
  3.2 The confidence interval for the correlation coefficient
  3.3 Testing hypotheses about the correlation coefficient
4 Coefficient of multiple correlation
  4.1 The introduction of the multiple correlation coefficient
  4.2 The sample multiple correlation coefficient
5 Partial correlation coefficient
  5.1 The introduction of the partial correlation coefficient
  5.2 The sample partial correlation coefficient
6 Exercises

1 Correlation analysis

In practical situations it is very common to ask what the relationship between two or more random variables is; in that case we speak about the statistical linkage between these random variables. Correlation analysis methods are often used to describe the intensity of this linkage and to express it numerically. Regression analysis methods are used for its analytical description.

Motivation 1. The figure shows the rise in prices in Taiwan over the observed years. The independent variable (regressor) X represents the year of observation, the dependent variable Y is the index describing the rise in prices. The points follow an approximately linear trend (with one exception, the point at 1943), which is described in the figure. The variation of the points around this line is considerable.

Motivation 2. The figure shows the dependence of the stopping distance of a car, Y (measured in meters), on its speed X (measured in km/h). The data were obtained when testing the quality of manufactured tires. It can be seen that the stopping distance follows a nonlinear trend and that the variability of the measured values around the fitted curve is small.

Simply put, the task of correlation analysis is the statistical description of the strength of the linkage between X and Y, while the objective of regression analysis is the description of the stochastic relationship by mathematical functions.

1.1 Distribution function and independence of random variables

When describing the statistical links between random variables X and Y there are two extreme situations: a deterministic linkage and the independence of the variables.

The joint distribution function F(x, y) is equal to the probability of the event X ≤ x and Y ≤ y; thus

F(x, y) = P(X ≤ x, Y ≤ y).

It is said that the distribution function F(x, y) exhaustively describes the probabilistic behavior of the random variables X and Y. The distribution function of the two-dimensional normal distribution can be taken as an example. This distribution depends on the mean values EX = μ_X and EY = μ_Y, the variances DX = σ²_X and DY = σ²_Y, and the parameter ρ. It is a generalization of the previously defined one-dimensional normal distribution, and we denote it N₂(μ_X, μ_Y, σ²_X, σ²_Y, ρ). The graph of its density function is the well-known bell-shaped surface.

Figure 1: N₂(μ_X, μ_Y, σ²_X, σ²_Y, ρ) for different values of the parameters μ_X, μ_Y, σ²_X, σ²_Y and ρ

Definition 1.1. Random variables X and Y are independent if and only if the joint distribution function and the distribution functions F_X(x) of the random variable X and F_Y(y) of the random variable Y (the marginal distribution functions) satisfy the multiplicative relationship

F(x, y) = F_X(x) F_Y(y)

for arbitrary values of the variables x and y.

Similarly, independence can be expressed by the joint density function in the continuous case, or by the joint probability function in the discrete case. In the discrete case, when the range of values of the random variable X is an at most countable set M₁ and the range of the random variable Y is an at most countable set M₂, we introduce the joint probability function of X and Y by

p(x, y) = P(X = x, Y = y) for (x, y) ∈ M₁ × M₂.
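The multiplicative characterization of independence can be illustrated numerically in the discrete case. In this minimal sketch (the marginal probabilities are hypothetical), the joint probability table of two independent variables is the outer product of the marginal probability functions, and the marginals are recovered by summing rows and columns:

```python
import numpy as np

# hypothetical marginal probability functions p1 of X and p2 of Y
p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.6, 0.4])

# under independence the joint probability function is p(x, y) = p1(x) * p2(y)
p = np.outer(p1, p2)

# row sums give back the marginal of X, column sums the marginal of Y
marginal_x = p.sum(axis=1)
marginal_y = p.sum(axis=0)
```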

If p₁(x) = P(X = x) and p₂(y) = P(Y = y) are the probability functions of the variables X and Y, respectively, then the independence of the discrete random variables X and Y can be simply characterized by

p(x, y) = p₁(x) p₂(y) for (x, y) ∈ M₁ × M₂.

Analogously, the probabilistic behavior of a random variable can be described by the density function in the continuous case. The joint distribution function F(x, y) corresponds to the density f(x, y), which can be determined by the formula

f(x, y) = ∂²F(x, y) / (∂x ∂y) for all real x and y.

Independence of continuous random variables X and Y can be characterized by the relation

f(x, y) = f₁(x) f₂(y) for any real values of x and y,

where f₁(x) is the density of the random variable X and f₂(y) is the density of the random variable Y.

When describing the group independence of k random variables X₁, X₂, ..., X_k we introduce the joint distribution function

F(x₁, x₂, ..., x_k) = P(X₁ ≤ x₁, X₂ ≤ x₂, ..., X_k ≤ x_k).

Then the random variables X₁, X₂, ..., X_k are independent if and only if

F(x₁, x₂, ..., x_k) = F₁(x₁) F₂(x₂) ⋯ F_k(x_k),

where the distribution functions on the right-hand side are the marginal distribution functions of the random variables X₁, X₂, ..., X_k. The k-tuple of random variables (X₁, X₂, ..., X_k) is called a random vector and is denoted X. We will write random vectors as columns, i.e. X = (X₁, X₂, ..., X_k)′, where (X₁, X₂, ..., X_k)′ denotes the transposition of the vector (X₁, X₂, ..., X_k). Analogously, the independence of discrete (or continuous) random variables X₁, X₂, ..., X_k can be expressed via the joint probability function (or the joint density function).

1.2 Measures of statistical links between two random variables

First, we will focus on the statistical relationship between two random variables X and Y. We will describe it using the covariance and the correlation coefficient.

Definition 1.2.
The covariance cov(X, Y) of random variables X and Y is defined as

cov(X, Y) = E(X − EX)(Y − EY) = E(X − μ_X)(Y − μ_Y).

The covariance cov(X, Y) takes values between −σ_X σ_Y and σ_X σ_Y. For a random variable X we have cov(X, X) = DX. If X and Y are independent, their covariance is zero. In case the joint distribution of the random variables X and Y is normal, cov(X, Y) is equal to zero if and only if X and Y are independent.

Using the covariance we introduce the correlation coefficient of random variables X and Y, sometimes called the Pearson correlation coefficient, denoted ρ or, more precisely, ρ(X, Y).

Definition 1.3. The correlation coefficient is defined by the formula

ρ(X, Y) = cov(X, Y) / (σ_X σ_Y).

The correlation coefficient is the commonly used measure of the statistical link between random variables X and Y. Its advantage (in comparison with the covariance) is that it takes values between −1 and 1.
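The definitions above can be checked numerically. A minimal sketch with simulated data (the generating model and its coefficients are hypothetical) computes the covariance and the correlation coefficient directly from the definitions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 0.8 * x + 0.6 * rng.normal(size=1000)   # y correlated with x; theoretical rho = 0.8

cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # covariance from the definition
rho = cov_xy / (x.std() * y.std())                  # correlation coefficient
d = 100 * rho**2                                    # coefficient of determination in %
```

Because the same normalization appears in the numerator and denominator, the ratio agrees exactly with the library estimate `np.corrcoef(x, y)[0, 1]`.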

If the value is 1, there is a direct linear relationship between X and Y; if the value is −1, there is an indirect (inverse) linear relationship between X and Y. In both cases the statistical linkage between Y and X can be described by a line, and the observed pairs of values of X and Y lie on that line. Thus, in this situation there is a deterministic linear relationship between Y and X. In the case that the value of the correlation coefficient is equal to zero, we say that the random variables X and Y are uncorrelated. For a random variable X, the correlation coefficient is ρ(X, X) = 1. For N₂(μ_X, μ_Y, σ²_X, σ²_Y, ρ) the parameter ρ is equal to the correlation coefficient ρ(X, Y), and ρ(X, Y) = 0 if and only if the two variables X and Y are independent.

The square of the correlation coefficient is called the coefficient of determination. Its value (expressed in percent) is denoted d and indicates the percentage of the variability of the variable Y which can be explained by the variability of the variable X. Thus d = 100ρ².

1.3 Variance and correlation matrix

A description of the statistical links between k random variables X₁, X₂, ..., X_k is often done simply by describing the statistical links between pairs of these variables. Let us introduce the covariances and correlation coefficients between the variables X_i and X_j.

Definition 1.4. The matrix

         DX₁            cov(X₁, X₂)   ...   cov(X₁, X_k)
var(X) = cov(X₂, X₁)    DX₂           ...   cov(X₂, X_k)
         ...
         cov(X_k, X₁)   cov(X_k, X₂)  ...   DX_k

is called the variance (covariance) matrix of the random vector X = (X₁, X₂, ..., X_k)′.

Definition 1.5. The matrix of correlation coefficients

         1              ρ(X₁, X₂)    ...   ρ(X₁, X_k)
cor(X) = ρ(X₂, X₁)      1            ...   ρ(X₂, X_k)
         ...
         ρ(X_k, X₁)     ρ(X_k, X₂)   ...   1

is called the correlation matrix of the random vector X = (X₁, X₂, ..., X_k)′.
The variance (covariance) matrix var(X) describes the probabilistic behavior of a random vector, similarly to the way the variance describes the probabilistic behavior of a random variable. The correlation matrix cor(X) describes the structure of the statistical links between the studied random variables.

Other correlation coefficients
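Both matrices from Definitions 1.4 and 1.5 can be estimated directly from data. A sketch with simulated observations (the covariance structure below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
Sigma = [[1.0, 0.5, 0.2],
         [0.5, 1.0, 0.3],
         [0.2, 0.3, 1.0]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], Sigma, size=5000)

V = np.cov(X, rowvar=False)       # sample variance (covariance) matrix
P = np.corrcoef(X, rowvar=False)  # sample correlation matrix
```

By construction V carries the variances DX_i on its diagonal, while P has ones on the diagonal, matching the two definitions.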

For a description of the statistical link between a random variable Y and a random vector X, the multiple correlation coefficient ρ(Y, X) can be introduced. It is the correlation coefficient between the random variable Y and the best linear prediction of Y obtained from the random vector X. Finally, we mention the description of the statistical link between random variables Y and Z while eliminating the effect of other variables X₁, X₂, ..., X_k, the so-called partial correlation coefficient ρ(Y, Z.X). In addition, there are a number of other measures of statistical linkage (e.g. Spearman's correlation coefficient, Kendall's correlation coefficient, etc.) which are used depending on what type of random variables is available.

1.4 Verification of independence

Assume two random variables X and Y, and the task of verifying their independence. We draw a random sample of size n from the joint distribution of X and Y; denote by x_i and y_i the observations of the pair (X, Y) for the i-th statistical unit, i = 1, 2, ..., n. From these values we calculate:

The sample mean x̄ of X and the sample mean ȳ of Y by the formulas

x̄ = (1/n) Σᵢ₌₁ⁿ x_i and ȳ = (1/n) Σᵢ₌₁ⁿ y_i.

It can be shown that E x̄ = μ_X and E ȳ = μ_Y. This means that the values of x̄ and ȳ fluctuate around the unknown mean values μ_X and μ_Y; such estimates are called unbiased.

The sample variances s²_x and s²_y by the formulas

s²_x = (1/(n−1)) Σᵢ₌₁ⁿ (x_i − x̄)² and s²_y = (1/(n−1)) Σᵢ₌₁ⁿ (y_i − ȳ)².

Similarly, the sample variances s²_x and s²_y are unbiased estimates of σ²_X and σ²_Y.

The sample covariance s_xy is given by the formula

s_xy = (1/(n−1)) Σᵢ₌₁ⁿ (x_i − x̄)(y_i − ȳ).

This estimate is again unbiased.

The sample correlation coefficient r_xy is given by the formula

r_xy = s_xy / (s_x s_y).

This estimate is not unbiased, but for large values of n it is approximately unbiased. It means that its values fluctuate around the unknown value of the correlation coefficient ρ(X, Y).
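The unbiasedness claims above can be checked by simulation. In this sketch (μ and σ are hypothetical), the averages of x̄ and s² over many repeated samples settle near μ and σ²:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, n, reps = 5.0, 2.0, 20, 20000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)          # sample mean of each replication
s2 = samples.var(axis=1, ddof=1)     # sample variance with the 1/(n-1) factor
```

Using `ddof=1` gives the 1/(n−1) normalization from the formulas above; with `ddof=0` the average of s² would systematically undershoot σ².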

Example 1.1. Suppose that the joint distribution of random variables X and Y is normal, N₂(μ_X, μ_Y, σ²_X, σ²_Y, ρ). Then independence is equivalent to uncorrelatedness. It can be verified by the statistical test based on the test statistic

t = r_xy √(n − 2) / √(1 − r²_xy).

If |t| > t_{1−α/2}(n − 2), we reject the hypothesis of independence of the random variables X and Y at the significance level α. In other words, the dependence of X and Y is statistically proven at the significance level α.

Example 1.2. Assume a shift operation of a firm. We observe the number of operating hours of a production line (variable X) and the monthly cost of its maintenance in thousands of CZK (variable Y). The results are given in the table below. The aim is to determine how the number of hours correlates with the cost of maintenance and to test whether the statistical link between these two variables is statistically significant.

x_i | y_i (table values not preserved)

1.5 Sample variance matrix

We concentrate on the calculation of the sample variance and correlation matrices of a random vector X = (X₁, X₂, ..., X_k). As in the case of two random variables, we assume that the vector X is observed for n statistical units. The result of these observations is the data matrix

    x₁₁ ... x₁ₖ
D = ...
    xₙ₁ ... xₙₖ

The i-th row contains the observation of the vector X for the i-th statistical unit, and the j-th column contains the observations of the variable X_j for all statistical units. The sample variance matrix is the matrix var(X) with the covariances cov(X_i, X_j) replaced by their sample counterparts s_ij.

Example 1.3. The sample variance matrix is denoted S and can be determined from the formula

S = (1/(n − 1)) D′ (I_n − (1/n) E) D,

where I_n stands for the identity matrix of type n × n and E is the matrix of ones of type n × n.

Example 1.4.
Similarly, the sample correlation matrix is denoted R and can be estimated by the formula

R = D⁻¹(s₁, s₂, ..., s_k) S D⁻¹(s₁, s₂, ..., s_k),

where D⁻¹(s₁, s₂, ..., s_k) denotes the inverse of the diagonal matrix D(s₁, s₂, ..., s_k) whose elements on the main diagonal are s_i = √(s_ii), i = 1, 2, ..., k.
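The matrix formulas for S and R can be verified on random data. A sketch (the data matrix is hypothetical) using the centering matrix I − (1/n)E:

```python
import numpy as np

rng = np.random.default_rng(3)
D = rng.normal(size=(50, 3))          # data matrix: n = 50 units, k = 3 variables
n = D.shape[0]

C = np.eye(n) - np.ones((n, n)) / n   # centering matrix I - (1/n)E
S = D.T @ C @ D / (n - 1)             # sample variance matrix
s = np.sqrt(np.diag(S))               # sample standard deviations s_i
R = S / np.outer(s, s)                # sample correlation matrix
```

Dividing by `np.outer(s, s)` is the elementwise form of the sandwich product D⁻¹(s₁, ..., s_k) S D⁻¹(s₁, ..., s_k).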

2 Exercises

1. Random variables X and Y have the joint density

f(x, y) = x + y for 0 < x < 1, 0 < y < 1,
f(x, y) = 0 otherwise.

Determine the correlation coefficient ρ(X, Y).

2. It was examined how many mg of lactic acid are in 100 ml of blood of primiparous mothers (values X_i) and of their newborns (values Y_i). The obtained results are shown in the following table:

X_i | Y_i (table values not preserved)

Calculate the sample correlation coefficient and determine whether the dependence between the amounts of lactic acid in the blood of mothers and their newborns is statistically significant.

3. In chosen European countries the relationship between alcohol consumption (variable X) and mortality from cirrhosis of the liver (deaths per a given number of inhabitants, variable Y) was examined. The data are taken from the monograph Andel: Statisticke metody. The obtained results are shown in the following table:

Country  FIN  NOR  IRL  NLD  SWE  GBR  BEL   AUT   DEU   ITA   FRA
X        3.9  4.2  5.6  5.7  6.6  7.2  10.8  10.9  12.3  15.7  24.7
Y        3.6  4.3  3.4  3.7  7.2  3.0  12.3  7.0   23.7  23.6  46.1

Calculate the sample correlation coefficient and decide whether the dependence between alcohol consumption and mortality from cirrhosis of the liver is statistically significant.

4. The table below shows data on the number of deaths in London (values of the variable Y) from 1 Dec to 15 Dec 1952, when London suffered a particularly severe fog. Furthermore, there are values of the variable X, representing the average air pollution in Hall County in mg/m³, and values of the variable Z, representing the average content of sulfur dioxide (the number of parts per million).

Day | Y_i | x_i | z_i (table values not preserved)

Determine the correlation coefficients r(X, Y), r(X, Z) and r(Y, Z) and test the hypothesis that between each pair of these variables there is a statistically significant linkage.

3 Sample correlation coefficient

3.1 The mean and variance of the sample correlation coefficient

The sample correlation coefficient r is defined by the formula

r = r_xy = s_xy / √(s²_x s²_y),

and its use for testing the independence of random variables X and Y, assuming their two-dimensional normal distribution, was already presented. Here we introduce its other properties and uses, especially those based on Fisher's z-transformation. First of all, let us recall the simple formula for calculating the sample correlation coefficient:

r = (Σ x_i y_i − n x̄ ȳ) / √( (Σ x_i² − n x̄²)(Σ y_i² − n ȳ²) ).

The basic properties of r are:

1. Assuming normality, if ρ = 0, then for n ≥ 3: Er = 0 and var r = 1/(n − 1).

2. In general, there is only the approximate equality Er ≈ ρ; thus r is not an unbiased estimate of the correlation coefficient ρ.

3. The variance of the sample correlation coefficient is var r ≈ (1 − ρ²)²/n, and it can be shown that for an increasing sample size n the variance var r converges to zero.

Definition 3.1. The variable Z given by the formula

Z = (1/2) ln((1 + r)/(1 − r))

is called Fisher's z-transformation of the correlation coefficient r.

Fisher's z-transformation of the correlation coefficient r has the mean value

EZ ≈ (1/2) ln((1 + ρ)/(1 − ρ)),

where ρ is the correlation coefficient of the distribution from which the random sample was taken. The variance of the variable Z is approximately

var Z ≈ 1/(n − 3).
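The computational shortcut for r above can be checked against a library routine. A small sketch with simulated data (the generating model is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=30)
y = 0.5 * x + rng.normal(size=30)
n = len(x)

# shortcut formula: sums of products corrected by the sample means
num = np.sum(x * y) - n * x.mean() * y.mean()
den = np.sqrt((np.sum(x**2) - n * x.mean()**2) * (np.sum(y**2) - n * y.mean()**2))
r = num / den
```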

The distribution of the variable Z approaches a normal distribution with increasing n, with

EZ ≈ (1/2) ln((1 + ρ)/(1 − ρ)) and var Z ≈ 1/(n − 3).

Then the random variable

U = √(n − 3) (Z − EZ)

has approximately the normal distribution N(0, 1).

3.2 The confidence interval for the correlation coefficient

We can also construct an approximate confidence interval for ρ via the z-transformation. If ρ is the actual value of the correlation coefficient, then

P( √(n − 3) |Z − (1/2) ln((1 + ρ)/(1 − ρ))| < u_{1−α/2} ) ≈ 1 − α.

From this formula the 100(1 − α)% confidence interval for ρ can be derived:

( (D − 1)/(D + 1), (H − 1)/(H + 1) ),

where

D = exp(2Z − 2u_{1−α/2}/√(n − 3)), H = exp(2Z + 2u_{1−α/2}/√(n − 3)).

3.3 Testing hypotheses about the correlation coefficient

To test the hypothesis H₀: ρ = ρ₀, where ρ₀ ∈ (−1, 1) is a given number, against the alternative H₁: ρ ≠ ρ₀, we first calculate

Z = (1/2) ln((1 + r)/(1 − r)), z₀ = (1/2) ln((1 + ρ₀)/(1 − ρ₀)).

Under H₀ the random variable

U = √(n − 3)(Z − z₀)

has approximately the normal distribution N(0, 1). Thus H₀ is rejected if |U| ≥ u_{1−α/2}.

If there are two independent random samples from bivariate normal distributions with correlation coefficients ρ₁ and ρ₂, the hypothesis H₀: ρ₁ = ρ₂ can be tested against the alternative H₁: ρ₁ ≠ ρ₂ using the statistic

U = (Z₁ − Z₂) / √( 1/(n₁ − 3) + 1/(n₂ − 3) ),

where Z_i = (1/2) ln((1 + r_i)/(1 − r_i)) for i = 1, 2. If |U| ≥ u_{1−α/2}, the null hypothesis H₀ is rejected at the significance level α.
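The confidence interval can be coded directly from the formulas above. A sketch (the quantile u_{0.975} ≈ 1.96 is taken from the standard normal table; r and n are hypothetical inputs). Since (e^{2a} − 1)/(e^{2a} + 1) = tanh(a), the endpoints are just tanh(Z ∓ u/√(n − 3)), which gives a built-in consistency check:

```python
import numpy as np

def fisher_ci(r, n, u=1.959964):
    """Approximate 100(1 - alpha)% CI for rho via Fisher's z; u = u_{1 - alpha/2}."""
    z = 0.5 * np.log((1 + r) / (1 - r))
    d = np.exp(2 * z - 2 * u / np.sqrt(n - 3))
    h = np.exp(2 * z + 2 * u / np.sqrt(n - 3))
    return (d - 1) / (d + 1), (h - 1) / (h + 1)

lo, hi = fisher_ci(r=0.6, n=50)   # hypothetical sample values
```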

If there are k independent random samples from bivariate normal distributions with correlation coefficients ρ_i, i = 1, ..., k, the null hypothesis H₀: ρ₁ = ρ₂ = ⋯ = ρ_k can be tested against the alternative that H₀ does not hold using the statistic

Q = Σᵢ₌₁ᵏ (n_i − 3)(Z_i − b)²,

where

b = (1/(n − 3k)) Σᵢ₌₁ᵏ (n_i − 3) Z_i with n = n₁ + ⋯ + n_k,

and Z_i = (1/2) ln((1 + r_i)/(1 − r_i)) for i = 1, 2, ..., k. Then H₀ is rejected at the significance level α if Q > χ²_{1−α}(k − 1).

Example 3.1. It was examined how many mg of lactic acid are in 100 ml of blood of primiparous mothers (values X_i) and of their newborns (values Y_i). The obtained results are shown in the following table:

X_i | Y_i (table values not preserved)

Determine the 95% confidence interval for ρ and, using Fisher's z-transformation, test the hypothesis H₀: ρ = 0 against the alternative that H₀ does not hold.
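A sketch of the homogeneity test for k = 3 correlation coefficients. The sample sizes and r₁ = 0.37 come from exercise 3 in the last section of this text; the other two coefficients were not preserved in the source, so the values below for r₂ and r₃ are hypothetical:

```python
import numpy as np

rs = np.array([0.37, 0.28, 0.30])   # r1 from the exercise; r2, r3 hypothetical
ns = np.array([142, 175, 100])      # sample sizes n1, n2, n3

Z = 0.5 * np.log((1 + rs) / (1 - rs))   # Fisher's z for each sample
w = ns - 3
b = np.sum(w * Z) / np.sum(w)           # note sum(w) = n - 3k with n = n1 + n2 + n3
Q = np.sum(w * (Z - b) ** 2)            # compare with the chi-square quantile, k - 1 df
```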

4 Coefficient of multiple correlation

4.1 The introduction of the multiple correlation coefficient

The commonly used correlation coefficient ρ measures the dependence between two random variables. But there are many situations where it is necessary to describe the relationship between a random variable Y and a random vector X = (X₁, ..., X_k)′.

Definition 4.1. Let V = var X be a regular matrix and P = cor X the correlation matrix. Further, let

cor(Y, X) = (ρ_{Y X₁}, ρ_{Y X₂}, ..., ρ_{Y X_k})

be the vector of correlation coefficients (the so-called correlation matrix of Y and X). Then the coefficient of multiple correlation ρ_{Y,X} is introduced by the formula

ρ²_{Y,X} = cor(Y, X) P⁻¹ (cor(Y, X))′.

Theorem 4.1. The multiple correlation coefficient ρ_{Y,X} is the greatest of all correlation coefficients between Y and any nonzero linear function of the vector X.

This statement describes very well the meaning of the multiple correlation coefficient and the way in which it measures the statistical link between Y and X. The theorem also ensures that the coefficient ρ_{Y,X} is never less than the absolute value of any correlation coefficient ρ_{Y,X_i}, i = 1, ..., k. This is widely used as a rule for checking calculations.

Very often the vector X has two components. Let us simplify the notation: X₀ = Y. Then ρ_{ij}, i, j = 0, 1, 2, is the correlation coefficient between X_i and X_j, and instead of ρ_{Y,X} we write ρ_{0.1,2}. Substituting into the theorem we get

ρ²_{0.1,2} = (ρ²₀₁ + ρ²₀₂ − 2 ρ₀₁ ρ₀₂ ρ₁₂) / (1 − ρ²₁₂).

The coefficient ρ_{0.1,2} measures the total dependence of the variable Y on the vector (X₁, X₂)′. It should be emphasized that this dependence can be very strong even if the dependence of Y on each component X₁, X₂ separately is small. We illustrate this fact with an example.

Example 4.1.
Let ρ₀₁ = 0. Then

ρ²_{0.1,2} = ρ²₀₂ / (1 − ρ²₁₂).

It is clear from the previous theorem that the fraction ρ²₀₂/(1 − ρ²₁₂) is never greater than 1. But the number ρ²_{0.1,2} can be arbitrarily close to 1 if ρ²₁₂ is sufficiently large. More specifically, it can be explained as follows. Let Y and X₁ be independent variables, put X₂ = Y − X₁ and denote var Y = σ²₀, var X₁ = σ²₁. Then

ρ₀₁ = 0, ρ₀₂ = σ₀ (σ²₀ + σ²₁)^{−1/2}, ρ₁₂ = −σ₁ (σ²₀ + σ²₁)^{−1/2}, ρ²_{0.1,2} = 1.

If σ₀ is sufficiently small and σ₁ large enough, ρ₀₂ will have an arbitrarily small value. Thus the variable Y is not correlated with X₁ at all and with X₂ only very little, but Y is determined unambiguously by the two variables X₁ and X₂ together, because Y = X₁ + X₂.
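The construction above can be reproduced by simulation; a sketch with hypothetical σ₀ = 1 and σ₁ = 3. Because Y = X₁ + X₂ holds exactly in the sample, the formula for ρ²_{0.1,2} evaluated with sample correlations returns 1 up to rounding:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200_000
Y = rng.normal(scale=1.0, size=n)    # sigma_0 = 1
X1 = rng.normal(scale=3.0, size=n)   # sigma_1 = 3, independent of Y
X2 = Y - X1                          # hence Y = X1 + X2 exactly

r01 = np.corrcoef(Y, X1)[0, 1]       # near 0
r02 = np.corrcoef(Y, X2)[0, 1]       # near sigma_0 / sqrt(sigma_0^2 + sigma_1^2)
r12 = np.corrcoef(X1, X2)[0, 1]
rho2 = (r01**2 + r02**2 - 2 * r01 * r02 * r12) / (1 - r12**2)
```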

4.2 The sample multiple correlation coefficient

The sample multiple correlation coefficient R_{Y,X} is introduced by the same formula as the multiple correlation coefficient ρ_{Y,X}, only with the theoretical correlation coefficients replaced by sample correlation coefficients.

Definition 4.2. The sample multiple correlation coefficient is defined by

R²_{Y,X} = R_{Y,X} R⁻¹ (R_{Y,X})′,

where R is the matrix of sample correlation coefficients between the components of the vector X and R_{Y,X} is the vector of sample correlation coefficients between Y and the components of the vector X.

Theorem 4.2. Assume that the sample multiple correlation coefficient is determined from a sample from the (k + 1)-dimensional normal distribution (k is the number of components of X) whose multiple correlation coefficient ρ_{Y,X} is equal to zero. Then the statistic

Z = ((n − k − 1)/k) · R²_{Y,X} / (1 − R²_{Y,X})

has the Fisher-Snedecor distribution F with k and n − k − 1 degrees of freedom.

The null hypothesis H₀ that the multiple correlation coefficient ρ_{Y,X} equals zero can be tested against the alternative that H₀ does not hold using the statistic Z. The null hypothesis H₀ is rejected at the significance level α when Z > F_{1−α}(k, n − k − 1).
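A sketch of Definition 4.2 and the test statistic (simulated data; the regression coefficients are hypothetical). As a check, R²_{Y,X} computed from pairwise sample correlations coincides with the R² of a least-squares fit of Y on X with intercept:

```python
import numpy as np

def sample_multiple_r2(y, X):
    """R^2_{Y,X} = R_{Y,X} R^{-1} (R_{Y,X})' from sample correlation coefficients."""
    ryx = np.array([np.corrcoef(y, X[:, j])[0, 1] for j in range(X.shape[1])])
    R = np.corrcoef(X, rowvar=False)
    return ryx @ np.linalg.solve(R, ryx)

rng = np.random.default_rng(6)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n)

R2 = sample_multiple_r2(y, X)
Z = (n - k - 1) / k * R2 / (1 - R2)   # compare with F(k, n - k - 1)
```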

5 Partial correlation coefficient

5.1 The introduction of the partial correlation coefficient

The correlation coefficient measures the dependence of two random variables. This dependence may not be causal and can be influenced by other associated variables. Consider two random variables Y and Z and a random vector X = (X₁, ..., X_k)′. In general, the vector X can influence both Y and Z. The question is what the relationship between Y and Z would be if the vector X did not influence these two variables. One solution is to create a situation where the vector X remains constant. In most cases, however, this is impossible, and a mathematical solution is required. Let us imagine that the effect of the vector X is subtracted from the variable Y; formally this means that we subtract the best linear prediction of Y determined by the vector X. Similarly, we "clean" the variable Z from the influence of the vector X. Then we calculate the correlation coefficient between the random variables modified in the described manner.

Definition 5.1. This correlation coefficient is called the partial correlation coefficient between Y and Z given the vector X. Let us denote it ρ_{Y,Z.X} and define it by the formula

ρ_{Y,Z.X} = (ρ_{YZ} − cor(Y, X) P⁻¹ cor(X, Z)) / √( [1 − cor(Y, X) P⁻¹ cor(X, Y)] [1 − cor(Z, X) P⁻¹ cor(X, Z)] ).

Unlike the multiple correlation coefficient, there is no general inequality between the ordinary correlation coefficient and the partial correlation coefficient: ρ_{Y,Z.X} can sometimes be smaller and sometimes greater than ρ_{YZ}. If k = 1, so that the vector X has only one component, the partial correlation coefficient reduces to

ρ_{Y,Z.X} = (ρ_{YZ} − ρ_{YX} ρ_{XZ}) / √( (1 − ρ²_{YX})(1 − ρ²_{XZ}) ).

Example 5.1.
At the end of the last century the dependence of a hay crop (variable Y, measured in England in quintals per acre) on temperature (variable Z, equal to the sum of the daily temperatures above 42 °F ≈ 5.6 °C during the considered spring period) was examined. It was found that ρ_{YZ} took a negative value. The negative correlation is contrary to the experience in that area, i.e. grass grows more if the weather is warmer. Then the spring precipitation (variable X, expressed in inches) was also taken into account, and the paradox was explained. The other correlation coefficients were ρ_{XY} = 0.80 and ρ_{XZ} < 0. This is consistent with the fact that with more rain more grass grows, but when it rains more, the weather is colder. Then ρ_{Y,Z.X} = 0.10: if the influence of precipitation is excluded, the dependence between Y and Z is positive, although not too large.
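The mechanism in this example (a common variable X driving both Y and Z) can be reproduced by simulation. In this sketch with hypothetical noise levels, Y and Z are strongly correlated only through X, and the k = 1 partial correlation formula returns a value near zero once X is held fixed:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
X = rng.normal(size=n)              # common influence (e.g. precipitation)
Y = X + 0.1 * rng.normal(size=n)
Z = X + 0.1 * rng.normal(size=n)

ryz = np.corrcoef(Y, Z)[0, 1]
ryx = np.corrcoef(Y, X)[0, 1]
rxz = np.corrcoef(X, Z)[0, 1]

# partial correlation of Y and Z given the single variable X
p = (ryz - ryx * rxz) / np.sqrt((1 - ryx**2) * (1 - rxz**2))
```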

5.2 The sample partial correlation coefficient

Definition 5.2. Assume a random sample of size n from the joint (k + 2)-dimensional distribution of the random variables Y, Z and the random vector X. The sample partial correlation coefficient R_{Y,Z.X} is given by the same formula as the partial correlation coefficient ρ_{Y,Z.X}, but with the theoretical correlation coefficients replaced by sample correlation coefficients taken from the sample correlation matrix.

For testing a zero value of the partial correlation coefficient, the test statistic

T = R_{Y,Z.X} √(n − k − 2) / √(1 − R²_{Y,Z.X})

can be used. The statistic T has the Student distribution t(n − k − 2) under the condition that the considered random sample comes from a joint (k + 2)-dimensional normal distribution with ρ_{Y,Z.X} = 0 and n > k + 2. Thus, the null hypothesis ρ_{Y,Z.X} = 0 is rejected at the significance level α if |T| > t_{1−α/2}(n − k − 2).
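The test statistic is a one-liner given the sample partial correlation; a sketch with hypothetical inputs (R = 0.10, n = 20, k = 1, echoing the scale of Example 5.1):

```python
import numpy as np

def partial_corr_t(r_partial, n, k):
    """T = R sqrt(n - k - 2) / sqrt(1 - R^2), compared with t(n - k - 2)."""
    return r_partial * np.sqrt(n - k - 2) / np.sqrt(1 - r_partial**2)

T = partial_corr_t(0.10, n=20, k=1)   # hypothetical sample values
```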

6 Exercises

1. In chosen European countries the relationship between alcohol consumption (variable X) and mortality from cirrhosis of the liver (deaths per a given number of inhabitants, variable Y) was examined. The data are taken from the monograph Andel: Statisticke metody. The obtained results are shown in the following table:

Country  FIN  NOR  IRL  NLD  SWE  GBR  BEL   AUT   DEU   ITA   FRA
X        3.9  4.2  5.6  5.7  6.6  7.2  10.8  10.9  12.3  15.7  24.7
Y        3.6  4.3  3.4  3.7  7.2  3.0  12.3  7.0   23.7  23.6  46.1

Calculate the interval estimate of the theoretical correlation coefficient and, using the z-transformation, decide whether the dependence between alcohol consumption and mortality from cirrhosis of the liver is statistically significant.

2. It was examined how many mg of lactic acid are in 100 ml of blood of primiparous mothers (values X_i) and of their newborns (values Y_i). The obtained results are shown in the following table:

X_i | Y_i (table values not preserved)

(a) Using Fisher's z-transformation, decide whether the dependence between the amounts of lactic acid in the blood of mothers and their newborns is statistically significant.

(b) Determine the 95% confidence interval for the theoretical correlation coefficient ρ.

3. Medical research dealt with monitoring the concentrations of compounds A and B in the urine of patients with kidney disease. For 142 people suffering from disease type 1 the sample correlation coefficient between compounds A and B was equal to r₁ = 0.37, for 175 people suffering from disease type 2 the sample correlation coefficient was equal to r₂, and for 100 healthy subjects the sample correlation coefficient between the concentrations of compounds A and B was equal to r₃. Using Fisher's z-transformation:

(a) For each pair of indices (i, j) = (1, 2), (1, 3), (2, 3), test the null hypothesis that the correlation coefficients r_i and r_j correspond to a pair of samples with the same theoretical correlation coefficient, i.e. the hypothesis ρ_i = ρ_j.
(b) Determine a 95% confidence interval for each correlation coefficient ρ_i, i = 1, 2, 3.

(c) Test the hypothesis H₀: ρ₁ = ρ₂ = ρ₃.

4. The table below shows data on the number of deaths in London (values of the variable Y) from 1 Dec to 15 Dec 1952, when London suffered a particularly severe fog. Furthermore, there are values of the variable X, representing the average air pollution in Hall County in mg/m³, and values of the variable Z, representing the average content of sulfur dioxide (the number of parts per million).

Day | Y_i | x_i | z_i (table values not preserved)

(a) Determine the confidence intervals for the correlation coefficients ρ(X, Y), ρ(X, Z) and ρ(Y, Z).

(b) Using the z-transformation, test the hypothesis that between each pair of these variables there is a statistically significant linkage.

(c) Determine the coefficient of multiple correlation r_{Y,(X,Z)}.

(d) Test the hypothesis that the theoretical coefficient of multiple correlation ρ_{Y,(X,Z)} = 0 and decide whether Y depends on the random vector (X, Z). Interpret the test results.

(e) Determine the partial correlation coefficient r_{Y,X.Z} and the partial correlation coefficient r_{Y,Z.X}.

(f) Test the hypothesis that the partial correlation coefficient ρ_{Y,X.Z} = 0. Interpret the results.

(g) Test the hypothesis that the partial correlation coefficient ρ_{Y,Z.X} = 0. Interpret the test result.


More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability

More information

Multivariate probability distributions and linear regression

Multivariate probability distributions and linear regression Multivariate probability distributions and linear regression Patrik Hoyer 1 Contents: Random variable, probability distribution Joint distribution Marginal distribution Conditional distribution Independence,

More information

Probability Theory and Statistics. Peter Jochumzen

Probability Theory and Statistics. Peter Jochumzen Probability Theory and Statistics Peter Jochumzen April 18, 2016 Contents 1 Probability Theory And Statistics 3 1.1 Experiment, Outcome and Event................................ 3 1.2 Probability............................................

More information

MULTIVARIATE PROBABILITY DISTRIBUTIONS

MULTIVARIATE PROBABILITY DISTRIBUTIONS MULTIVARIATE PROBABILITY DISTRIBUTIONS. PRELIMINARIES.. Example. Consider an experiment that consists of tossing a die and a coin at the same time. We can consider a number of random variables defined

More information

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline.

Dependence. Practitioner Course: Portfolio Optimization. John Dodson. September 10, Dependence. John Dodson. Outline. Practitioner Course: Portfolio Optimization September 10, 2008 Before we define dependence, it is useful to define Random variables X and Y are independent iff For all x, y. In particular, F (X,Y ) (x,

More information

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses.

Review: General Approach to Hypothesis Testing. 1. Define the research question and formulate the appropriate null and alternative hypotheses. 1 Review: Let X 1, X,..., X n denote n independent random variables sampled from some distribution might not be normal!) with mean µ) and standard deviation σ). Then X µ σ n In other words, X is approximately

More information

Review of probability and statistics 1 / 31

Review of probability and statistics 1 / 31 Review of probability and statistics 1 / 31 2 / 31 Why? This chapter follows Stock and Watson (all graphs are from Stock and Watson). You may as well refer to the appendix in Wooldridge or any other introduction

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

f X, Y (x, y)dx (x), where f(x,y) is the joint pdf of X and Y. (x) dx

f X, Y (x, y)dx (x), where f(x,y) is the joint pdf of X and Y. (x) dx INDEPENDENCE, COVARIANCE AND CORRELATION Independence: Intuitive idea of "Y is independent of X": The distribution of Y doesn't depend on the value of X. In terms of the conditional pdf's: "f(y x doesn't

More information

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu

Course: ESO-209 Home Work: 1 Instructor: Debasis Kundu Home Work: 1 1. Describe the sample space when a coin is tossed (a) once, (b) three times, (c) n times, (d) an infinite number of times. 2. A coin is tossed until for the first time the same result appear

More information

Multivariate Regression

Multivariate Regression Multivariate Regression The so-called supervised learning problem is the following: we want to approximate the random variable Y with an appropriate function of the random variables X 1,..., X p with the

More information

Homework 10 (due December 2, 2009)

Homework 10 (due December 2, 2009) Homework (due December, 9) Problem. Let X and Y be independent binomial random variables with parameters (n, p) and (n, p) respectively. Prove that X + Y is a binomial random variable with parameters (n

More information

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R

Random Variables. Random variables. A numerically valued map X of an outcome ω from a sample space Ω to the real line R In probabilistic models, a random variable is a variable whose possible values are numerical outcomes of a random phenomenon. As a function or a map, it maps from an element (or an outcome) of a sample

More information

ECON 4160, Autumn term Lecture 1

ECON 4160, Autumn term Lecture 1 ECON 4160, Autumn term 2017. Lecture 1 a) Maximum Likelihood based inference. b) The bivariate normal model Ragnar Nymoen University of Oslo 24 August 2017 1 / 54 Principles of inference I Ordinary least

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables

UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables UQ, Semester 1, 2017, Companion to STAT2201/CIVL2530 Exam Formulae and Tables To be provided to students with STAT2201 or CIVIL-2530 (Probability and Statistics) Exam Main exam date: Tuesday, 20 June 1

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

STAT 4385 Topic 03: Simple Linear Regression

STAT 4385 Topic 03: Simple Linear Regression STAT 4385 Topic 03: Simple Linear Regression Xiaogang Su, Ph.D. Department of Mathematical Science University of Texas at El Paso xsu@utep.edu Spring, 2017 Outline The Set-Up Exploratory Data Analysis

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Simple Linear Regression Estimation and Properties

Simple Linear Regression Estimation and Properties Simple Linear Regression Estimation and Properties Outline Review of the Reading Estimate parameters using OLS Other features of OLS Numerical Properties of OLS Assumptions of OLS Goodness of Fit Checking

More information

ENGG2430A-Homework 2

ENGG2430A-Homework 2 ENGG3A-Homework Due on Feb 9th,. Independence vs correlation a For each of the following cases, compute the marginal pmfs from the joint pmfs. Explain whether the random variables X and Y are independent,

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Evaluation. Andrea Passerini Machine Learning. Evaluation

Evaluation. Andrea Passerini Machine Learning. Evaluation Andrea Passerini passerini@disi.unitn.it Machine Learning Basic concepts requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain

More information

18.440: Lecture 26 Conditional expectation

18.440: Lecture 26 Conditional expectation 18.440: Lecture 26 Conditional expectation Scott Sheffield MIT 1 Outline Conditional probability distributions Conditional expectation Interpretation and examples 2 Outline Conditional probability distributions

More information

5 Operations on Multiple Random Variables

5 Operations on Multiple Random Variables EE360 Random Signal analysis Chapter 5: Operations on Multiple Random Variables 5 Operations on Multiple Random Variables Expected value of a function of r.v. s Two r.v. s: ḡ = E[g(X, Y )] = g(x, y)f X,Y

More information

Stat 704 Data Analysis I Probability Review

Stat 704 Data Analysis I Probability Review 1 / 39 Stat 704 Data Analysis I Probability Review Dr. Yen-Yi Ho Department of Statistics, University of South Carolina A.3 Random Variables 2 / 39 def n: A random variable is defined as a function that

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 MA 575 Linear Models: Cedric E Ginestet, Boston University Revision: Probability and Linear Algebra Week 1, Lecture 2 1 Revision: Probability Theory 11 Random Variables A real-valued random variable is

More information

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities

PCMI Introduction to Random Matrix Theory Handout # REVIEW OF PROBABILITY THEORY. Chapter 1 - Events and Their Probabilities PCMI 207 - Introduction to Random Matrix Theory Handout #2 06.27.207 REVIEW OF PROBABILITY THEORY Chapter - Events and Their Probabilities.. Events as Sets Definition (σ-field). A collection F of subsets

More information

Chapter 12 - Lecture 2 Inferences about regression coefficient

Chapter 12 - Lecture 2 Inferences about regression coefficient Chapter 12 - Lecture 2 Inferences about regression coefficient April 19th, 2010 Facts about slope Test Statistic Confidence interval Hypothesis testing Test using ANOVA Table Facts about slope In previous

More information

Simple and Multiple Linear Regression

Simple and Multiple Linear Regression Sta. 113 Chapter 12 and 13 of Devore March 12, 2010 Table of contents 1 Simple Linear Regression 2 Model Simple Linear Regression A simple linear regression model is given by Y = β 0 + β 1 x + ɛ where

More information

conditional cdf, conditional pdf, total probability theorem?

conditional cdf, conditional pdf, total probability theorem? 6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random

More information

Evaluation requires to define performance measures to be optimized

Evaluation requires to define performance measures to be optimized Evaluation Basic concepts Evaluation requires to define performance measures to be optimized Performance of learning algorithms cannot be evaluated on entire domain (generalization error) approximation

More information

Topics in Probability and Statistics

Topics in Probability and Statistics Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Bivariate Distributions

Bivariate Distributions STAT/MATH 395 A - PROBABILITY II UW Winter Quarter 17 Néhémy Lim Bivariate Distributions 1 Distributions of Two Random Variables Definition 1.1. Let X and Y be two rrvs on probability space (Ω, A, P).

More information

Multivariate Random Variable

Multivariate Random Variable Multivariate Random Variable Author: Author: Andrés Hincapié and Linyi Cao This Version: August 7, 2016 Multivariate Random Variable 3 Now we consider models with more than one r.v. These are called multivariate

More information

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47

ECON2228 Notes 2. Christopher F Baum. Boston College Economics. cfb (BC Econ) ECON2228 Notes / 47 ECON2228 Notes 2 Christopher F Baum Boston College Economics 2014 2015 cfb (BC Econ) ECON2228 Notes 2 2014 2015 1 / 47 Chapter 2: The simple regression model Most of this course will be concerned with

More information

The regression model with one fixed regressor cont d

The regression model with one fixed regressor cont d The regression model with one fixed regressor cont d 3150/4150 Lecture 4 Ragnar Nymoen 27 January 2012 The model with transformed variables Regression with transformed variables I References HGL Ch 2.8

More information

Chapter 5 continued. Chapter 5 sections

Chapter 5 continued. Chapter 5 sections Chapter 5 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16)

Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) Lecture: Simultaneous Equation Model (Wooldridge s Book Chapter 16) 1 2 Model Consider a system of two regressions y 1 = β 1 y 2 + u 1 (1) y 2 = β 2 y 1 + u 2 (2) This is a simultaneous equation model

More information

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u

So far our focus has been on estimation of the parameter vector β in the. y = Xβ + u Interval estimation and hypothesis tests So far our focus has been on estimation of the parameter vector β in the linear model y i = β 1 x 1i + β 2 x 2i +... + β K x Ki + u i = x iβ + u i for i = 1, 2,...,

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Chapter 10. Correlation and Regression. McGraw-Hill, Bluman, 7th ed., Chapter 10 1

Chapter 10. Correlation and Regression. McGraw-Hill, Bluman, 7th ed., Chapter 10 1 Chapter 10 Correlation and Regression McGraw-Hill, Bluman, 7th ed., Chapter 10 1 Chapter 10 Overview Introduction 10-1 Scatter Plots and Correlation 10- Regression 10-3 Coefficient of Determination and

More information

Statistics. Statistics

Statistics. Statistics The main aims of statistics 1 1 Choosing a model 2 Estimating its parameter(s) 1 point estimates 2 interval estimates 3 Testing hypotheses Distributions used in statistics: χ 2 n-distribution 2 Let X 1,

More information

Lecture 2: Review of Probability

Lecture 2: Review of Probability Lecture 2: Review of Probability Zheng Tian Contents 1 Random Variables and Probability Distributions 2 1.1 Defining probabilities and random variables..................... 2 1.2 Probability distributions................................

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

UCSD ECE153 Handout #34 Prof. Young-Han Kim Tuesday, May 27, Solutions to Homework Set #6 (Prepared by TA Fatemeh Arbabjolfaei)

UCSD ECE153 Handout #34 Prof. Young-Han Kim Tuesday, May 27, Solutions to Homework Set #6 (Prepared by TA Fatemeh Arbabjolfaei) UCSD ECE53 Handout #34 Prof Young-Han Kim Tuesday, May 7, 04 Solutions to Homework Set #6 (Prepared by TA Fatemeh Arbabjolfaei) Linear estimator Consider a channel with the observation Y XZ, where the

More information

Review of Basic Probability Theory

Review of Basic Probability Theory Review of Basic Probability Theory James H. Steiger Department of Psychology and Human Development Vanderbilt University James H. Steiger (Vanderbilt University) 1 / 35 Review of Basic Probability Theory

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model

Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Wooldridge, Introductory Econometrics, 4th ed. Chapter 2: The simple regression model Most of this course will be concerned with use of a regression model: a structure in which one or more explanatory

More information

If we want to analyze experimental or simulated data we might encounter the following tasks:

If we want to analyze experimental or simulated data we might encounter the following tasks: Chapter 1 Introduction If we want to analyze experimental or simulated data we might encounter the following tasks: Characterization of the source of the signal and diagnosis Studying dependencies Prediction

More information

Lecture 3: Multiple Regression

Lecture 3: Multiple Regression Lecture 3: Multiple Regression R.G. Pierse 1 The General Linear Model Suppose that we have k explanatory variables Y i = β 1 + β X i + β 3 X 3i + + β k X ki + u i, i = 1,, n (1.1) or Y i = β j X ji + u

More information

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A

WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, Academic Year Exam Version: A WISE MA/PhD Programs Econometrics Instructor: Brett Graham Spring Semester, 2016-17 Academic Year Exam Version: A INSTRUCTIONS TO STUDENTS 1 The time allowed for this examination paper is 2 hours. 2 This

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n

P (x). all other X j =x j. If X is a continuous random vector (see p.172), then the marginal distributions of X i are: f(x)dx 1 dx n JOINT DENSITIES - RANDOM VECTORS - REVIEW Joint densities describe probability distributions of a random vector X: an n-dimensional vector of random variables, ie, X = (X 1,, X n ), where all X is are

More information

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics.

Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Bus 216: Business Statistics II Introduction Business statistics II is purely inferential or applied statistics. Study Session 1 1. Random Variable A random variable is a variable that assumes numerical

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation)

MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) MATHEMATICS 154, SPRING 2009 PROBABILITY THEORY Outline #11 (Tail-Sum Theorem, Conditional distribution and expectation) Last modified: March 7, 2009 Reference: PRP, Sections 3.6 and 3.7. 1. Tail-Sum Theorem

More information

Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual

Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual Mock Exam - 2 hours - use of basic (non-programmable) calculator is allowed - all exercises carry the same marks - exam is strictly individual Question 1. Suppose you want to estimate the percentage of

More information

Multivariate Distribution Models

Multivariate Distribution Models Multivariate Distribution Models Model Description While the probability distribution for an individual random variable is called marginal, the probability distribution for multiple random variables is

More information

Probability and Statistics Notes

Probability and Statistics Notes Probability and Statistics Notes Chapter Five Jesse Crawford Department of Mathematics Tarleton State University Spring 2011 (Tarleton State University) Chapter Five Notes Spring 2011 1 / 37 Outline 1

More information

Lecture 2: Repetition of probability theory and statistics

Lecture 2: Repetition of probability theory and statistics Algorithms for Uncertainty Quantification SS8, IN2345 Tobias Neckel Scientific Computing in Computer Science TUM Lecture 2: Repetition of probability theory and statistics Concept of Building Block: Prerequisites:

More information

Bivariate Distributions. Discrete Bivariate Distribution Example

Bivariate Distributions. Discrete Bivariate Distribution Example Spring 7 Geog C: Phaedon C. Kyriakidis Bivariate Distributions Definition: class of multivariate probability distributions describing joint variation of outcomes of two random variables (discrete or continuous),

More information

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance

Covariance. Lecture 20: Covariance / Correlation & General Bivariate Normal. Covariance, cont. Properties of Covariance Covariance Lecture 0: Covariance / Correlation & General Bivariate Normal Sta30 / Mth 30 We have previously discussed Covariance in relation to the variance of the sum of two random variables Review Lecture

More information

STAT/MATH 395 PROBABILITY II

STAT/MATH 395 PROBABILITY II STAT/MATH 395 PROBABILITY II Bivariate Distributions Néhémy Lim University of Washington Winter 2017 Outline Distributions of Two Random Variables Distributions of Two Discrete Random Variables Distributions

More information

Multivariate Statistics

Multivariate Statistics Multivariate Statistics Chapter 2: Multivariate distributions and inference Pedro Galeano Departamento de Estadística Universidad Carlos III de Madrid pedro.galeano@uc3m.es Course 2016/2017 Master in Mathematical

More information

Statistics 351 Probability I Fall 2006 (200630) Final Exam Solutions. θ α β Γ(α)Γ(β) (uv)α 1 (v uv) β 1 exp v }

Statistics 351 Probability I Fall 2006 (200630) Final Exam Solutions. θ α β Γ(α)Γ(β) (uv)α 1 (v uv) β 1 exp v } Statistics 35 Probability I Fall 6 (63 Final Exam Solutions Instructor: Michael Kozdron (a Solving for X and Y gives X UV and Y V UV, so that the Jacobian of this transformation is x x u v J y y v u v

More information

Lecture 25: Review. Statistics 104. April 23, Colin Rundel

Lecture 25: Review. Statistics 104. April 23, Colin Rundel Lecture 25: Review Statistics 104 Colin Rundel April 23, 2012 Joint CDF F (x, y) = P [X x, Y y] = P [(X, Y ) lies south-west of the point (x, y)] Y (x,y) X Statistics 104 (Colin Rundel) Lecture 25 April

More information

Bivariate distributions

Bivariate distributions Bivariate distributions 3 th October 017 lecture based on Hogg Tanis Zimmerman: Probability and Statistical Inference (9th ed.) Bivariate Distributions of the Discrete Type The Correlation Coefficient

More information

Statistics, Data Analysis, and Simulation SS 2015

Statistics, Data Analysis, and Simulation SS 2015 Statistics, Data Analysis, and Simulation SS 2015 08.128.730 Statistik, Datenanalyse und Simulation Dr. Michael O. Distler Mainz, 27. April 2015 Dr. Michael O. Distler

More information

, 0 x < 2. a. Find the probability that the text is checked out for more than half an hour but less than an hour. = (1/2)2

, 0 x < 2. a. Find the probability that the text is checked out for more than half an hour but less than an hour. = (1/2)2 Math 205 Spring 206 Dr. Lily Yen Midterm 2 Show all your work Name: 8 Problem : The library at Capilano University has a copy of Math 205 text on two-hour reserve. Let X denote the amount of time the text

More information

Statistics Introductory Correlation

Statistics Introductory Correlation Statistics Introductory Correlation Session 10 oscardavid.barrerarodriguez@sciencespo.fr April 9, 2018 Outline 1 Statistics are not used only to describe central tendency and variability for a single variable.

More information

STATISTICS 3A03. Applied Regression Analysis with SAS. Angelo J. Canty

STATISTICS 3A03. Applied Regression Analysis with SAS. Angelo J. Canty STATISTICS 3A03 Applied Regression Analysis with SAS Angelo J. Canty Office : Hamilton Hall 209 Phone : (905) 525-9140 extn 27079 E-mail : cantya@mcmaster.ca SAS Labs : L1 Friday 11:30 in BSB 249 L2 Tuesday

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Psychology 282 Lecture #4 Outline Inferences in SLR

Psychology 282 Lecture #4 Outline Inferences in SLR Psychology 282 Lecture #4 Outline Inferences in SLR Assumptions To this point we have not had to make any distributional assumptions. Principle of least squares requires no assumptions. Can use correlations

More information

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review

Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review You can t see this text! Introduction to Computational Finance and Financial Econometrics Matrix Algebra Review Eric Zivot Spring 2015 Eric Zivot (Copyright 2015) Matrix Algebra Review 1 / 54 Outline 1

More information

Ref.: Spring SOS3003 Applied data analysis for social science Lecture note

Ref.:   Spring SOS3003 Applied data analysis for social science Lecture note SOS3003 Applied data analysis for social science Lecture note 05-2010 Erling Berge Department of sociology and political science NTNU Spring 2010 Erling Berge 2010 1 Literature Regression criticism I Hamilton

More information

Chapter 5. Chapter 5 sections

Chapter 5. Chapter 5 sections 1 / 43 sections Discrete univariate distributions: 5.2 Bernoulli and Binomial distributions Just skim 5.3 Hypergeometric distributions 5.4 Poisson distributions Just skim 5.5 Negative Binomial distributions

More information

Applied Econometrics - QEM Theme 1: Introduction to Econometrics Chapter 1 + Probability Primer + Appendix B in PoE

Applied Econometrics - QEM Theme 1: Introduction to Econometrics Chapter 1 + Probability Primer + Appendix B in PoE Applied Econometrics - QEM Theme 1: Introduction to Econometrics Chapter 1 + Probability Primer + Appendix B in PoE Warsaw School of Economics Outline 1. Introduction to econometrics 2. Denition of econometrics

More information

Chapter Eight: Assessment of Relationships 1/42

Chapter Eight: Assessment of Relationships 1/42 Chapter Eight: Assessment of Relationships 1/42 8.1 Introduction 2/42 Background This chapter deals, primarily, with two topics. The Pearson product-moment correlation coefficient. The chi-square test

More information

MAS223 Statistical Inference and Modelling Exercises

MAS223 Statistical Inference and Modelling Exercises MAS223 Statistical Inference and Modelling Exercises The exercises are grouped into sections, corresponding to chapters of the lecture notes Within each section exercises are divided into warm-up questions,

More information

Chapter 16. Simple Linear Regression and dcorrelation

Chapter 16. Simple Linear Regression and dcorrelation Chapter 16 Simple Linear Regression and dcorrelation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima

Hypothesis testing Goodness of fit Multicollinearity Prediction. Applied Statistics. Lecturer: Serena Arima Applied Statistics Lecturer: Serena Arima Hypothesis testing for the linear model Under the Gauss-Markov assumptions and the normality of the error terms, we saw that β N(β, σ 2 (X X ) 1 ) and hence s

More information

Theory of probability and mathematical statistics

Theory of probability and mathematical statistics Theory of probability and mathematical statistics Tomáš Mrkvička Bibliography [1] J. [2] J. Andďż l: Matematickďż statistika, SNTL/ALFA, Praha 1978 Andďż l: Statistickďż metody, Matfyzpress, Praha 1998

More information

Unit 10: Simple Linear Regression and Correlation

Unit 10: Simple Linear Regression and Correlation Unit 10: Simple Linear Regression and Correlation Statistics 571: Statistical Methods Ramón V. León 6/28/2004 Unit 10 - Stat 571 - Ramón V. León 1 Introductory Remarks Regression analysis is a method for

More information

A Probability Review

A Probability Review A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in

More information