School of Education, Culture and Communication Division of Applied Mathematics


School of Education, Culture and Communication
Division of Applied Mathematics

MASTER THESIS IN MATHEMATICS / APPLIED MATHEMATICS

Estimation and Testing the Quotient of Two Models

by

Marko Dimitrov

Masterarbete i matematik / tillämpad matematik

DIVISION OF APPLIED MATHEMATICS
MÄLARDALEN UNIVERSITY
SE VÄSTERÅS, SWEDEN

School of Education, Culture and Communication
Division of Applied Mathematics

Master thesis in mathematics / applied mathematics

Date:
Project name: Estimation and Testing the Quotient of Two Models
Author: Marko Dimitrov
Supervisor(s): Christopher Engström
Reviewer: Milica Rančíć
Examiner: Sergei Silvestrov
Comprising: 15 ECTS credits

Abstract

In this thesis, we introduce linear regression models such as Simple Linear Regression, Multiple Regression, and Polynomial Regression. We explain the basic methods of estimating the model parameters, Ordinary Least Squares (OLS) and Maximum Likelihood Estimation (MLE). We give the properties of the estimates and the assumptions the model must satisfy for the estimates to be the Best Linear Unbiased Estimates (BLUE). The basic bootstrap methods are introduced. A real-world problem is simulated in order to see how measurement error affects the quotient of two estimated models.

Acknowledgments

I would like to thank my supervisor, Senior Lecturer Christopher Engström of the School of Education, Culture and Communication at Mälardalen University. Prof. Engström consistently allowed this paper to be my own work but steered me in the right direction whenever he thought I needed it. I would also like to thank Prof. Dr. Miodrag Đorđević, who was involved in the validation survey for this master thesis. Without his participation and input, the validation survey could not have been successfully conducted. I would also like to acknowledge Senior Lecturer Milica Rančíć, School of Education, Culture and Communication at Mälardalen University, as the reviewer, and I am gratefully indebted to her for her very valuable comments on this thesis. The data used in the master thesis comes from ship log data gathered at Qtagg AB, from one ship over roughly half a month, and I wish to acknowledge Qtagg AB for providing the data. Finally, I must express my very profound gratitude to my friends and girlfriend for providing me with unfailing support and continuous encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

Author: Marko Dimitrov

Contents

List of Figures
List of Tables
Introduction
1 Simple Linear Regression
  1.1 The Model
  1.2 Estimation of the Model Parameters
    1.2.1 Ordinary Least Squares
    1.2.2 Properties of the Ordinary Least Squares Estimators
    1.2.3 An Estimator of the Variance and Estimated Variances
  1.3 Hypothesis Testing, Confidence Intervals and t-test
  1.4 The Coefficient of Determination
  1.5 Maximum Likelihood Estimation
2 Multiple Regression
  2.1 The Model
  2.2 Estimation of the Model Parameters
    2.2.1 Ordinary Least Squares
    2.2.2 Properties of the Ordinary Least Squares Estimators
    2.2.3 An Estimator of the Variance and Estimated Variance
  2.3 Maximum Likelihood Estimation
    2.3.1 Properties of the Maximum Likelihood Estimators
  2.4 Polynomial Regression
    2.4.1 Orthogonal Polynomials
3 The Bootstrap
  3.1 Introduction
    3.1.1 Statistics
  3.2 The Bootstrap Estimates
  3.3 Parametric Simulation
    3.3.1 Approximations
  3.4 Non-parametric Simulation
  3.5 Confidence Intervals
4 Simulation and Evaluation
  4.1 Mathematical Description of the Problem
  4.2 An Analogy With the Real World
  4.3 Parameter Estimation
  4.4 Confidence Intervals
  4.5 True Values of the Quotient
  4.6 Evaluation of the Results
5 Discussion
  5.1 Conclusions
  5.2 Future Work
  5.3 Fulfillment of Thesis Objectives
A Definitions
  A.1 Linear Algebra
  A.2 Matrix Calculus
  A.3 Statistics
B Probability Distributions
  B.1 Binomial Distribution
  B.2 Uniform Distribution
  B.3 Generalized Pareto Distribution
  B.4 Normal Distribution
  B.5 Log-normal Distribution
  B.6 Gamma Distribution
  B.7 Student Distribution
  B.8 Chi-Square Distribution
Bibliography
Index

List of Figures

4.1 The data (velocity) without measurement errors
4.2 Case 1 - The data (fuel efficiency) without measurement errors
4.3 Case 2 - The data (fuel efficiency) without measurement errors
4.4 Case 3 - The data (fuel efficiency) without measurement errors
4.5 A sample of data taken from the Uniform Distribution
4.6 A sample of data taken from the Generalized Pareto Distribution
4.7 A sample of data taken from the Normal Distribution
4.8 A sample of data taken from the Log-normal Distribution
4.9 A sample of data taken from the Gamma Distribution
4.10 A sample of data taken from the Student's t-distribution
4.11 A sample of data taken from the Gamma Distribution

List of Tables

4.1 Table of the confidence intervals for the mean of the quotient

Abbreviations and Acronyms

SS_R   Sum of Squares due to Regression
SS_T   Total Sum of Squares
BLUE   Best Linear Unbiased Estimates
MLE    Maximum Likelihood Estimation
OLS    Ordinary Least Squares
RSS    Residual Sum of Squares
CDF    Cumulative Distribution Function
EDF    Empirical Distribution Function
PDF    Probability Density Function
i.i.d. Independent and Identically Distributed
df     Degrees of Freedom

Introduction

Regression analysis is a statistical technique used for analyzing data and for finding a relationship between two or more variables. Behind regression analysis lies elegant mathematics and statistical theory. It can be used in many fields: engineering, economics, biology, medicine, etc. The book by Dougherty [4] gives good examples of where regression can be used and how to use it. We explain Simple Linear Regression; for much more information, proofs, theorems, and examples the author refers the reader to the book by Weisberg [10]. There are also good examples of Simple Linear Regression in the book by Dougherty [4]. Multiple Regression analysis is, perhaps, more important than Simple Linear Regression. There are a lot of results and books about Multiple Regression, starting with Rencher and Schaalje [6], to which the author refers the reader. I would also like to mention the books by Wasserman [9], Montgomery et al. [5], Seber and Lee [7], Casella and Berger [1], and Weisberg [10], which contain much more information. Besides simple linear regression and multiple regression, books such as Casella and Berger [1] and Weisberg [10] contain other linear regression models as well as nonlinear models. For a better understanding of Polynomial Regression, the author refers the reader to the book by Wasserman [9] (Chapter 7), which explains both the strengths and the weaknesses of Regression Analysis. The problem we mention in the thesis, ill-conditioning, is well explained and solved in Chapter 7 of Wasserman [9]. An excellent introduction to bootstrap methods and confidence intervals is given in Davis [2] and Davison and Hinkley [3]. However, in the book by Van Der Vaart and Wellner [8] (Sections 3.6 to 3.9), they introduce the bootstrap empirical process and take the bootstrap method to the next level.

Our goal is to simulate a real-world problem and use the methods we mention. We will make some assumptions, estimate two models (Simple Linear, Multiple or Polynomial regression models), and look at the quotient of the two models. One could search for the distribution of the quotient of the two models, but that could be really complicated, which is why we introduce the bootstrap method. By computing confidence intervals for the mean of the quotient, we get the results. We introduce different types of measurement errors to the data and see how that affects the quotient.

Formulation of Problem and Goal of the Thesis

This project is inspired by my supervisor Christopher Engström. The formulation of the problem studied in the project and the goal of the project, given by the supervisor, are below.

When creating a new control system or new hardware for a vehicle or some other machine, there is also a need to test it in practice, for example, if you want to evaluate whether one method is more fuel efficient than another. The standard way to do this is by doing the testing in a controlled environment where you can limit the number of outside influences on the system. However, doing the tests in a controlled environment is not always possible, either because of cost considerations or because the thing you want to test is something that is hard to achieve in a controlled environment.

The goal of the thesis is to evaluate how well a quotient between two models, for example the fuel efficiency of two engines, behaves when the data is taken in a non-controlled environment where the two engines cannot be tested simultaneously and the effects of outside factors are large. Mathematically, the problem can be described as follows:

1) Given two sets of data, a model is constructed for each in order to predict one of the variables given the others (regression problem);
2) From these two models, try to predict the quotient between the two predicted variables if they were given the same input parameters, for example by computing confidence intervals;
3) Introduce bias or different types of errors with known distributions into the data and determine how this affects the randomness in the result;
4) The project should be made using a mixed theoretical and experimental approach but may lean towards one or the other.

Chapter 1

Simple Linear Regression

1.1 The Model

Regression is a method of finding the relationship between two variables Y and X. The variable Y is called a response variable and the variable X is called a covariate. The variable X is also called a predictor variable or a feature. In simple linear regression we have only one covariate but, as we will see later, there can be more covariates. Let us assume that we have a set of data D = {(y_i, x_i)}_{i=1}^N. To find the relationship between Y and X we estimate the regression function

    r(x) = E(Y | X = x) = ∫ y f(y | x) dy.    (1.1)

The simplest choice is to assume that the regression function is a linear function:

    r(x) = θ_0 + θ_1 x,

where x is a scalar (not a vector). Besides the regression function (mean function), the simple linear regression model consists of another function, the variance function Var(Y | X = x) = σ². By changing the parameters θ_0 and θ_1 we can get every possible line. To us the parameters are unknown, and we have to estimate them using the data D. Since the variance σ² is positive, the observed value will in general not be the same as the expected value. To account for the difference between those values, we look at the error

    ξ_i = y_i − (θ_0 + θ_1 x_i)

for every i ∈ {1, 2, ..., N}. The errors depend on the parameters and are not observable, therefore they are random variables. We can write the simple linear regression model as

    y_i = θ_0 + θ_1 x_i + ξ_i,   i = 1, 2, ..., N.    (1.2)

The model is called simple because there is only one feature used to predict the response variable, and the linear part means that the model (1.2) is linear in the parameters θ_0 and θ_1, or, to be precise, that the regression function (1.1) is assumed to be linear. Considering that the ξ_i are random variables, the y_i are random variables as well. For the model to be complete, we have to make the following assumptions about the errors ξ_i, i = 1, 2, ..., N:

1. E(ξ_i | x_i) = 0 for all i = 1, 2, ..., N;
2. Var(ξ_i | x_i) = σ² for all i = 1, 2, ..., N;
3. Cov(ξ_i, ξ_j | x_i) = 0 for all i ≠ j, i, j = 1, 2, ..., N.

The first assumption guarantees that the model (1.2) is well defined. It is equivalent to E(y_i | x_i) = θ_0 + θ_1 x_i, which means that y_i depends only on x_i and all other factors are random, contained in ξ_i. The second assumption implies Var(y_i | x_i) = σ²: the variance is constant and does not depend on the values of x_i. The third assumption is equivalent to Cov(y_i, y_j | x_i) = 0. The errors, as well as the variables y_i, are uncorrelated with each other. Under the assumption of normality, this would mean that the errors are independent.

1.2 Estimation of the Model Parameters

1.2.1 Ordinary Least Squares

One of many methods to estimate the unknown parameters θ_0 and θ_1 in (1.2) is the Ordinary Least Squares (OLS) method. Let ˆθ_0 and ˆθ_1 be the estimates of θ_0 and θ_1. We define the fitted line by

    ˆr(x) = ˆθ_0 + ˆθ_1 x,

the fitted values as

    ŷ_i = ˆr(x_i),

the residuals as

    ˆξ_i = y_i − ŷ_i = y_i − (ˆθ_0 + ˆθ_1 x_i),

and the residual sum of squares (RSS) by

    RSS = Σ_{i=1}^N ˆξ_i².    (1.3)

By minimizing the residual sum of squares we get the estimates ˆθ_0 and ˆθ_1. Those estimates are called the least squares estimates. The function we want to minimize is

    RSS(θ_0, θ_1) = Σ_{i=1}^N (y_i − (θ_0 + θ_1 x_i))²    (1.4)

and by solving the linear system

    ∂RSS(θ_0, θ_1)/∂θ_0 = 0,
    ∂RSS(θ_0, θ_1)/∂θ_1 = 0    (1.5)

we get ˆθ_0 and ˆθ_1. When we differentiate, the linear system (1.5) becomes

    −2 Σ_{i=1}^N (y_i − (θ_0 + θ_1 x_i)) = 0,
    −2 Σ_{i=1}^N (y_i − (θ_0 + θ_1 x_i)) x_i = 0.

Solving the linear system we get the least squares estimates

    ˆθ_0 = ȳ − ˆθ_1 x̄,
    ˆθ_1 = (Σ_{i=1}^N x_i y_i − N x̄ ȳ) / (Σ_{i=1}^N x_i² − N x̄²) = Σ_{i=1}^N (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^N (x_i − x̄)²,    (1.6)

where ȳ = (1/N) Σ_{i=1}^N y_i and x̄ = (1/N) Σ_{i=1}^N x_i.

The estimates given in (1.6) will be the estimates which minimize the function (1.4) if we show that the second derivatives are positive. We can also notice that the function (1.4) has no maximum, therefore the estimates are the minimum.

1.2.2 Properties of the Ordinary Least Squares Estimators

To estimate the parameters θ_0 and θ_1, the three assumptions in Section 1.1 were not used. Even if the assumption E(ξ_i | x_i) = 0 for all i = 1, 2, ..., N does not hold, we can define ŷ_i = θ_0 + θ_1 x_i to fit the data D = {(y_i, x_i)}_{i=1}^N. The estimates ˆθ_0 and ˆθ_1 are also random variables, because they depend on the statistical errors. If the assumptions of Section 1.1 hold, then by the Gauss-Markov theorem (see Theorem 1 in Chapter 2) the estimators ˆθ_0 and ˆθ_1 are unbiased and have the minimum variance among all linear unbiased estimators of the parameters θ_0 and θ_1,

    E(ˆθ_0 | X) = θ_0,   E(ˆθ_1 | X) = θ_1,

and the variances of the estimates are

    Var(ˆθ_0 | X) = σ² [1/N + x̄² / Σ_{i=1}^N (x_i − x̄)²],
    Var(ˆθ_1 | X) = σ² / Σ_{i=1}^N (x_i − x̄)².    (1.7)
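To make the formulas concrete, the short sketch below computes the estimates (1.6) in MATLAB (the language used for the simulations in Chapter 4). The data and numbers are artificial placeholders, not the thesis data; the snippet only illustrates the closed-form expressions.

% Illustrative sketch (not thesis code): OLS estimates (1.6) for simple
% linear regression, using artificial data.
N = 50;
x = linspace(0, 10, N)';            % covariate
y = 2 + 0.5*x + 0.3*randn(N, 1);    % response; true theta_0 = 2, theta_1 = 0.5

xbar = mean(x);
ybar = mean(y);
theta1_hat = sum((x - xbar).*(y - ybar)) / sum((x - xbar).^2);
theta0_hat = ybar - theta1_hat*xbar;

yhat = theta0_hat + theta1_hat*x;   % fitted values
RSS  = sum((y - yhat).^2);          % residual sum of squares (1.3)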

Since ˆθ_0 depends on ˆθ_1, it is clear that the estimates are correlated, with

    Cov(ˆθ_0, ˆθ_1 | X) = −σ² x̄ / Σ_{i=1}^N (x_i − x̄)².

The estimates ˆθ_0 and ˆθ_1 are called Best Linear Unbiased Estimates (BLUE).

1.2.3 An Estimator of the Variance and Estimated Variances

Ordinary Least Squares does not yield an estimate of the variance. Naturally, an estimate ˆσ² should be obtained by averaging the squared residuals, because σ² = E[(y_i − E(y_i | x_i))² | x_i]. From the second assumption in Section 1.1, we have the constant variance σ² for every y_i, i = 1, 2, ..., N. Also, we use ŷ_i to estimate E(y_i | x_i). To get an unbiased estimate ˆσ² of σ², we divide the RSS in (1.3) by its degrees of freedom (df), where the residual df is the number of cases in the data D (that is, N) minus the number of parameters, which is 2. The estimate is

    ˆσ² = RSS / (N − 2),    (1.8)

and this quantity is called the residual mean square. To estimate the variances of ˆθ_0 and ˆθ_1 we simply replace σ² with ˆσ² in (1.7). Therefore,

    ˆVar(ˆθ_0 | X) = ˆσ² [1/N + x̄² / Σ_{i=1}^N (x_i − x̄)²],
    ˆVar(ˆθ_1 | X) = ˆσ² / Σ_{i=1}^N (x_i − x̄)²

are the estimated variances.

1.3 Hypothesis Testing, Confidence Intervals and t-test

Until now, we did not need to make any assumptions about the distribution of the errors besides the three assumptions in Section 1.1. Suppose that we add the following assumption:

    ξ_i | x_i ∼ N(0, σ²),   i = 1, 2, ..., N.

Since the predictions are linear combinations of the errors, we have

    y_i | x_i ∼ N(θ_0 + θ_1 x_i, σ²),   i = 1, 2, ..., N.

With this assumption we can construct confidence intervals for the model parameters and test hypotheses. Perhaps we are most interested in hypotheses about θ_1, since by testing

    H_0: θ_1 = 0,   H_1: θ_1 ≠ 0    (1.9)

we can determine whether there actually is a linear relationship between X and Y.

In general, we can test the hypothesis

    H_0: θ_1 = c,   H_1: θ_1 ≠ c,    (1.10)

where c is an arbitrary constant. Depending on what we need to determine, we choose the constant c. Before we examine the hypotheses (1.9) and (1.10) we need the following properties:

    ˆθ_1 ∼ N(θ_1, σ² / Σ_{i=1}^N (x_i − x̄)²),
    (N − 2) ˆσ² / σ² ∼ χ²(N − 2),
    ˆθ_1 and ˆσ² are independent random variables,

where ˆσ² is given by (1.8). Using these properties, the hypothesis test of (1.10) is based on the t-statistic

    t = (ˆθ_1 − c) / √(ˆVar(ˆθ_1 | X)),    (1.11)

where √(ˆVar(ˆθ_1 | X)) is the estimated standard deviation. The t-statistic given by (1.11) has distribution t(N − 2, δ). The non-centrality parameter δ is given by

    δ = E(ˆθ_1 | X) / √(Var(ˆθ_1 | X)) = θ_1 / (σ √(1 / Σ_{i=1}^N (x_i − x̄)²)).    (1.12)

The hypothesis (1.9) is just a special case of the hypothesis (1.10), which means the t-statistic for (1.9) is

    t = ˆθ_1 / √(ˆσ² / Σ_{i=1}^N (x_i − x̄)²),

where t is distributed as t(N − 2), because from (1.12), if H_0: θ_1 = 0 holds, then δ = 0. For the two-sided alternative hypothesis given in (1.9), we reject the null hypothesis H_0 at significance level α when |t| ≥ t_{α/2, N−2}, where t_{α/2, N−2} is the upper α/2 percentage point of the central Student's t-distribution. The probability p that corresponds to the absolute value of the observed t (as the inverse of the distribution function) is called the p-value. Considering that

    p > α  ⟹  p/2 > α/2  ⟹  |t| < t_{α/2, N−2},

we accept the null hypothesis H_0 when p > α. Alternatively, if p ≤ α we reject H_0. Finally, to get the confidence interval, starting with

    P(|t| ≤ t_{α/2, N−2}) = 1 − α

and using transformations, a 100(1 − α)% confidence interval for θ_1 is given by

    ˆθ_1 − t_{α/2, N−2} √(ˆσ² / Σ_{i=1}^N (x_i − x̄)²) ≤ θ_1 ≤ ˆθ_1 + t_{α/2, N−2} √(ˆσ² / Σ_{i=1}^N (x_i − x̄)²).
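Continuing the earlier sketch, the t-test for H_0: θ_1 = 0 and the confidence interval above can be computed as follows. This is only an illustration and assumes the Statistics and Machine Learning Toolbox is available for tinv and tcdf.

% Illustrative sketch: t-test for H0: theta_1 = 0 and a 95% confidence
% interval for theta_1, continuing from the previous snippet.
sigma2_hat = RSS / (N - 2);                      % residual mean square (1.8)
se1 = sqrt(sigma2_hat / sum((x - xbar).^2));     % estimated std. deviation of theta1_hat

t     = theta1_hat / se1;                        % t-statistic (1.11) with c = 0
alpha = 0.05;
tq    = tinv(1 - alpha/2, N - 2);                % upper alpha/2 point of t(N-2)
pval  = 2*(1 - tcdf(abs(t), N - 2));             % two-sided p-value

CI = [theta1_hat - tq*se1, theta1_hat + tq*se1]; % 100(1-alpha)% interval for theta_1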

1.4 The Coefficient of Determination

We define the coefficient of determination as

    R² = SS_R / SS_T,

where SS_R = Σ_{i=1}^N (ŷ_i − ȳ)² is the sum of squares due to regression and SS_T = Σ_{i=1}^N (y_i − ȳ)² is the total sum of squares. Since it can be proved that SS_T = RSS + SS_R, the total sum of squares is in fact the total amount of variation in the y_i. Considering this, we have

    1 = SS_T / SS_T = (RSS + SS_R) / SS_T = RSS / SS_T + R²,

which means that R² is the proportion of the variation that is explained by the model (by the regression). From 0 ≤ RSS ≤ SS_T, it follows that R² ∈ [0, 1]. The bigger R² is, the more of the variability of Y is explained by the model. We can always add more variables to the model and the coefficient will not decrease, but that does not mean that the new model is better. The error sum of squares should be reduced to get a better model. Some computer packages use the adjusted coefficient of determination given by

    R²_adj = 1 − (RSS / df) / (SS_T / (N − 1)).

1.5 Maximum Likelihood Estimation

While the OLS method does not require assumptions about the errors to estimate the parameters, the maximum likelihood estimation (MLE) method can be used if the distribution of the errors is known. For the set of data D = {(y_i, x_i)}_{i=1}^N, if we assume that the errors in the simple regression model are normally distributed,

    ξ_i | x_i ∼ N(0, σ²),   i = 1, 2, ..., N,

then

    y_i | x_i ∼ N(θ_0 + θ_1 x_i, σ²),   i = 1, 2, ..., N.

Since the parameters θ_0, θ_1 and σ² are unknown, the likelihood function is given by

    L(y_i, x_i; θ_0, θ_1, σ²) = Π_{i=1}^N (2πσ²)^{−1/2} exp{−(y_i − θ_0 − θ_1 x_i)² / (2σ²)}
                              = (2πσ²)^{−N/2} exp{−(1/(2σ²)) Σ_{i=1}^N (y_i − θ_0 − θ_1 x_i)²}.    (1.13)

The values ˆθ_0, ˆθ_1 and ˆσ² that maximize the function (1.13) are called the maximum likelihood estimates. Finding the maximum of the function (1.13) is the same as finding the maximum of its natural logarithm,

    ln L(y_i, x_i; θ_0, θ_1, σ²) = −(N/2) ln(2π) − (N/2) ln σ² − (1/(2σ²)) Σ_{i=1}^N (y_i − θ_0 − θ_1 x_i)².    (1.14)

To find the maximum of (1.14) we solve the system

    ∂ ln L(θ_0, θ_1, σ²)/∂θ_0 = 0,
    ∂ ln L(θ_0, θ_1, σ²)/∂θ_1 = 0,
    ∂ ln L(θ_0, θ_1, σ²)/∂σ² = 0,

or, equivalently,

    (1/σ²) Σ_{i=1}^N (y_i − θ_0 − θ_1 x_i) = 0,
    (1/σ²) Σ_{i=1}^N (y_i − θ_0 − θ_1 x_i) x_i = 0,
    −N/(2σ²) + (1/(2σ⁴)) Σ_{i=1}^N (y_i − θ_0 − θ_1 x_i)² = 0.    (1.15)

The solution of (1.15) gives us the maximum likelihood estimates

    ˆθ_0 = ȳ − ˆθ_1 x̄,
    ˆθ_1 = Σ_{i=1}^N (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^N (x_i − x̄)²,
    ˆσ² = (1/N) Σ_{i=1}^N (y_i − ˆθ_0 − ˆθ_1 x_i)²,    (1.16)

where ˆθ_0 and ˆθ_1 are the same as the estimates in (1.6), which we obtained using the OLS method. From ˆσ² we can get an unbiased estimator of the parameter σ², and ˆσ² is asymptotically unbiased itself. Since the MLE method requires more assumptions, it is natural that with more assumptions come better properties: the estimators have the minimum variance among all unbiased estimators. Therefore, under the assumption of normality, the maximum likelihood estimates are the same as the OLS estimates.
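As a quick numerical check of this equivalence, one can maximize the log-likelihood (1.14) directly. The sketch below continues the earlier illustrative snippet, uses fminsearch, and parameterizes σ² through its logarithm purely for numerical convenience; it is not the thesis code.

% Illustrative sketch: maximize the log-likelihood (1.14) numerically and
% compare with the OLS estimates; p = [theta_0, theta_1, log(sigma^2)].
negloglik = @(p) (N/2)*log(2*pi) + (N/2)*p(3) ...
               + sum((y - p(1) - p(2)*x).^2) / (2*exp(p(3)));

pmle = fminsearch(negloglik, [0; 0; 0]);   % crude starting point
theta_mle  = pmle(1:2);                    % should agree with theta0_hat, theta1_hat
sigma2_mle = exp(pmle(3));                 % should agree with RSS/N (the biased MLE)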

Chapter 2

Multiple Regression

In this chapter, we generalize the methods for estimating parameters from Chapter 1. Namely, we want to predict the variable Y using several features X_1, X_2, ..., X_k, k ∈ N. Basically, we add features to explain parts of Y that have not been explained by the other features.

2.1 The Model

The regression function (1.1), under the assumption of linearity, for this problem becomes

    r(x) = E(Y | X = x) = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_k x_k,

where x is a vector, x = (x_1, x_2, ..., x_k). Therefore, the multiple regression model can be written as

    y = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_k x_k + ξ.

In order to estimate the parameters θ_0, θ_1, ..., θ_k in the model, we need N observations, a data set D. Suppose that we have data D = {(y_i, x_i)}_{i=1}^N, where x_i is a vector, x_i = (x_i1, x_i2, ..., x_ik), k ∈ N, k is the number of feature variables, i = 1, 2, ..., N. Hence, we can write the model for the i-th observation as

    y_i = θ_0 + θ_1 x_i1 + θ_2 x_i2 + ... + θ_k x_ik + ξ_i,   i = 1, 2, ..., N.    (2.1)

By a linear model, we mean a model linear in the parameters. There are many examples where a model is not linear in the x_ij's but is linear in the θ_i's. For k = 1, we get the simple regression model, so it is not a surprise that the three assumptions from Section 1.1 should hold for multiple regression as well, i.e.,

1. E(ξ_i | x_i) = 0 for all i = 1, 2, ..., N;
2. Var(ξ_i | x_i) = σ² for all i = 1, 2, ..., N;
3. Cov(ξ_i, ξ_j | x_i, x_j) = 0 for all i ≠ j, i, j = 1, 2, ..., N.

The interpretation of these assumptions is similar to the interpretation given for the simple linear regression model. For k = 2, the mean function E(Y | X) = θ_0 + θ_1 X_1 + θ_2 X_2 is a plane in 3 dimensions. If k > 2, we get a hyperplane; we cannot imagine or draw a k-dimensional plane for k > 2. Notice that the mean function given above means that we are conditioning on all values of the covariates. For easier interpretation of the results, we would like to write the model (2.1) in matrix form. Start by writing (2.1) as

    y_1 = θ_0 + θ_1 x_11 + θ_2 x_12 + ... + θ_k x_1k + ξ_1,
    y_2 = θ_0 + θ_1 x_21 + θ_2 x_22 + ... + θ_k x_2k + ξ_2,
    ...
    y_N = θ_0 + θ_1 x_N1 + θ_2 x_N2 + ... + θ_k x_Nk + ξ_N,

which gives us a clear view of how to write the model in matrix form. If we denote

    y = [y_1, y_2, ..., y_N]',
    X = [1 x_11 x_12 ... x_1k;
         1 x_21 x_22 ... x_2k;
         ...
         1 x_N1 x_N2 ... x_Nk],
    θ = [θ_0, θ_1, ..., θ_k]',
    ξ = [ξ_1, ξ_2, ..., ξ_N]',    (2.2)

the model becomes

    y = Xθ + ξ.

The three assumptions can be expressed as

    E(ξ | X) = 0,   Cov(ξ | X) = σ² I,

where Var(ξ_i | x_i) = σ² and the Cov(ξ_i, ξ_j | x_i, x_j) are contained in Cov(ξ | X) = σ² I. X is an N × (k + 1) matrix. We require full column rank of the matrix, which means that N has to be greater than the number of columns (k + 1); otherwise it could happen that one of the columns is a linear combination of the other columns. Throughout this chapter N will be greater than k + 1 and the rank of the matrix X will be k + 1, rank(X) = k + 1. The parameters θ are called regression coefficients.

2.2 Estimation of the Model Parameters

Our goal is to estimate the unknown parameters θ and σ² from the data D. Depending on the distribution of the errors, we can use different methods to estimate the parameters.

2.2.1 Ordinary Least Squares

The method that does not require any assumptions about the exact distribution of the errors is Ordinary Least Squares. The fitted values ŷ_i are given by

    ŷ_i = ˆθ_0 + ˆθ_1 x_i1 + ˆθ_2 x_i2 + ... + ˆθ_k x_ik,   i = 1, 2, ..., N.

To obtain the OLS estimators of the parameters in θ, we seek ˆθ_0, ˆθ_1, ˆθ_2, ..., ˆθ_k that minimize

    Σ_{i=1}^N ˆξ_i² = Σ_{i=1}^N (y_i − ŷ_i)² = Σ_{i=1}^N (y_i − (ˆθ_0 + ˆθ_1 x_i1 + ˆθ_2 x_i2 + ... + ˆθ_k x_ik))².    (2.3)

One way to minimize the function is to find the partial derivatives with respect to each ˆθ_j, j = 0, 1, ..., k, set the results equal to zero and solve the k + 1 equations to find the estimates. One of the reasons why we wrote the model in matrix form is to simplify these calculations. Therefore, since we assumed rank(X) = k + 1 < N, the following procedure holds. Firstly, (2.3) can be written in matrix form as

    ˆξ'ˆξ = Σ_{i=1}^N (y_i − x_i'ˆθ)²,

where

    ˆξ = [ˆξ_1, ˆξ_2, ..., ˆξ_N]',   ˆθ = [ˆθ_0, ˆθ_1, ..., ˆθ_k]',   x_i = [1, x_i1, ..., x_ik]'.

So, the function (2.3) we want to minimize now becomes

    Σ_{i=1}^N ˆξ_i² = ˆξ'ˆξ = (y − Xˆθ)'(y − Xˆθ)
                    = y'y − (Xˆθ)'y − y'Xˆθ + (Xˆθ)'Xˆθ
                    = y'y − 2y'Xˆθ + ˆθ'X'Xˆθ,

where we used basic matrix operations. Now we use matrix calculus to obtain the estimates. Differentiating ˆξ'ˆξ with respect to ˆθ and setting the result equal to zero, we get

    −2X'y + 2X'Xˆθ = 0,

from which we have

    X'Xˆθ = X'y.

From the assumption rank(X) = k + 1, the matrix X'X is a positive-definite matrix, therefore the matrix is nonsingular, so (X'X)^{−1} exists. Now we have the solution

    ˆθ = (X'X)^{−1} X'y.    (2.4)

Checking that the Hessian of ˆξ'ˆξ is a positive-definite matrix shows that ˆθ is actually the minimum. The Hessian is 2X'X, which is a positive-definite matrix because of the assumption on the rank. Since ˆθ minimizes the sum of squares, we call it the ordinary least squares estimator.

2.2.2 Properties of the Ordinary Least Squares Estimators

Even without the three (two) assumptions we could obtain the OLS estimators, but their properties would not be as nice as with the assumptions. Let us assume that E(y | X) = Xθ. The following holds:

    E(ˆθ | X) = E((X'X)^{−1} X'y | X) = (X'X)^{−1} X' E(y | X) = (X'X)^{−1} X'Xθ = θ,    (2.5)

which means that ˆθ is an unbiased estimator of θ. Let us now assume that Cov(ξ | X) = σ² I. Under this assumption we can find the covariance matrix of ˆθ:

    Cov(ˆθ | X) = Cov((X'X)^{−1} X'y | X)
                = (X'X)^{−1} X' Cov(y | X) ((X'X)^{−1} X')'
                = (X'X)^{−1} X' σ² I X (X'X)^{−1}
                = σ² (X'X)^{−1} X'X (X'X)^{−1}
                = σ² (X'X)^{−1}.    (2.6)

Using these two properties, we can prove one of the most important theorems, also known as the Gauss-Markov theorem.

Theorem 1 (Gauss-Markov Theorem). If y = Xθ + ξ, E(ξ | X) = 0, Cov(ξ | X) = σ² I and rank(X) = k + 1, then the ordinary least squares estimator given by (2.4) is the Best Linear Unbiased Estimator (BLUE), i.e. it has minimum variance among all linear unbiased estimators.

Proof. The linearity of the estimator is easy to see from (2.4). The proof that the estimator is unbiased is given in (2.5). Let us now prove that the variance σ² (X'X)^{−1} of the least squares estimator is the minimum among all linear unbiased estimators.

Assume that we have a linear estimator ˆβ = B_1 y of θ. Without loss of generality, there exists a matrix B such that B_1 = (X'X)^{−1} X' + B. Besides linearity, the estimator ˆβ should also be unbiased, so the following holds:

    E(ˆβ | X) = θ,

and also

    E(ˆβ | X) = E(B_1 y | X) = B_1 E(y | X)
              = ((X'X)^{−1} X' + B) E(Xθ + ξ | X)
              = ((X'X)^{−1} X' + B) Xθ
              = (X'X)^{−1} X'Xθ + BXθ
              = (I + BX)θ,

which implies that BX = 0. The estimator was arbitrary; let us prove that its variance is greater than or equal to the variance of the OLS estimator. If we prove that Cov(ˆβ | X) ≥ Cov(ˆθ | X), this will imply that the variances of the ˆθ_i are the minimum among all others, because the diagonal elements of the matrices are the variances of the estimators. Note that the above means that Cov(ˆβ | X) − Cov(ˆθ | X) is a positive semi-definite matrix. The following holds:

    Cov(ˆβ | X) = Cov(B_1 y | X) = B_1 Cov(y | X) B_1' = σ² B_1 B_1'
                = σ² ((X'X)^{−1} X' + B)((X'X)^{−1} X' + B)'
                = σ² ((X'X)^{−1} X'X (X'X)^{−1} + (X'X)^{−1} X'B' + BX (X'X)^{−1} + BB')
                = σ² ((X'X)^{−1} + BB'),

where we used that BX = 0 (and hence X'B' = 0). From (2.6) we have Cov(ˆθ | X) = σ² (X'X)^{−1}, so

    Cov(ˆβ | X) − Cov(ˆθ | X) = σ² BB' ≥ 0

is a positive semi-definite matrix. Considering the comment above, the OLS estimator is BLUE.

2.2.3 An Estimator of the Variance and Estimated Variance

Under the assumptions in Section 2.1, the variance of y_i is constant for all i = 1, 2, ..., N. Therefore,

    Var(y_i | x_i) = σ² = E[(y_i − E(y_i | x_i))² | x_i],

and also E(y_i | x_i) = x_i'θ. Naturally, based on the data D = {(y_i, x_i)}_{i=1}^N, we estimate the variance as

    ˆσ² = (1/(N − k − 1)) Σ_{i=1}^N (y_i − x_i'ˆθ)²

or, in matrix form,

    ˆσ² = RSS / (N − k − 1),    (2.7)

where RSS = (y − Xˆθ)'(y − Xˆθ) is the Residual Sum of Squares. The statistic (2.7), under the assumptions in Section 2.1, is an unbiased estimator of the parameter σ², i.e. E(ˆσ² | X) = σ². Using (2.6) and (2.7), the unbiased estimator of Cov(ˆθ) is

    ˆCov(ˆθ) = ˆσ² (X'X)^{−1}.

If we add one assumption to the Gauss-Markov theorem, namely E(ξ_i⁴ | x_i) = 3σ⁴, then the estimated variance (2.7) has minimum variance among all quadratic unbiased estimators, which can be proven; see Theorem 7.3g in [6].

2.3 Maximum Likelihood Estimation

So far, no assumptions were made about the distribution of the errors. To obtain the Maximum Likelihood Estimator, we need to make such assumptions. In this section, we will assume normality of the random variable ξ. So, let ξ ∼ N_N(0, σ² I), where N_N stands for the N-dimensional normal distribution. From the covariance matrix we have that the errors are uncorrelated, which, under the assumption of normality, means that they are independent as well. The random variable y is normally distributed with expectation Xθ and covariance matrix σ² I, which implies that the joint probability density function, which we denote by ϕ(y, X; θ, σ²), is

    ϕ(y, X; θ, σ²) = Π_{i=1}^N ϕ(y_i; x_i, θ, σ²),

because the y_i are independent random variables. Equivalently, from the definition of the density of the multivariate normal distribution, we can write it as

    ϕ(y, X; θ, σ²) = (2π)^{−N/2} |σ² I|^{−1/2} exp(−(1/2) (y − Xθ)' (σ² I)^{−1} (y − Xθ)).

When y and X are known, the density function is treated as a function of the parameters θ and σ², and in this case we call it the likelihood function, denoted by

    L(y, X; θ, σ²) = (2π)^{−N/2} |σ² I|^{−1/2} exp(−(1/2) (y − Xθ)' (σ² I)^{−1} (y − Xθ)).    (2.8)

By maximizing the function (2.8) for given y and X we obtain the maximum likelihood estimators of θ and σ². Maximizing the logarithm of the function (2.8) is the same as maximizing the likelihood function, so, for easier calculation, we maximize the logarithm of the likelihood function. The log-likelihood function is

    ln L(y, X; θ, σ²) = −(N/2) ln(2π) − (N/2) ln σ² − (1/(2σ²)) (y − Xθ)'(y − Xθ).    (2.9)
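The matrix formulas (2.4) and (2.7) translate almost literally into code. The sketch below is an illustration with artificial data (k = 2 features), not the thesis code, and uses the backslash operator rather than an explicit inverse for numerical stability.

% Illustrative sketch: OLS in matrix form for multiple regression.
N = 100;  k = 2;
X = [ones(N, 1), randn(N, k)];          % design matrix with an intercept column
theta_true = [1; 2; -0.5];
y = X*theta_true + 0.2*randn(N, 1);

theta_hat  = (X'*X) \ (X'*y);           % OLS estimator (2.4)
RSS        = sum((y - X*theta_hat).^2);
sigma2_hat = RSS / (N - k - 1);         % unbiased variance estimator (2.7)
Cov_hat    = sigma2_hat * inv(X'*X);    % estimated covariance matrix of theta_hat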

When we find the gradient of the function ln L(y, X; θ, σ²) with respect to θ and set it equal to 0, we get the Maximum Likelihood Estimator ˆθ, which is given by

    ˆθ = (X'X)^{−1} X'y,

and it is the same as the estimator obtained with the OLS method. The biased estimator of the variance σ² which we get from this is given by

    ˆσ²_b = (1/N) (y − Xˆθ)'(y − Xˆθ).

The unbiased estimator of the variance is

    ˆσ² = (1/(N − k − 1)) (y − Xˆθ)'(y − Xˆθ).

To verify that the estimator ˆθ actually maximizes the function (2.9), we calculate the Hessian matrix of the function (2.9) with respect to θ and show that it is a negative-definite matrix. Since this Hessian is proportional to −X'X and, under the assumption from the beginning of the chapter that rank(X) = k + 1, we have X'X > 0, the claim follows.

2.3.1 Properties of the Maximum Likelihood Estimators

The following properties of the estimators hold under the assumption of normality of the distribution of the errors. We state them without proofs.

- ˆθ ∼ N_{k+1}(θ, σ² (X'X)^{−1});
- (N − k − 1) ˆσ² / σ² ∼ χ²(N − k − 1);
- ˆθ and ˆσ² are independent;
- ˆθ and ˆσ² are jointly sufficient statistics for θ and σ²;
- the estimators ˆθ and ˆσ² have minimum variance among all unbiased estimators.

2.4 Polynomial Regression

In this section we introduce the Polynomial Regression model as a special case of the Multiple Regression model, with only a few properties and short descriptions. You can read more about Polynomial Regression in [7, Chapter 8] and in [10, Section 5.3]. Namely, if we set x_ij = x_i^j in (2.1), j = 1, 2, ..., k, k ∈ N, we get

    y_i = θ_0 + θ_1 x_i + θ_2 x_i² + ... + θ_k x_i^k + ξ_i,   i = 1, 2, ..., N,    (2.10)

which is the k-th degree, or (k + 1)-th order, polynomial regression model.

The inspiration for such a model arises from the Weierstrass approximation theorem (see [2, Chapter VI]), which states that every continuous function on a finite interval can be uniformly approximated as closely as desired by a polynomial function. Although this seems like a great solution, a better approximation requires a higher polynomial degree, which means more unknown parameters to estimate. Theoretically k can go up to N − 1, but when k is greater than approximately 6, the matrix X'X becomes ill-conditioned and other problems arise. The matrix X now becomes

    X = [1 x_1 x_1² x_1³ ... x_1^k;
         1 x_2 x_2² x_2³ ... x_2^k;
         ...
         1 x_N x_N² x_N³ ... x_N^k]    (2.11)

and the matrices y, θ, and ξ are the same as before. The model (2.10) can be written as

    y = Xθ + ξ.    (2.12)

Even though the problem of finding the unknown parameters in Polynomial Regression is similar to the problem in Multiple Regression, Polynomial Regression has special features. The model (2.10) is the k-th order polynomial model in one variable. When k = 2 the model is called quadratic, when k = 3 the model is called cubic, and so on. The model can also be in two or more variables; for example, a second-order polynomial in two variables is given by

    y = θ_0 + θ_1 x_1 + θ_2 x_2 + θ_11 x_1² + θ_22 x_2² + θ_12 x_1 x_2 + ξ,

which is known as the response surface. For our purposes, we will study only Polynomial Regression in one variable. We want to keep the order of the model as low as possible. By fitting a higher-order polynomial we will most likely overfit the model, which means that such a model will not be a good predictor, nor will it enhance understanding of the unknown function. Given our assumption of rank(X) = k + 1 (full column rank), when we increase the order of the polynomial in a polynomial regression model, the matrix X'X becomes ill-conditioned, as we mentioned. This implies that the parameters will be estimated with error, because (X'X)^{−1} might not be computed accurately.

2.4.1 Orthogonal Polynomials

Before computers were available, people had problems calculating the powers x^0, x^1, ..., x^k manually (by hand), but in order to fit the Polynomial Regression this is necessary. Assume we fit the Simple Linear Regression model to some data and then want to increase the order of the model without starting from the very beginning. What we want is to create a situation where adding an extra term merely refines the previous model. We can achieve that using a system of orthogonal polynomials. Now, with computers, this has less use.

The system of orthogonal polynomials can be obtained mathematically using the Gram-Schmidt method. The k-th orthogonal polynomial has degree k. As we mentioned, ill-conditioning is a problem as well. In the polynomial regression model, the assumption that all independent variables are independent is not satisfied. This issue can also be addressed by orthogonal polynomials. There are continuous orthogonal polynomials and discrete orthogonal polynomials. The continuous orthogonal polynomials are the classical orthogonal polynomials such as the Hermite, Laguerre, and Jacobi polynomials. We use discrete orthogonal polynomials, where the orthogonality relation involves summation. The columns of the matrix X in the model (2.12) are not orthogonal. So, if we want to add another term θ_{k+1} x_i^{k+1}, the matrix (X'X)^{−1} will change (we need to calculate it again). Also, the lower-order parameter estimates ˆθ_i, i = 0, 1, ..., k, will change. Let us instead fit the model

    y_i = θ_0 P_0(x_i) + θ_1 P_1(x_i) + θ_2 P_2(x_i) + ... + θ_k P_k(x_i) + ξ_i,   i = 1, 2, ..., N,    (2.13)

where the P_j(x_i) are orthogonal polynomials, P_j(x_i) being a j-th order polynomial, j = 0, 1, ..., k, with P_0(x_i) = 1. From orthogonality we have

    Σ_{i=1}^N P_m(x_i) P_n(x_i) = 0,   m ≠ n,   m, n = 0, 1, ..., k.

The model (2.13) can be written in matrix form as y = Xθ + ξ, where the matrix X is now

    X = [P_0(x_1) P_1(x_1) P_2(x_1) ... P_k(x_1);
         P_0(x_2) P_1(x_2) P_2(x_2) ... P_k(x_2);
         ...
         P_0(x_N) P_1(x_N) P_2(x_N) ... P_k(x_N)]

and, from orthogonality, the following holds:

    X'X = diag(Σ_{i=1}^N P_0²(x_i), Σ_{i=1}^N P_1²(x_i), ..., Σ_{i=1}^N P_k²(x_i)).

We know that the ordinary least squares estimator is given by ˆθ = (X'X)^{−1} X'y, or equivalently

    ˆθ_j = Σ_{i=1}^N P_j(x_i) y_i / Σ_{i=1}^N P_j²(x_i),   j = 0, 1, 2, ..., k.

From (2.6) we have the variance, or equivalently

    Var(ˆθ_j | x) = σ² / Σ_{i=1}^N P_j²(x_i).

It is interesting to notice that

    ˆθ_0 = Σ_{i=1}^N P_0(x_i) y_i / Σ_{i=1}^N P_0²(x_i) = Σ_{i=1}^N y_i / N = ȳ.

Perhaps we want to add a term θ_{k+1} P_{k+1}(x_i) to the model (2.13); then the estimator of θ_{k+1} will be

    ˆθ_{k+1} = Σ_{i=1}^N P_{k+1}(x_i) y_i / Σ_{i=1}^N P_{k+1}²(x_i).

To obtain this estimator, we did not change the other terms in the model; we only look at the newly added term. Because of the orthogonality, there is no need to find (X'X)^{−1} or any of the other estimators again. This is a way to easily fit higher-order polynomial regression models. We can terminate the process when we find the model that is optimal for our purpose.
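The sketch below illustrates the idea with a discrete orthogonal polynomial basis built by Gram-Schmidt. It is only a minimal example with artificial data, not the construction used for the thesis results, but it shows that each coefficient is estimated separately and does not change when a higher-order term is added.

% Illustrative sketch: fitting the model (2.13) with discrete orthogonal
% polynomials obtained by Gram-Schmidt on the columns x.^0, x.^1, ..., x.^k.
N = 60;  kmax = 4;
x = linspace(-1, 1, N)';
y = 1 + 2*x - 3*x.^3 + 0.1*randn(N, 1);   % artificial data

P = zeros(N, kmax + 1);
for j = 0:kmax
    p = x.^j;
    for m = 0:j-1
        % remove the component along each lower-order polynomial
        p = p - (P(:, m+1)'*p) / (P(:, m+1)'*P(:, m+1)) * P(:, m+1);
    end
    P(:, j+1) = p;
end

theta_hat = zeros(kmax + 1, 1);
for j = 0:kmax
    theta_hat(j+1) = (P(:, j+1)'*y) / (P(:, j+1)'*P(:, j+1));   % estimate of theta_j
end
yhat = P*theta_hat;   % fitted values; theta_hat(1) equals mean(y)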

Chapter 3

The Bootstrap

3.1 Introduction

Let x_1, x_2, ..., x_N be a homogeneous sample of data, which can be viewed as the outcomes of independent and identically distributed (i.i.d.) random variables X_1, X_2, ..., X_N with Probability Density Function (PDF) f and Cumulative Distribution Function (CDF) F. Using the sample, we can make inferences about a parameter θ (a population characteristic). To do that, we need a statistic S, which we assume we have already chosen and which is an estimate of θ (a scalar). For our needs, we are focused on how to calculate confidence intervals for the parameter θ using the PDF of the statistic S. We could also be interested in its bias, standard error, or its quantiles. There are two situations: the one we are interested in, the non-parametric, and the parametric. Statistical methods based on a mathematical model with a known parameter τ that fully determines the PDF f are called parametric methods, and the model is called a parametric model. In this case, the parameter θ is a function of the parameter τ. Statistical methods where we use only the fact that the random variables are i.i.d. are non-parametric methods, and the models are called non-parametric models. For the non-parametric analysis, the empirical distribution is important. The empirical distribution assigns equal probability 1/N to each element x_i, i = 1, 2, ..., N, of the sample. The Empirical Distribution Function (EDF) ˆF, as an estimate of the CDF F, is defined as

    ˆF(x) = (number of elements in the sample ≤ x) / N.

The function ˆF can also be written as

    ˆF(x) = (1/N) Σ_{i=1}^N I_{A_i},    (3.1)

where I_{A_i} is the indicator of the event A_i and A_i = {ω : X_i(ω) ≤ x}.

Because of the importance of the EDF, we will define it more formally. Define the function ν(x) as

    ν(x) = #{ j : X_j ≤ x, j = 1, 2, ..., N },   x ∈ R,

where # denotes the cardinality of a set. Now we can define the EDF as

    ˆF(x) = ν(x) / N,   x ∈ R.    (3.2)

The random variable ˆF(x) is a statistic with values in the set {0, 1/N, 2/N, ..., (N−1)/N, 1}. The distribution of this random variable is

    P(ˆF(x) = k/N) = P(ν(x) = k) = C(N, k) F(x)^k (1 − F(x))^{N−k},   k = 0, 1, 2, ..., N,

where F is the CDF, which means that ν(x) = N ˆF(x) follows a Binomial Distribution with parameters N and p = P(X ≤ x) = F(x), x ∈ R. Considering the fact that E(I_{A_i}) = F(x) for x ∈ R, ˆF_N → F almost surely, i.e. P(ˆF_N → F) = 1, which can be proven by Borel's law of large numbers.

3.1.1 Statistics

Many statistics can be represented as a property of the EDF. For example, x̄ = N^{−1} Σ_{i=1}^N x_i (the sample average) is the mean of the EDF. Generally, the statistic s is a function of x_1, x_2, ..., x_N and is not affected by reordering the data, which implies that the statistic s depends on the data only through the EDF ˆF. So, the statistic s can be written as a function of ˆF, s = s(ˆF). The statistical function s(·) can be perceived as a recipe for computing the statistic s from the function ˆF. This function is useful in the non-parametric case, since the parameter θ is defined by the function as s(F) = θ. The mean and the variance can be written as statistical functions:

    s(F) = ∫ x dF(x),
    s(F) = ∫ x² dF(x) − (∫ x dF(x))².

For the parametric methods we often define θ as a function of the model parameter τ, but the same definition works for them too.
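A direct implementation of the EDF (3.2) is a one-liner; the snippet below is a small illustration with artificial data, not part of the thesis code.

% Illustrative sketch: the empirical distribution function (3.2),
% Fhat(x) = (number of sample points <= x) / N.
xsample = randn(100, 1);            % artificial i.i.d. sample
Fhat = @(x) mean(xsample <= x);     % EDF at a single point x

t = linspace(-3, 3, 200);
F = arrayfun(Fhat, t);              % EDF evaluated on a grid, e.g. for plotting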

The notation S = s(·) will be used for the function, and the notation s for the estimate of θ based on the data x_1, x_2, ..., x_N. The estimate can usually be expressed as s = s(ˆF), which mirrors the relation s(F) = θ between the parameter θ and the CDF F. From the definition (3.1), ˆF_N → F, as we mentioned before; then, if s(·) is continuous, S converges to θ as N → ∞ (consistency). We will not go into more detail; applying the bootstrap does not require such formality. We will assume that S = s(ˆF).

3.2 The Bootstrap Estimates

Finding the distribution of the statistic S can help us make inferences about the parameter θ. For example, if we want to obtain a 100(1 − 2α)% confidence interval for θ, we could possibly show that the statistic S has approximately a normal distribution with mean θ + β and standard deviation σ, where β is the bias of S. Under the assumption that the bias and the variance are known,

    P(S ≤ s | F) ≈ Φ((s − (θ + β)) / σ),

where the function Φ is

    Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t²/2} dt,   z ∈ R.

If the α quantile of the standard normal distribution is z_α = Φ^{−1}(α), then a 100(1 − 2α)% confidence interval for θ is

    s − β − σ z_{1−α} ≤ θ ≤ s − β − σ z_α,    (3.3)

which we obtained from

    P(β + σ z_α ≤ S − θ ≤ β + σ z_{1−α}) ≈ 1 − 2α.

However, the bias and the variance will almost never be known. Therefore, we need to estimate them. Express β and σ as

    β = b(F) = E(S | F) − s(F),   σ² = v(F) = Var(S | F),

where we note that S | F means that the random variables from which S is calculated have distribution F (X_1, X_2, ..., X_N are i.i.d. with CDF F). Assume that ˆF is an estimate of the function F; then we can obtain the estimates of β and σ² as

    B = b(ˆF) = E(S | ˆF) − s(ˆF),
    V = v(ˆF) = Var(S | ˆF).    (3.4)

These estimates are called the bootstrap estimates.

3.3 Parametric Simulation

The bootstrap idea has two steps: first estimating the parameters, and then approximating them using simulation. We do that because sometimes we cannot simply express a formula for calculating the parameter estimates. The practical alternative is re-sampling the data from a fitted parametric model and calculating the properties of S which we need. Let F_τ be the CDF and f_τ the PDF. Suppose that we have data x_1, x_2, ..., x_N and a parametric model for the distribution of the data. Let ˆF(x) = F_ˆτ(x) be the CDF of a fitted model which we get when we estimate τ (usually) by Maximum Likelihood Estimation with ˆτ. Denote by X* a random variable distributed according to ˆF.

3.3.1 Approximations

Suppose now that the calculation is for some reason too complicated. As we mentioned, the alternative is to simulate data sets (re-sample) and estimate the properties. Let X*_1, ..., X*_N be an i.i.d. data set from the distribution ˆF. Denote by S* the statistic calculated from the simulated data set. By repeating the process R times, we obtain R values S*_1, S*_2, ..., S*_R. The estimator of the bias now becomes

    B = b(ˆF) = E(S | ˆF) − s = E*(S*) − s,

and this is estimated by

    B_R = (1/R) Σ_{r=1}^R S*_r − s = S̄* − s.

Here, s is the parameter value for the fitted model, so S* − s is the analogue of S − θ. Similarly, the estimator of the variance of S is

    V_R = (1/(R − 1)) Σ_{r=1}^R (S*_r − S̄*)².

As R increases, by the law of large numbers, B_R converges to B (the exact value under the fitted model), and likewise V_R to V.

3.4 Non-parametric Simulation

Suppose that we have X_1, X_2, ..., X_N for which it is sensible to assume that they are i.i.d. from an unknown distribution F. Using the EDF ˆF we estimate the CDF F, and we use ˆF as we would use a fitted parametric model. First we check whether we can calculate the required properties directly; if not, we simulate data sets (re-sample) and approximate the properties we require empirically. Simulation using the EDF is based on the fact that the EDF puts equal probabilities on each value of the data set x_1, x_2, ..., x_N. So, every simulated sample (re-sample) X*_1, X*_2, ..., X*_N is taken at random, with replacement, from the data. This re-sampling method is called the non-parametric bootstrap.
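The non-parametric bootstrap estimates B_R and V_R can be written in a few lines. The sketch below is only an illustration, not the thesis code: it uses the (biased) sample variance as the statistic S, because its bias is known and therefore easy to compare with, and draws the re-samples with randi.

% Illustrative sketch: non-parametric bootstrap estimates of bias and variance.
stat = @(z) mean((z - mean(z)).^2);   % statistic S: the biased sample variance

xsample = randn(100, 1);              % observed data (artificial)
N = numel(xsample);
s = stat(xsample);                    % statistic computed from the data

R = 2000;
Sstar = zeros(R, 1);
for r = 1:R
    idx      = randi(N, N, 1);        % indices drawn with replacement
    Sstar(r) = stat(xsample(idx));    % bootstrap replicate S*_r
end

B_R = mean(Sstar) - s;                % bootstrap estimate of the bias
V_R = var(Sstar);                     % bootstrap estimate of Var(S)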

3.5 Confidence Intervals

The distribution of S can be used to calculate confidence intervals, which is the main goal of the bootstrap for our needs. There are multiple ways to use bootstrap simulation; we will describe two methods. We could use the normal approximation of the distribution of S. This means that we estimate the limits in (3.3) using the bootstrap estimates of the bias and the variance. Using the bootstrap method, we can also estimate the quantiles of S − θ directly: assuming that (R + 1)p is a whole number, the p quantile of S − θ is estimated by the (R + 1)p-th ordered value of the s*_r − s, that is, s*_{((R+1)p)} − s. So, a 100(1 − 2α)% confidence interval is

    2s − s*_{((R+1)(1−α))} ≤ θ ≤ 2s − s*_{((R+1)α)},    (3.5)

which can be obtained from

    P(a ≤ S − θ ≤ b) = 1 − 2α  ⟹  P(S − b ≤ θ ≤ S − a) = 1 − 2α.

The interval (3.5) is called the basic bootstrap confidence interval. The bigger R is, the more accurate the confidence interval will be. Typically, one takes R > 1000, but there are more factors that the accuracy depends on; for more details you can check the books mentioned in the Bibliography. When the distribution of S − θ depends on unknowns, we try to mimic the Student's t statistic, and therefore define a studentized version of S − θ as

    Z = (S − θ) / √V,

where V is an estimate of Var(S | F). With this, we eliminate the unknown standard deviation when making inferences about the normal mean. The Student-t 100(1 − 2α)% confidence interval for the mean is

    x̄ − ˆσ t_{N−1}(1 − α) ≤ θ ≤ x̄ − ˆσ t_{N−1}(α),

where ˆσ is the estimated standard deviation of the mean, and t_N(α) is the α quantile of the Student-t distribution with N degrees of freedom. We can obtain a 100(1 − 2α)% confidence interval for θ analogously, for the distribution of Z, as follows:

    s − ˆσ z_{1−α} ≤ θ ≤ s − ˆσ z_α,

where z_p is the p quantile of Z. To estimate the quantiles of Z, we use replicates of the studentized bootstrap statistic

    Z* = (S* − s) / √V*,

where the values are obtained from the re-samples X*_1, X*_2, ..., X*_N. When we use the simulated values z*_1, z*_2, ..., z*_R to estimate z_α, we obtain the studentized bootstrap confidence interval for θ:

    s − ˆσ z*_{((R+1)(1−α))} ≤ θ ≤ s − ˆσ z*_{((R+1)α)}.    (3.6)

The studentized bootstrap method is the one used to obtain the confidence intervals in our non-parametric problem.
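Continuing from the replicates Sstar in the earlier sketch, the basic bootstrap interval (3.5) only requires sorting the replicates. Rounding is used below when (R + 1)p is not a whole number; again, this is only an illustration, not the thesis code.

% Illustrative sketch: the basic bootstrap confidence interval (3.5).
alpha   = 0.025;                                  % for a 95% interval
Ssorted = sort(Sstar);
lo = Ssorted(max(1, round((R + 1)*alpha)));       % ~ (R+1)*alpha-th ordered value
hi = Ssorted(min(R, round((R + 1)*(1 - alpha)))); % ~ (R+1)*(1-alpha)-th ordered value

CI_basic = [2*s - hi, 2*s - lo];                  % basic bootstrap interval for theta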

Chapter 4

Simulation and Evaluation

In this chapter, we simulate a real-world problem using data created in the programming language MATLAB. The goal is to estimate two models and test the quotient between them. In order to do that, we assume that we know the true relationship between the variables we observe - we know the true models. The data from which we need to estimate a model has measurement errors. Therefore, suppose that we know both the real data and the data with measurement errors. We want to see how the assumption about the distribution of the measurement errors affects the quotient.

4.1 Mathematical Description of the Problem

Let D̃_1 = {(ỹ_i1, x̃_i1)}_{i=1}^{N_1} and D̃_2 = {(ỹ_j2, x̃_j2)}_{j=1}^{N_2} be two sets of data. We assume that the data x̃_i1, x̃_j2, ỹ_i1 and ỹ_j2 are known to us, and that a measurement error is made when obtaining the data, which means

    x̃_i1 = x_i1 + ξ_i1,   i = 1, 2, ..., N_1;    x̃_j2 = x_j2 + ξ_j2,   j = 1, 2, ..., N_2,
    ỹ_i1 = y_i1 + ε_i1,   i = 1, 2, ..., N_1;    ỹ_j2 = y_j2 + ε_j2,   j = 1, 2, ..., N_2,

where the ξ_i1 and ξ_j2, as well as the ε_i1 and ε_j2, follow the same distribution for every i = 1, 2, ..., N_1, j = 1, 2, ..., N_2. For the purpose of the problem, we will assume that we also know the true values of the data, x_i1, x_j2, y_i1 and y_j2. We know the true relationship between the observed variables Y and X, which means that we know the true models that fit the data. Let D_1 = {(y_i1, x_i1)}_{i=1}^{N_1} and D_2 = {(y_j2, x_j2)}_{j=1}^{N_2}. Depending on the problem, we will use either Simple Linear Regression or Polynomial Regression (the OLS method) to create the models. We will set the parameters in one of the two true models to be 5% smaller than the parameters in the other model. From the data D̃_1 and D̃_2 we estimate the parameters with the OLS method.

This gives us two estimated models y_1(x) and y_2(x), and we are interested in the quotient

    y_1(x) / y_2(x).    (4.1)

Since the goal is to see how and whether the assumption about the distribution of the measurement errors affects the quotient (4.1), we will repeat this process for different errors drawn from the same distribution and obtain a confidence interval for the quotient using the bootstrap method. We know the true quotient, the true ratio between the models, which we will use to check whether it belongs to the confidence interval we obtain.

4.2 An Analogy With the Real World

We want to simulate the relationship between velocity and fuel efficiency in ships. True data always comes with measurement errors due to many factors, which is the reason why we add measurement errors to our data. The assumption that we know the true models can help us understand the relationship between velocity and fuel consumption. Suppose, for example, that we want to see which of two engines is better and by how much (does it use more or less fuel). This can be hard because of the measurement errors; the results might lead us in the wrong direction. By testing the quotient of the two models, assuming different errors, we can see whether the assumption of a particular error should affect our decision about which of the two engines uses less fuel, and how certain we can be in our decision.

4.3 Parameter Estimation

The velocity and fuel consumption data we use comes from ship log data gathered at Qtagg AB, from one ship over roughly half a month. Our real sets of data D_1 and D_2 consist of velocity measured in knots and fuel efficiency measured in liters per hour. The plotted data for the real velocity (without measurement errors) is given in figure (4.1).

Figure 4.1: The data (velocity) without measurement errors

Our assumption of knowing the true models gives us insight into how the fuel efficiency data should look. We consider only three cases.

1. The true models are the following:

       y_1(x) = θ_11 x,
       y_2(x) = θ_12 x,    (4.2)

   where we choose a value for θ_11 and set the other one to be 5% bigger, θ_12 = 1.05 θ_11. As we mentioned, we will always set the parameters to be 5% bigger in one model. Using the models, we can obtain the true data y_i1 and y_j2. The data is given in figure (4.2).

2. The true models are the following:

       y_1(x) = θ_31 x³,
       y_2(x) = θ_32 x³,    (4.3)

   where θ_32 is again 5% bigger than θ_31. The true data y_i1 and y_j2 obtained using those models are given in figure (4.3).

3. The true models are the following:

       y_1(x) = θ_01 + θ_11 x + θ_21 x² + θ_31 x³,
       y_2(x) = θ_02 + θ_12 x + θ_22 x² + θ_32 x³,    (4.4)

   where θ_1 = [ , , , 0.9254] and θ_2 has coefficients 5% greater than θ_1. The data is given in figure (4.4).
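To make the setup concrete, the sketch below generates Case 1 data and adds measurement errors. The velocity values and the coefficient θ_11 used here are placeholders (the thesis uses the ship-log velocity data and its own coefficient values); the uniform errors correspond to the first assumption made in Section 4.4.

% Illustrative sketch of the Case 1 setup (4.2) with placeholder values;
% in the thesis, x is the measured ship velocity in knots and theta_12 is
% 5% larger than theta_11.
x  = 8 + 6*rand(500, 1);                 % placeholder velocity data
theta11 = 1.00;                          % placeholder coefficient (not the thesis value)
theta12 = 1.05*theta11;                  % 5% larger, as in the thesis

y1 = theta11*x;                          % true model 1 (fuel efficiency)
y2 = theta12*x;                          % true model 2

% measurement errors, here Uniform(-0.3, 1) as assumed in Section 4.4
xt1 = x  + (-0.3 + 1.3*rand(size(x)));
yt1 = y1 + (-0.3 + 1.3*rand(size(y1)));
xt2 = x  + (-0.3 + 1.3*rand(size(x)));
yt2 = y2 + (-0.3 + 1.3*rand(size(y2)));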

Figure 4.2: Case 1 - The data (fuel efficiency) without measurement errors

The data in all three cases are without any measurement errors. In the next sections, we add errors to these data. When we add errors, we get new data from which we create the corresponding models. We estimate the parameters using the OLS method described earlier. Depending on the case, we estimate the parameters so as to get the same type of model as the true model. We now describe how to do that. Assume that we have data D̃_1 and D̃_2; using the results for the OLS method, we obtain estimates for each case as follows.

1. In this case we have only one parameter to estimate for each model. Let

       X̃_i = [x̃_1i, x̃_2i, ..., x̃_{N_i,i}]',   Ỹ_i = [ỹ_1i, ỹ_2i, ..., ỹ_{N_i,i}]',   i = 1, 2.

   The OLS estimates are

       ˆθ_1i = (X̃_i' X̃_i)^{−1} X̃_i' Ỹ_i,   i = 1, 2.

2. This case does not differ much from the first case. Using the OLS method and the results on polynomial regression, we get

       ˆθ_3i = (X̃_i' X̃_i)^{−1} X̃_i' Ỹ_i,   i = 1, 2,

Figure 4.3: Case 2 - The data (fuel efficiency) without measurement errors

   where now

       X̃_i = [x̃_1i³, x̃_2i³, ..., x̃_{N_i,i}³]',   Ỹ_i = [ỹ_1i, ỹ_2i, ..., ỹ_{N_i,i}]',   i = 1, 2.

3. This case is classic polynomial regression, so there is no need to describe it further.

Now that we know how to estimate the parameters, we will simply state the results in the next sections.

4.4 Confidence Intervals

In order to obtain confidence intervals for the mean of the quotient, we need data. In the first and second cases, we actually need to obtain a confidence interval for the mean of the quotient of the coefficients. To simulate the data, we do the following.

For the first and second cases we do the same. First, we add errors from the same distribution to the real data sets. From those two sets, we estimate two models as explained above. After the estimation, we save the quotient of the estimated parameters - because it is actually the quotient of the two models. In order to obtain more of the quotients, we repeat the process. Denote the data set of the quotients by Q.

The third case is quite different. Here, the quotient is not just a simple ratio of two parameters. Of course, we first estimate the models using D̃_1 and D̃_2.

Figure 4.4: Case 3 - The data (fuel efficiency) without measurement errors

After the estimation, using the same input parameters x ∈ {x_i1, x_j2 : i = 1, 2, ..., N_1, j = 1, 2, ..., N_2} in (4.1), we obtain a set of data S_1. This set of data is not enough to see how the error affects the quotient. In the same way as we obtained the set S_1, we repeat the process and obtain sets S_1, S_2, ..., S_9000 (we decided that 9000 sets should be enough). Now we compute the mean of every data set S_i, i = 1, 2, ..., 9000, and the collection of these means is the data set of quotients Q.

In both cases, in order to obtain the confidence interval, we use the bootstrap method on the data set Q. We want to see how the assumption that the measurement error has a particular distribution affects the quotient - that is, how the confidence intervals of the mean behave. For each assumed distribution, we state the confidence intervals.

Assuming the random variables ξ_i1, ξ_j2, ε_i1, ε_j2 have the Uniform Distribution on the interval (−0.3, 1) (see (B.2)),

    ξ_i1, ξ_j2, ε_i1, ε_j2 ∼ U(−0.3, 1),

we get the following confidence intervals for the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]

2. when the true models are given in (4.3): [ , ]

3. and when the true models are given in (4.4): [0.9136, ]

A sample of data taken from the Uniform Distribution is given in figure (4.5).

Figure 4.5: A sample of data taken from the Uniform Distribution

Assuming the random variables ξ_i1, ξ_j2, ε_i1, ε_j2 have the Generalized Pareto Distribution with parameters ξ = 0.1, μ = 0.2 and σ = 0.2 (see (B.3)), we get the following confidence intervals for the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]

2. when the true models are given in (4.3): [ , ]

3. and when the true models are given in (4.4): [ , ]

A sample of data taken from the Generalized Pareto Distribution is given in figure (4.6).

Figure 4.6: A sample of data taken from the Generalized Pareto Distribution

Assuming the random variables ξ_i1, ξ_j2, ε_i1, ε_j2 have the Normal Distribution with parameters μ = 0, σ² = 0.4 (see (B.4)), we get the following confidence intervals for the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]

2. when the true models are given in (4.3): [ , ]

3. and when the true models are given in (4.4): [0.5428, ]

A sample of data taken from the Normal Distribution is given in figure (4.7).

Figure 4.7: A sample of data taken from the Normal Distribution

Assuming the random variables ξ_i1, ξ_j2, ε_i1, ε_j2 have the Log-normal Distribution with parameters μ = 0, σ² = 0.1 (see (B.5)), we get the following confidence intervals for the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]

2. when the true models are given in (4.3): [ , ]

3. and when the true models are given in (4.4): [0.9204, ]

A sample of data taken from the Log-normal Distribution is given in figure (4.8).

Figure 4.8: A sample of data taken from the Log-normal Distribution

Assuming the random variables ξ_i1, ξ_j2, ε_i1, ε_j2 have the Gamma Distribution with parameters α = 21, β = 0.02 (see (B.6)), we get the following confidence intervals for the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]

2. when the true models are given in (4.3): [ , ]

3. and when the true models are given in (4.4): [0.9467, ]

A sample of data taken from the Gamma Distribution is given in figure (4.9).

Figure 4.9: A sample of data taken from the Gamma Distribution

Assuming the random variables ξ_i1, ξ_j2, ε_i1, ε_j2 have the Student's t-distribution with df = 15 degrees of freedom (see (B.7)), we get the following confidence intervals for the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]

2. when the true models are given in (4.3): [ , ]

3. and when the true models are given in (4.4): [0.6981, ]

A sample of data taken from the Student's t-distribution is given in figure (4.10).

Figure 4.10: A sample of data taken from the Student's t-distribution

Assuming the random variables ξ_{i1}, ξ_{j2}, ε_{i1}, ε_{j2} have the Chi-Square Distribution with df = 0.8 degrees of freedom (see (B.8)), we get the following confidence intervals of the mean of the quotient:

1. in the first case, when the true models are given in (4.2), we get: [ , ]
2. when the true models are given in (4.3): [ , ]
3. when the true models are given in (4.4): [0.9467, ]

A sample of data taken from the Chi-Square Distribution is given in Figure 4.11.
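For reference, samples such as those shown in Figures 4.5-4.11 can be drawn with SciPy using the parameters stated above. The sketch below is only illustrative: it assumes SciPy's parameterizations (in particular, β is treated as a scale parameter in the Gamma case and σ² is converted to a standard deviation), which need not coincide exactly with the conventions of Appendix B.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1000  # sample size for illustration, e.g. for histograms

# Frozen error distributions with the parameters used in this section.
# SciPy's conventions are assumed; compare with Appendix B before use.
error_distributions = {
    "uniform":     stats.uniform(loc=-0.3, scale=1.3),          # U(-0.3, 1)
    "gen_pareto":  stats.genpareto(c=0.1, loc=0.2, scale=0.2),  # xi, mu, sigma
    "normal":      stats.norm(loc=0.0, scale=np.sqrt(0.4)),     # mu = 0, sigma^2 = 0.4
    "log_normal":  stats.lognorm(s=np.sqrt(0.1), scale=np.exp(0.0)),
    "gamma":       stats.gamma(a=21, scale=0.02),               # alpha = 21, beta = 0.02
    "student_t":   stats.t(df=15),
    "chi_square":  stats.chi2(df=0.8),
}

samples = {name: dist.rvs(size=n, random_state=rng)
           for name, dist in error_distributions.items()}

for name, sample in samples.items():
    print(f"{name:11s} sample mean = {sample.mean(): .4f}")

Any of these frozen distributions could replace the uniform draws in the earlier simulation sketch to reproduce the seven cases above.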

Figure 4.11: A sample of data taken from the Chi-Square Distribution

4.5 True Values of the Quotient

From (4.2) and (4.3) we know the true quotient of the models:

θ_{11} = θ = , θ_{31} = θ = ,

which is obviously the same, because we chose the parameters ourselves. Similarly, the true quotient of the model (4.4) is the same:

θ_true = .

4.6 Evaluation of the Results

For easier interpretation of the results, see Table 4.1.
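Since the true quotient is known by construction, one simple way to read Table 4.1 is to check which of the bootstrap intervals above cover it. A minimal helper for that check could look as follows; the interval endpoints and θ_true themselves are the values reported in this chapter and are not reproduced here.

from typing import Dict, Tuple

def covers(interval: Tuple[float, float], theta_true: float) -> bool:
    # True if the (lower, upper) bootstrap confidence interval contains theta_true.
    lo, hi = interval
    return lo <= theta_true <= hi

def coverage_table(intervals: Dict[str, Tuple[float, float]],
                   theta_true: float) -> Dict[str, bool]:
    # Map each error-distribution label to whether its interval covers the true quotient.
    return {name: covers(ci, theta_true) for name, ci in intervals.items()}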
