STAT 350. Assignment 6 Solutions

Size: px
Start display at page:

Download "STAT 350. Assignment 6 Solutions"

Transcription

1 STAT 350 Assignment 6 Solutions 1. For the Nitrogen Output in Wallabies data set from Assignment 3 do forward, backward, stepwise all subsets regression. Here is code for all the methods with all subsets done both using C p using adjusted R 2. data nit; infile nit.dat ; input nitexc weight dryin wetin nitin ; proc reg data=nit; model nitexc = weight dryin wetin nitin /selection=forward; run ; proc reg data=nit; model nitexc = weight dryin wetin nitin /selection=backward; run ; proc reg data=nit; model nitexc = weight dryin wetin nitin /selection=stepwise; run ; proc reg data=nit; model nitexc = weight dryin wetin nitin /selection=cp; run ; proc reg data=nit; model nitexc = weight dryin wetin nitin /selection=adjrsq; run ; The conclusion of the output Forward Selection Procedure for Dependent Variable NITEXC Step 1 Variable NITIN Entered R-square = C(p) = Regression Error

2 INTERCEP NITIN Bounds on condition number: 1, 1 Step 2 Variable WETIN Entered R-square = C(p) = Regression Error INTERCEP WETIN NITIN Bounds on condition number: , No other variable met the significance level for entry into the model. Summary of Forward Selection Procedure for Dependent Variable NITEXC Variable Number Partial Model Step Entered In R**2 R**2 C(p) F Prob>F 1 NITIN WETIN Backward Elimination Procedure for Dependent Variable NITEXC Step 0 All Variables Entered R-square = C(p) =

3 Regression Error INTERCEP WEIGHT DRYIN WETIN NITIN Bounds on condition number: , Step 1 Variable WEIGHT Removed R-square = C(p) = Regression Error INTERCEP DRYIN WETIN NITIN Bounds on condition number: , Step 2 Variable DRYIN Removed R-square = C(p) = Regression Error

4 INTERCEP WETIN NITIN Bounds on condition number: , Step 3 Variable WETIN Removed R-square = C(p) = Regression Error INTERCEP NITIN Bounds on condition number: 1, 1 All variables left in the model are significant at the level. Summary of Backward Elimination Procedure for Dependent Variable NITEXC Variable Number Partial Model Step Removed In R**2 R**2 C(p) F Prob>F 1 WEIGHT DRYIN WETIN Stepwise Procedure for Dependent Variable NITEXC Step 1 Variable NITIN Entered R-square = C(p) =

5 Regression Error INTERCEP NITIN Bounds on condition number: 1, 1 All variables left in the model are significant at the level. No other variable met the significance level for entry into the model. Summary of Stepwise Procedure for Dependent Variable NITEXC Variable Number Partial Model Step Entered Removed In R**2 R**2 C(p) F Prob>F 1 NITIN N = 25 Regression Models for Dependent Variable: NITEXC C(p) R-square Variables in Model In NITIN WETIN NITIN DRYIN NITIN WEIGHT NITIN DRYIN WETIN NITIN WEIGHT WETIN NITIN WEIGHT DRYIN NITIN WEIGHT DRYIN WETIN NITIN WEIGHT DRYIN WETIN DRYIN WETIN DRYIN WEIGHT DRYIN WEIGHT WETIN WETIN WEIGHT 5

6 N = 25 Regression Models for Dependent Variable: NITEXC Adjusted R-square Variables in Model R-square In WETIN NITIN NITIN DRYIN NITIN WEIGHT NITIN DRYIN WETIN NITIN WEIGHT WETIN NITIN WEIGHT DRYIN NITIN WEIGHT DRYIN WETIN NITIN WEIGHT DRYIN WETIN DRYIN WETIN DRYIN WEIGHT DRYIN WETIN WEIGHT WETIN WEIGHT is that BACKWARD STEPWISE settle for the model containing only Nitrogen Intake as a predictor. The forward selection method also includes Wet Intake because of the very high level of α (0.5) to enter. The all subsets method using C p would settle on the model using only nitrogen intake but the adjusted R 2 method also includes Wet Intake. However, overall there seems little reason to include Wet Intake since it improves the fit very little is not very significant at all. 2. Suppose X 1, X 2, X 3 are independent N(µ, σ 2 ) rom variables, so that X i = µ + σz i with Z 1, Z 2, Z 3 independent stard normals. (a) If X T = (X 1, X 2, X 3 ) Z T = (Z 1, Z 2, Z 3 ) express X in the form AZ + b for a suitable matrix A vector b. We have A = σi b T = [µ, µ, µ]. (b) Show that X is MV N 3 (µ X, Σ X ) identify µ X Σ X. The definition of MVN is that X be of the form AZ + b then µ X = b Σ X = AA T. So µ T x = [µ, µ, µ] Σ X = σ 2 I. (c) Let Y i = X i X for i = 1, 2, 3 Y 4 = X. Show that Y MV N 4 (µ Y, Σ Y ) find µ Y Σ Y. 6

7 The definition of MVN is that X be of the form AZ +b. Then if Y = BX we have Y = B(AZ + b) = (BA)Z + Bb so that Y is MVN with mean µ Y = Bb = E(Y ) Σ Y = (BA)(BA) T = BAA T B T. In this case we find So B = µ Y = E(Y ) = Σ Y = σ 2 BB T = µ 2/3 1/3 1/3 1/3 2/3 1/3 1/3 1/3 2/3 1/3 1/3 1/3 2/3 1/3 1/3 0 1/3 2/3 1/3 0 1/3 1/3 2/ /3 (d) In class I may have stated that the if the covariance between two components of a multivariate normal vector is 0 then the components are independent, but I indicated a proof only when the multivariate normal distribution in question has a density. In this case the variance matrix is singular so there is no density. However, in terms of the original Z it is possible to find two independent functions of Z such that Y 1, Y 2, Y 3 are a function of the first function while Y 4 is a function of the second. i. Let U 1 = (Z 1 Z 2 )/ 2, U 2 = (Z 1 + Z 2 2Z 3 )/ 6 U 3 = (Z 1 + Z 2 + Z 3 )/3. Show that U = (U 1, U 2, U 3 ) T has a multivariate normal distribution identify the mean variance of U. We have U = AZ where 1/ 2 1/ 2 0 A = 1/ 6 1/ 6 2/ 6 1/3 1/3 1/3 Thus U is multivariate normal with mean 0 variance covariance matrix AA T. Multiply this out to check this is Σ = AA T = /3 ii. Use the result in class, for multivariate normals which have a density to show that (U 1, U 2 ) is independent of U 3. Since the variance covariance matrix is not singular we need only check that Cov(U 1, U 3 ) = Cov(U 2, U 3 ) = 0. These two entries in Σ are, indeed, 0. 7.

8 iii. Express Y 3 as a function of U. We have Y 3 = X 3 X = σ(z 3 Z). Now 3U 3 6U 2 = 3Z 3 Z = U 3. Thus Y 3 = σ(u 3 6U 2 /3 U 3 ) = σ 6U 2 /3. iv. Use the fact that if X 1 X 2 are independent then so are G(X 1 ) H(X 2 ) for any functions G H to show that Y 1, Y 2, Y 3 is independent of Y 4. We see that Y 4 is a function of U 3. I claim that Y 1, Y 2 Y 3 are each functions of U 1, U 2. You did Y 3 in the last part. Notice that Write then These show Y 2 = X 2 X = σ(z 2 Z) Y 1 = X 1 X = σ(z 1 Z). 2U 3 + 6U 2 /3 = Z 1 + Z 2 2U 3 + 6U 2 /3 + 2U 1 = 2Z 1 2U 3 + 6U 2 /3 2U 1 = 2Z 2 Z 1 Z = 6U 2 /6 + 2U 1 /2 Z 2 Z = 6U 2 /6 2U 1 /2 so both Y 1 Y 2 are functions of U 1 U 2. Since Y 1, Y 2, Y 3 is a function of U 1, U 2 Y 4 is a function of U 3 we see the desired independence. v. Express the sample variance of the X i, i = 1, 2, 3 in terms of U use this to show that (n 1)s 2 X/σ 2 has a χ 2 distribution on 2 degrees of freedom (with n = 3). Note: in fact the sample variance of X 1, X 2 is a function of U 1. Generalizations of this idea can be used to develop an identity of the form (n 1)s 2 n = (n 2)s 2 n 1 + U 2 n for a suitable U n where s 2 n is the sample variance for X 1,..., X n. 8

9 The sample variance is s 2 = Y Y2 2 + Y3 2 2 Replace each Y i by the formulas in the previous part to get that 2s 2 σ 2 = (Z 1 Z) 2 + (Z 2 Z) 2 + (Z 3 Z) 2 Write this in terms of U 1 U 2 to get 2s 2 σ 2 = { 6U2 /6 + 2U 1 /2 } 2 + { 6U2 /6 2U 1 /2 } 2 + { 6U2 /3 } 2 = U 2 1 +U 2 2 This is a sum of squares of two independent normals each with mean 0 variance 1 so we are done. 3. In class I discussed the general formula for a multivariate normal density. Suppose that Z 1 Z 2 are independent stard normal variables. Assume that X 1 = az 1 +bz 2 +c X 2 = dz 1 +ez 2 +f. Find the joint density of X 1 X 2 by evaluating the formulas I gave in class. Express P (X 1 t) as a double integral. I want to see the integr the limits of integration but you need not try to do the integral. The density in class was [ 1 2π det(aa T ) exp 1 ] 2 (x µ)t Σ 1 (x µ) where Σ = AA T. The entries in µ are c f we have [ a b A = d e ] Then Σ = AA T = (AA T ) 1 = [ a 2 + b 2 ad + be ad + be d 2 + e 2 d 2 +e 2 (ae bd) 2 Putting together all the algebra gives [ 1 f(x 1, x 2 ) = 2π ae bd exp where the exponent is the quadratic function ad+be (ae bd) 2 ad+be (ae bd) 2 a 2 +b 2 (ae bd) ] ] q(x 1, x 2, a, b, c, d, e, f). (ae bd) 2 q(x 1, x 2, a, b, c, d, e, f) = [(x 1 c) 2 (d 2 + e 2 ) 9 2(ad + be)(x 1 c)(x 2 f) +(x 2 f) 2 (a 2 + b 2 )].

10 To compute P (X 1 t) we take the joint density of X 1 X 2 integrate it over the set of (x 1, x 2 ) such that x 1 t to get P (X 1 t) = t f(x 1, x 2 )dx 1 dx Power sample size calculations must be done before the data are gathered. However: pilot studies are often used to determine the size of the unknown parameters which are needed for these calculations. Use the S Fibre Hardness data discussed in class as follows. (a) Consider the model Y i = β 0 + β 1 S i + β 2 F i + β 3 F 2 i Fit this model to get estimates of all the βs of σ. Use these fitted values as if they were the true parameter values in the following. i. Compute the power of a two sided t test (at the 5% level) of the hypothesis that β 3 = 0 for We need the non-centrality parameter + ɛ i β 3 { 0.006, 0.003, 0, 0.003, 0.006}. δ = β 3 σ a T (X T X) 1 a where a T = [ ]. We get β 3 values from the question the denominator from the stard error of ˆβ 3. I find that stard error to be This gives δ values equal to -3,-1.5,0,1.5,3 (close enough). Turning to Table B.5 with 14 degrees of freedom for error I get powers 0.8, somewhere between , 0.05, somewhere between The somewheres are, I guess in the range of ii. Find a number m of copies of the basic design (each combination of S Fibre tried twice) to guarantee that the power of the t-test of the hypothesis β 3 = 0 is 0.9 when the true parameter values are as in the fit above. The fitted value of β 3 is the corresponding value of δ is just the t statistic in the computer output for testing β 3 = 0. This is though we use in the tables because our tests are to be two sided. We need m to give us a power of 0.9. For large numbers of degrees of freedom for error this power is about half way between the figure under δ = 4 the figure under δ = 3 so we solve m = 3.5 get m = 3.5. We would have to use m = 4 giving δ = 3.74 a power closer to However we could treat the basic design as having 9 points try 63 points in total (7 at each combination of S F) get a power quite close to

11 (b) Now consider the model Y i = β 0 + β 1 S i + β 2 F i + β 3 F 2 i + β 4 S 2 i + β 5 S i F i + ɛ i Fit this model to get estimates of all the βs of σ. Use these fitted values as if they were the true parameter values in the following. Find a number m of copies of the basic design (each combination of S Fibre tried twice) to guarantee that the power of the F -test of the hypothesis β 3 = β 4 = β 5 = 0 is 0.9 when the true parameter values are as in this fit. We can use table B.11 on page 1338 since we will have 3 numerator degrees of freedom for this F test. We will have 18m 6 degrees of freedom for error so will need a φ, in the notation of the table, quite close to 2. This makes δ 2 = (p+1)2 2 = 16. The estimates are ˆβ 3 = , ˆβ 4 = ˆβ 5 = To get our value of δ 2 for the basic design I regress ˆβ 3 F 2 + ˆβ 4 S 2 + ˆβ 5 SF on S F find the error sum of squares is Divide this by ˆσ 2 = 6.77 to find δ 2 = I need to get up to 16 so I need m = 4. 11

MLES & Multivariate Normal Theory

MLES & Multivariate Normal Theory Merlise Clyde September 6, 2016 Outline Expectations of Quadratic Forms Distribution Linear Transformations Distribution of estimates under normality Properties of MLE s Recap Ŷ = ˆµ is an unbiased estimate

More information

STAT 350. Assignment 4

STAT 350. Assignment 4 STAT 350 Assignment 4 1. For the Mileage data in assignment 3 conduct a residual analysis and report your findings. I used the full model for this since my answers to assignment 3 suggested we needed the

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2

Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution. e z2 /2. f Z (z) = 1 2π. e z2 i /2 Next tool is Partial ACF; mathematical tools first. The Multivariate Normal Distribution Defn: Z R 1 N(0,1) iff f Z (z) = 1 2π e z2 /2 Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) (a column

More information

The Multivariate Normal Distribution. In this case according to our theorem

The Multivariate Normal Distribution. In this case according to our theorem The Multivariate Normal Distribution Defn: Z R 1 N(0, 1) iff f Z (z) = 1 2π e z2 /2. Defn: Z R p MV N p (0, I) if and only if Z = (Z 1,..., Z p ) T with the Z i independent and each Z i N(0, 1). In this

More information

Statistics 262: Intermediate Biostatistics Model selection

Statistics 262: Intermediate Biostatistics Model selection Statistics 262: Intermediate Biostatistics Model selection Jonathan Taylor & Kristin Cobb Statistics 262: Intermediate Biostatistics p.1/?? Today s class Model selection. Strategies for model selection.

More information

Tropical Polynomials

Tropical Polynomials 1 Tropical Arithmetic Tropical Polynomials Los Angeles Math Circle, May 15, 2016 Bryant Mathews, Azusa Pacific University In tropical arithmetic, we define new addition and multiplication operations on

More information

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices

Lecture 13: Simple Linear Regression in Matrix Format. 1 Expectations and Variances with Vectors and Matrices Lecture 3: Simple Linear Regression in Matrix Format To move beyond simple regression we need to use matrix algebra We ll start by re-expressing simple linear regression in matrix form Linear algebra is

More information

The Multivariate Gaussian Distribution

The Multivariate Gaussian Distribution The Multivariate Gaussian Distribution Chuong B. Do October, 8 A vector-valued random variable X = T X X n is said to have a multivariate normal or Gaussian) distribution with mean µ R n and covariance

More information

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B

Problems. Suppose both models are fitted to the same data. Show that SS Res, A SS Res, B Simple Linear Regression 35 Problems 1 Consider a set of data (x i, y i ), i =1, 2,,n, and the following two regression models: y i = β 0 + β 1 x i + ε, (i =1, 2,,n), Model A y i = γ 0 + γ 1 x i + γ 2

More information

STAT 450. Moment Generating Functions

STAT 450. Moment Generating Functions STAT 450 Moment Generating Functions There are many uses of generating functions in mathematics. We often study the properties of a sequence a n of numbers by creating the function a n s n n0 In statistics

More information

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:.

Statement: With my signature I confirm that the solutions are the product of my own work. Name: Signature:. MATHEMATICAL STATISTICS Homework assignment Instructions Please turn in the homework with this cover page. You do not need to edit the solutions. Just make sure the handwriting is legible. You may discuss

More information

Sampling Distributions

Sampling Distributions Merlise Clyde Duke University September 8, 2016 Outline Topics Normal Theory Chi-squared Distributions Student t Distributions Readings: Christensen Apendix C, Chapter 1-2 Prostate Example > library(lasso2);

More information

The Multivariate Gaussian Distribution [DRAFT]

The Multivariate Gaussian Distribution [DRAFT] The Multivariate Gaussian Distribution DRAFT David S. Rosenberg Abstract This is a collection of a few key and standard results about multivariate Gaussian distributions. I have not included many proofs,

More information

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method.

STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. STAT 135 Lab 13 (Review) Linear Regression, Multivariate Random Variables, Prediction, Logistic Regression and the δ-method. Rebecca Barter May 5, 2015 Linear Regression Review Linear Regression Review

More information

Model Selection Procedures

Model Selection Procedures Model Selection Procedures Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Model Selection Procedures Consider a regression setting with K potential predictor variables and you wish to explore

More information

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation.

Simple Regression Model Setup Estimation Inference Prediction. Model Diagnostic. Multiple Regression. Model Setup and Estimation. Statistical Computation Math 475 Jimin Ding Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html October 10, 2013 Ridge Part IV October 10, 2013 1

More information

A Significance Test for the Lasso

A Significance Test for the Lasso A Significance Test for the Lasso Lockhart R, Taylor J, Tibshirani R, and Tibshirani R Ashley Petersen May 14, 2013 1 Last time Problem: Many clinical covariates which are important to a certain medical

More information

Preliminaries. Copyright c 2018 Dan Nettleton (Iowa State University) Statistics / 38

Preliminaries. Copyright c 2018 Dan Nettleton (Iowa State University) Statistics / 38 Preliminaries Copyright c 2018 Dan Nettleton (Iowa State University) Statistics 510 1 / 38 Notation for Scalars, Vectors, and Matrices Lowercase letters = scalars: x, c, σ. Boldface, lowercase letters

More information

The Multivariate Normal Distribution. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 36

The Multivariate Normal Distribution. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 36 The Multivariate Normal Distribution Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 36 The Moment Generating Function (MGF) of a random vector X is given by M X (t) = E(e t X

More information

Probability Background

Probability Background Probability Background Namrata Vaswani, Iowa State University August 24, 2015 Probability recap 1: EE 322 notes Quick test of concepts: Given random variables X 1, X 2,... X n. Compute the PDF of the second

More information

Properties of the least squares estimates

Properties of the least squares estimates Properties of the least squares estimates 2019-01-18 Warmup Let a and b be scalar constants, and X be a scalar random variable. Fill in the blanks E ax + b) = Var ax + b) = Goal Recall that the least squares

More information

STA442/2101: Assignment 5

STA442/2101: Assignment 5 STA442/2101: Assignment 5 Craig Burkett Quiz on: Oct 23 rd, 2015 The questions are practice for the quiz next week, and are not to be handed in. I would like you to bring in all of the code you used to

More information

Mathematical Focus 1 Complex numbers adhere to certain arithmetic properties for which they and their complex conjugates are defined.

Mathematical Focus 1 Complex numbers adhere to certain arithmetic properties for which they and their complex conjugates are defined. Situation: Complex Roots in Conjugate Pairs Prepared at University of Georgia Center for Proficiency in Teaching Mathematics June 30, 2013 Sarah Major Prompt: A teacher in a high school Algebra class has

More information

Matrix Approach to Simple Linear Regression: An Overview

Matrix Approach to Simple Linear Regression: An Overview Matrix Approach to Simple Linear Regression: An Overview Aspects of matrices that you should know: Definition of a matrix Addition/subtraction/multiplication of matrices Symmetric/diagonal/identity matrix

More information

In this course we: do distribution theory when ǫ i N(0, σ 2 ) discuss what if the errors, ǫ i are not normal? omit proofs.

In this course we: do distribution theory when ǫ i N(0, σ 2 ) discuss what if the errors, ǫ i are not normal? omit proofs. Distribution Theory Question: What is distribution theory? Answer: How to compute the distribution of an estimator, test or other statistic, T : Find P(T t), the Cumulative Distribution Function (CDF)

More information

The Multivariate Normal Distribution. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 36

The Multivariate Normal Distribution. Copyright c 2012 Dan Nettleton (Iowa State University) Statistics / 36 The Multivariate Normal Distribution Copyright c 2012 Dan Nettleton (Iowa State University) Statistics 611 1 / 36 The Moment Generating Function (MGF) of a random vector X is given by M X (t) = E(e t X

More information

Regression #5: Confidence Intervals and Hypothesis Testing (Part 1)

Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Regression #5: Confidence Intervals and Hypothesis Testing (Part 1) Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #5 1 / 24 Introduction What is a confidence interval? To fix ideas, suppose

More information

Response Surface Methodology

Response Surface Methodology Response Surface Methodology Bruce A Craig Department of Statistics Purdue University STAT 514 Topic 27 1 Response Surface Methodology Interested in response y in relation to numeric factors x Relationship

More information

STAT 830 The Multivariate Normal Distribution

STAT 830 The Multivariate Normal Distribution STAT 830 The Multivariate Normal Distribution Richard Lockhart Simon Fraser University STAT 830 Fall 2013 Richard Lockhart (Simon Fraser University)STAT 830 The Multivariate Normal Distribution STAT 830

More information

Chapter 1. Linear Regression with One Predictor Variable

Chapter 1. Linear Regression with One Predictor Variable Chapter 1. Linear Regression with One Predictor Variable 1.1 Statistical Relation Between Two Variables To motivate statistical relationships, let us consider a mathematical relation between two mathematical

More information

Interpreting Regression Results

Interpreting Regression Results Interpreting Regression Results Carlo Favero Favero () Interpreting Regression Results 1 / 42 Interpreting Regression Results Interpreting regression results is not a simple exercise. We propose to split

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

Notes on Random Vectors and Multivariate Normal

Notes on Random Vectors and Multivariate Normal MATH 590 Spring 06 Notes on Random Vectors and Multivariate Normal Properties of Random Vectors If X,, X n are random variables, then X = X,, X n ) is a random vector, with the cumulative distribution

More information

Stat 366 A1 (Fall 2006) Midterm Solutions (October 23) page 1

Stat 366 A1 (Fall 2006) Midterm Solutions (October 23) page 1 Stat 366 A1 Fall 6) Midterm Solutions October 3) page 1 1. The opening prices per share Y 1 and Y measured in dollars) of two similar stocks are independent random variables, each with a density function

More information

Partial Fraction Decomposition

Partial Fraction Decomposition Partial Fraction Decomposition As algebra students we have learned how to add and subtract fractions such as the one show below, but we probably have not been taught how to break the answer back apart

More information

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015

Part IB Statistics. Theorems with proof. Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua. Lent 2015 Part IB Statistics Theorems with proof Based on lectures by D. Spiegelhalter Notes taken by Dexter Chua Lent 2015 These notes are not endorsed by the lecturers, and I have modified them (often significantly)

More information

Data Mining Stat 588

Data Mining Stat 588 Data Mining Stat 588 Lecture 02: Linear Methods for Regression Department of Statistics & Biostatistics Rutgers University September 13 2011 Regression Problem Quantitative generic output variable Y. Generic

More information

STT 843 Key to Homework 1 Spring 2018

STT 843 Key to Homework 1 Spring 2018 STT 843 Key to Homework Spring 208 Due date: Feb 4, 208 42 (a Because σ = 2, σ 22 = and ρ 2 = 05, we have σ 2 = ρ 2 σ σ22 = 2/2 Then, the mean and covariance of the bivariate normal is µ = ( 0 2 and Σ

More information

Conjugate Analysis for the Linear Model

Conjugate Analysis for the Linear Model Conjugate Analysis for the Linear Model If we have good prior knowledge that can help us specify priors for β and σ 2, we can use conjugate priors. Following the procedure in Christensen, Johnson, Branscum,

More information

ECE Lecture #9 Part 2 Overview

ECE Lecture #9 Part 2 Overview ECE 450 - Lecture #9 Part Overview Bivariate Moments Mean or Expected Value of Z = g(x, Y) Correlation and Covariance of RV s Functions of RV s: Z = g(x, Y); finding f Z (z) Method : First find F(z), by

More information

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr.

Topic 2: Probability & Distributions. Road Map Probability & Distributions. ECO220Y5Y: Quantitative Methods in Economics. Dr. Topic 2: Probability & Distributions ECO220Y5Y: Quantitative Methods in Economics Dr. Nick Zammit University of Toronto Department of Economics Room KN3272 n.zammit utoronto.ca November 21, 2017 Dr. Nick

More information

Moment Generating Function. STAT/MTHE 353: 5 Moment Generating Functions and Multivariate Normal Distribution

Moment Generating Function. STAT/MTHE 353: 5 Moment Generating Functions and Multivariate Normal Distribution Moment Generating Function STAT/MTHE 353: 5 Moment Generating Functions and Multivariate Normal Distribution T. Linder Queen s University Winter 07 Definition Let X (X,...,X n ) T be a random vector and

More information

MATH 4211/6211 Optimization Basics of Optimization Problems

MATH 4211/6211 Optimization Basics of Optimization Problems MATH 4211/6211 Optimization Basics of Optimization Problems Xiaojing Ye Department of Mathematics & Statistics Georgia State University Xiaojing Ye, Math & Stat, Georgia State University 0 A standard minimization

More information

Core Mathematics 3 Algebra

Core Mathematics 3 Algebra http://kumarmathsweeblycom/ Core Mathematics 3 Algebra Edited by K V Kumaran Core Maths 3 Algebra Page Algebra fractions C3 The specifications suggest that you should be able to do the following: Simplify

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression Simple linear regression tries to fit a simple line between two variables Y and X. If X is linearly related to Y this explains some of the variability in Y. In most cases, there

More information

( 3) ( ) ( ) ( ) ( ) ( )

( 3) ( ) ( ) ( ) ( ) ( ) 81 Instruction: Determining the Possible Rational Roots using the Rational Root Theorem Consider the theorem stated below. Rational Root Theorem: If the rational number b / c, in lowest terms, is a root

More information

Appendix A: Review of the General Linear Model

Appendix A: Review of the General Linear Model Appendix A: Review of the General Linear Model The generallinear modelis an important toolin many fmri data analyses. As the name general suggests, this model can be used for many different types of analyses,

More information

7x 5 x 2 x + 2. = 7x 5. (x + 1)(x 2). 4 x

7x 5 x 2 x + 2. = 7x 5. (x + 1)(x 2). 4 x Advanced Integration Techniques: Partial Fractions The method of partial fractions can occasionally make it possible to find the integral of a quotient of rational functions. Partial fractions gives us

More information

Regression. Oscar García

Regression. Oscar García Regression Oscar García Regression methods are fundamental in Forest Mensuration For a more concise and general presentation, we shall first review some matrix concepts 1 Matrices An order n m matrix is

More information

MULTIVARIATE DISTRIBUTIONS

MULTIVARIATE DISTRIBUTIONS Chapter 9 MULTIVARIATE DISTRIBUTIONS John Wishart (1898-1956) British statistician. Wishart was an assistant to Pearson at University College and to Fisher at Rothamsted. In 1928 he derived the distribution

More information

Lecture 6 Multiple Linear Regression, cont.

Lecture 6 Multiple Linear Regression, cont. Lecture 6 Multiple Linear Regression, cont. BIOST 515 January 22, 2004 BIOST 515, Lecture 6 Testing general linear hypotheses Suppose we are interested in testing linear combinations of the regression

More information

The Statistical Property of Ordinary Least Squares

The Statistical Property of Ordinary Least Squares The Statistical Property of Ordinary Least Squares The linear equation, on which we apply the OLS is y t = X t β + u t Then, as we have derived, the OLS estimator is ˆβ = [ X T X] 1 X T y Then, substituting

More information

The Multivariate Normal Distribution 1

The Multivariate Normal Distribution 1 The Multivariate Normal Distribution 1 STA 302 Fall 2017 1 See last slide for copyright information. 1 / 40 Overview 1 Moment-generating Functions 2 Definition 3 Properties 4 χ 2 and t distributions 2

More information

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim

Figure 1: The fitted line using the shipment route-number of ampules data. STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim 0.0 1.0 1.5 2.0 2.5 3.0 8 10 12 14 16 18 20 22 y x Figure 1: The fitted line using the shipment route-number of ampules data STAT5044: Regression and ANOVA The Solution of Homework #2 Inyoung Kim Problem#

More information

Analysis of Experimental Designs

Analysis of Experimental Designs Analysis of Experimental Designs p. 1/? Analysis of Experimental Designs Gilles Lamothe Mathematics and Statistics University of Ottawa Analysis of Experimental Designs p. 2/? Review of Probability A short

More information

ST 740: Linear Models and Multivariate Normal Inference

ST 740: Linear Models and Multivariate Normal Inference ST 740: Linear Models and Multivariate Normal Inference Alyson Wilson Department of Statistics North Carolina State University November 4, 2013 A. Wilson (NCSU STAT) Linear Models November 4, 2013 1 /

More information

Linear models and their mathematical foundations: Simple linear regression

Linear models and their mathematical foundations: Simple linear regression Linear models and their mathematical foundations: Simple linear regression Steffen Unkel Department of Medical Statistics University Medical Center Göttingen, Germany Winter term 2018/19 1/21 Introduction

More information

variability of the model, represented by σ 2 and not accounted for by Xβ

variability of the model, represented by σ 2 and not accounted for by Xβ Posterior Predictive Distribution Suppose we have observed a new set of explanatory variables X and we want to predict the outcomes ỹ using the regression model. Components of uncertainty in p(ỹ y) variability

More information

LIST OF FORMULAS FOR STK1100 AND STK1110

LIST OF FORMULAS FOR STK1100 AND STK1110 LIST OF FORMULAS FOR STK1100 AND STK1110 (Version of 11. November 2015) 1. Probability Let A, B, A 1, A 2,..., B 1, B 2,... be events, that is, subsets of a sample space Ω. a) Axioms: A probability function

More information

[y i α βx i ] 2 (2) Q = i=1

[y i α βx i ] 2 (2) Q = i=1 Least squares fits This section has no probability in it. There are no random variables. We are given n points (x i, y i ) and want to find the equation of the line that best fits them. We take the equation

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Regression #3: Properties of OLS Estimator

Regression #3: Properties of OLS Estimator Regression #3: Properties of OLS Estimator Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #3 1 / 20 Introduction In this lecture, we establish some desirable properties associated with

More information

Ch 2: Simple Linear Regression

Ch 2: Simple Linear Regression Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

STA 2101/442 Assignment 3 1

STA 2101/442 Assignment 3 1 STA 2101/442 Assignment 3 1 These questions are practice for the midterm and final exam, and are not to be handed in. 1. Suppose X 1,..., X n are a random sample from a distribution with mean µ and variance

More information

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1

Inverse of a Square Matrix. For an N N square matrix A, the inverse of A, 1 Inverse of a Square Matrix For an N N square matrix A, the inverse of A, 1 A, exists if and only if A is of full rank, i.e., if and only if no column of A is a linear combination 1 of the others. A is

More information

EX: Simplify the expression. EX: Simplify the expression. EX: Simplify the expression

EX: Simplify the expression. EX: Simplify the expression. EX: Simplify the expression SIMPLIFYING RADICALS EX: Simplify the expression 84x 4 y 3 1.) Start by creating a factor tree for the constant. In this case 84. Keep factoring until all of your nodes are prime. Two factor trees are

More information

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section #

1 Warm-Up: 2 Adjusted R 2. Introductory Applied Econometrics EEP/IAS 118 Spring Sylvan Herskowitz Section # Introductory Applied Econometrics EEP/IAS 118 Spring 2015 Sylvan Herskowitz Section #10 4-1-15 1 Warm-Up: Remember that exam you took before break? We had a question that said this A researcher wants to

More information

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project

MLR Model Selection. Author: Nicholas G Reich, Jeff Goldsmith. This material is part of the statsteachr project MLR Model Selection Author: Nicholas G Reich, Jeff Goldsmith This material is part of the statsteachr project Made available under the Creative Commons Attribution-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-sa/3.0/deed.en

More information

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012

Lecture 3: Linear Models. Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 Lecture 3: Linear Models Bruce Walsh lecture notes Uppsala EQG course version 28 Jan 2012 1 Quick Review of the Major Points The general linear model can be written as y = X! + e y = vector of observed

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

BIOS 2083 Linear Models Abdus S. Wahed. Chapter 2 84

BIOS 2083 Linear Models Abdus S. Wahed. Chapter 2 84 Chapter 2 84 Chapter 3 Random Vectors and Multivariate Normal Distributions 3.1 Random vectors Definition 3.1.1. Random vector. Random vectors are vectors of random variables. For instance, X = X 1 X 2.

More information

Homoskedasticity. Var (u X) = σ 2. (23)

Homoskedasticity. Var (u X) = σ 2. (23) Homoskedasticity How big is the difference between the OLS estimator and the true parameter? To answer this question, we make an additional assumption called homoskedasticity: Var (u X) = σ 2. (23) This

More information

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression

AMS 315/576 Lecture Notes. Chapter 11. Simple Linear Regression AMS 315/576 Lecture Notes Chapter 11. Simple Linear Regression 11.1 Motivation A restaurant opening on a reservations-only basis would like to use the number of advance reservations x to predict the number

More information

Lecture 13: Simple Linear Regression in Matrix Format

Lecture 13: Simple Linear Regression in Matrix Format See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/ Lecture 13: Simple Linear Regression in Matrix Format 36-401, Section B, Fall 2015 13 October 2015 Contents 1 Least Squares in Matrix

More information

Random Variables and Their Distributions

Random Variables and Their Distributions Chapter 3 Random Variables and Their Distributions A random variable (r.v.) is a function that assigns one and only one numerical value to each simple event in an experiment. We will denote r.vs by capital

More information

Computational functional genomics

Computational functional genomics Computational functional genomics (Spring 2005: Lecture 8) David K. Gifford (Adapted from a lecture by Tommi S. Jaakkola) MIT CSAIL Basic clustering methods hierarchical k means mixture models Multi variate

More information

Economics 620, Lecture 5: exp

Economics 620, Lecture 5: exp 1 Economics 620, Lecture 5: The K-Variable Linear Model II Third assumption (Normality): y; q(x; 2 I N ) 1 ) p(y) = (2 2 ) exp (N=2) 1 2 2(y X)0 (y X) where N is the sample size. The log likelihood function

More information

MS 2001: Test 1 B Solutions

MS 2001: Test 1 B Solutions MS 2001: Test 1 B Solutions Name: Student Number: Answer all questions. Marks may be lost if necessary work is not clearly shown. Remarks by me in italics and would not be required in a test - J.P. Question

More information

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is

Likelihood Methods. 1 Likelihood Functions. The multivariate normal distribution likelihood function is Likelihood Methods 1 Likelihood Functions The multivariate normal distribution likelihood function is The log of the likelihood, say L 1 is Ly = π.5n V.5 exp.5y Xb V 1 y Xb. L 1 = 0.5[N lnπ + ln V +y Xb

More information

This does not cover everything on the final. Look at the posted practice problems for other topics.

This does not cover everything on the final. Look at the posted practice problems for other topics. Class 7: Review Problems for Final Exam 8.5 Spring 7 This does not cover everything on the final. Look at the posted practice problems for other topics. To save time in class: set up, but do not carry

More information

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012

Problem Set #6: OLS. Economics 835: Econometrics. Fall 2012 Problem Set #6: OLS Economics 835: Econometrics Fall 202 A preliminary result Suppose we have a random sample of size n on the scalar random variables (x, y) with finite means, variances, and covariance.

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering

More information

Lecture 6: Linear Regression

Lecture 6: Linear Regression Lecture 6: Linear Regression Reading: Sections 3.1-3 STATS 202: Data mining and analysis Jonathan Taylor, 10/5 Slide credits: Sergio Bacallado 1 / 30 Simple linear regression Model: y i = β 0 + β 1 x i

More information

Covariance and Correlation

Covariance and Correlation Covariance and Correlation ST 370 The probability distribution of a random variable gives complete information about its behavior, but its mean and variance are useful summaries. Similarly, the joint probability

More information

Topics in Probability and Statistics

Topics in Probability and Statistics Topics in Probability and tatistics A Fundamental Construction uppose {, P } is a sample space (with probability P), and suppose X : R is a random variable. The distribution of X is the probability P X

More information

The program for the following sections follows.

The program for the following sections follows. Homework 6 nswer sheet Page 31 The program for the following sections follows. dm'log;clear;output;clear'; *************************************************************; *** EXST734 Homework Example 1

More information

Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Order of Operations Expression Variable Coefficient

More information

COS 424: Interacting with Data

COS 424: Interacting with Data COS 424: Interacting with Data Lecturer: Rob Schapire Lecture #14 Scribe: Zia Khan April 3, 2007 Recall from previous lecture that in regression we are trying to predict a real value given our data. Specically,

More information

Asymptotic standard errors of MLE

Asymptotic standard errors of MLE Asymptotic standard errors of MLE Suppose, in the previous example of Carbon and Nitrogen in soil data, that we get the parameter estimates For maximum likelihood estimation, we can use Hessian matrix

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, Linear Regression In this problem sheet, we consider the problem of linear regression with p predictors and one intercept, y = Xβ + ɛ, where y t = (y 1,..., y n ) is the column vector of target values,

More information

Regression, Ridge Regression, Lasso

Regression, Ridge Regression, Lasso Regression, Ridge Regression, Lasso Fabio G. Cozman - fgcozman@usp.br October 2, 2018 A general definition Regression studies the relationship between a response variable Y and covariates X 1,..., X n.

More information

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis.

401 Review. 6. Power analysis for one/two-sample hypothesis tests and for correlation analysis. 401 Review Major topics of the course 1. Univariate analysis 2. Bivariate analysis 3. Simple linear regression 4. Linear algebra 5. Multiple regression analysis Major analysis methods 1. Graphical analysis

More information

Stat 5101 Notes: Algorithms (thru 2nd midterm)

Stat 5101 Notes: Algorithms (thru 2nd midterm) Stat 5101 Notes: Algorithms (thru 2nd midterm) Charles J. Geyer October 18, 2012 Contents 1 Calculating an Expectation or a Probability 2 1.1 From a PMF........................... 2 1.2 From a PDF...........................

More information

Preliminary Statistics. Lecture 3: Probability Models and Distributions

Preliminary Statistics. Lecture 3: Probability Models and Distributions Preliminary Statistics Lecture 3: Probability Models and Distributions Rory Macqueen (rm43@soas.ac.uk), September 2015 Outline Revision of Lecture 2 Probability Density Functions Cumulative Distribution

More information

Table 1: Fish Biomass data set on 26 streams

Table 1: Fish Biomass data set on 26 streams Math 221: Multiple Regression S. K. Hyde Chapter 27 (Moore, 5th Ed.) The following data set contains observations on the fish biomass of 26 streams. The potential regressors from which we wish to explain

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

2018 2019 1 9 sei@mistiu-tokyoacjp http://wwwstattu-tokyoacjp/~sei/lec-jhtml 11 552 3 0 1 2 3 4 5 6 7 13 14 33 4 1 4 4 2 1 1 2 2 1 1 12 13 R?boxplot boxplotstats which does the computation?boxplotstats

More information