Galaxy Dataset Analysis


UNIVERSITY OF UTAH
MATH 6010 LINEAR MODELS

Galaxy Dataset Analysis

Author: Curtis MILLER
Instructor: Dr. Lajos HORVÀTH

November 15, 2015

1 INTRODUCTION

This paper summarizes analysis done on the galaxy dataset described by Olmstead (2015). The model that Olmstead wishes to analyze, using statistical methods, is:

    0 = m_B + α x_1 - β c + M_B + ε    (1.1)

where m_B is the observed magnitude of a supernova type Ia (or SN Ia) at maximum light, x_1 is the estimated relative rate of rise and fall of the light, c is the measured color of the SN, and M_B effectively represents the true luminosity of the SN Ia. ε is an error term. α, β, and M_B are unknowns, and m_B is the response variable we model. In fact, Equation (1.1) translates into the statistical model:

    y_i = β_0 + β_1 x_i + β_2 c_i + ε_i    (1.2)

where y_i corresponds to m_B, β_0 corresponds to M_B, β_1 corresponds to α, x_i corresponds to x_1, β_2 corresponds to β, c_i corresponds to c, and ε_i corresponds to ε. It is Equation (1.2) that is actually analyzed in this paper.

We were provided a dataset and with it created the data file galaxy_data.rda. Table 1.1 provides a description of the variables included in the dataset. All of the variables necessary for estimating the desired unknown parameters in Equation (1.2) (and, therefore, Equation (1.1)) are included in this dataset, along with the associated errors of the estimates.
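Later code refers to columns such as mb, X1, and C directly by name, which suggests the data frame was loaded and attached. A minimal sketch of that setup (the file and object names galaxy_data.rda and galaxy_data come from the text; the use of attach() is my assumption about the author's workflow):

```r
# A sketch of loading the data so columns are visible by name.
# attach() is an assumption, not confirmed by the original text.
load("galaxy_data.rda")   # creates the data frame galaxy_data
attach(galaxy_data)       # makes mb, X1, C, MASS, etc. accessible directly
str(galaxy_data)          # quick check of variable names and types
```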

Table 1.1: Description of variables in the dataset.

     Variable  Label
  1  mb        Observed magnitude of SN Ia at maximum light
  2  mberr     Estimated measurement error (standard deviation) for mb
  3  X1        Estimated relative rate of rise and fall
  4  x1err     Estimated measurement error (standard deviation) of X1
  5  C         Measured color of SN
  6  Cerr      Estimated measurement error (standard deviation) of C
  7  MASS      Estimated host galaxy mass
  8  masserr   Estimated measurement error (standard deviation) of MASS
  9  covmbx    Covariance between mb and X1
 10  covmbc    Covariance between mb and C
 11  covx1c    Covariance between X1 and C

2 PRELIMINARY ANALYSIS OF ASTRONOMICAL DATA ON GALAXIES

First I wish to answer some preliminary questions about the data:

Variable distributions: What distributions appear to fit the variables provided in the dataset?

Homoskedasticity: Do the error terms estimated for each of the data points appear to be constant? Or do they vary to such a degree that we have to deal with the presence of heteroskedasticity?

Covariation and independence: Is there evidence for independence among the variables of interest? This is not easy to determine in general, and a covariance of 0 does not imply independence unless the two variables are jointly Gaussian. Even without implying independence, though, zero covariance may make analysis easier.

I will not be performing any statistical tests in this section. Instead I will look at basic summary statistics, plots, and estimated density functions to try to get a preliminary understanding of the data.

2.1 VARIABLE DISTRIBUTIONS

We can use kernel density estimation (KDE) to get an idea of what the probability density function (PDF) of a variable is. Knowing the PDF will then help us decide what distribution seems appropriate for a variable. The easiest way to determine what distribution matches an estimated PDF is to look at a plot of the data's KDE. This is easy to do in R using the density function.
Let's first start with the variables that are actually desired measurements (in other words, not measures of some error), which are mb, X1, C, and MASS:

Table 2.1: Summary statistics for primary variables.

Statistic  Mean     St. Dev.  Min      Pctl(25)  Median   Pctl(75)  Max
mb         -19.416  0.315     -20.293  -19.619   -19.458  -19.248   -17.642
X1         0.008    1.124     -3.106   -0.774    0.045    0.805     5.000
C          -0.019   0.094     -0.271   -0.089    -0.024   0.037     0.255
MASS       10.661   0.608     8.757    10.264    10.755   11.113    11.826

mb.pdf <- density(mb)  # This is the kernel density estimation function in R;
                       # default kernel function is Gaussian.
X1.pdf <- density(X1)
C.pdf <- density(C)
MASS.pdf <- density(MASS)

# Now to plot the estimated densities
plot4.par <- list(mfrow = c(2, 2), oma = c(1, 1, 0, 0),
                  mar = c(2.3, 2, 1.5, 1.5))  # Save this style of plot
default.par <- par(plot4.par)  # Saves defaults, and makes multiple plots
plot(mb.pdf$x, mb.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of mb")
plot(X1.pdf$x, X1.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of X1")
plot(C.pdf$x, C.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of C")
plot(MASS.pdf$x, MASS.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of MASS")

In Figure 2.1 we see some evidence of normality for the C variable, but, given the density plots shown, normality does not seem very plausible for the other variables. Numeric summary statistics are shown in Table 2.1. Since I am particularly interested in whether the data follow a Gaussian distribution, I also include Q-Q plots, which can shed more light on this question than the KDEs alone.

par(plot4.par)
qqnorm(mb, main = "mb", pch = 20); qqline(mb, col = "blue")
qqnorm(X1, main = "X1", pch = 20); qqline(X1, col = "blue")
qqnorm(C, main = "C", pch = 20); qqline(C, col = "blue")
qqnorm(MASS, main = "MASS", pch = 20); qqline(MASS, col = "blue")
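Another purely visual check along these lines is to overlay a fitted Gaussian curve on a variable's KDE; where the two curves diverge, normality is implausible. A minimal sketch for C (not one of the original figures; it assumes C and C.pdf as defined above):

```r
# Overlay a fitted normal density (dashed) on the KDE of C for visual
# comparison; close agreement would support an approximately Gaussian C.
plot(C.pdf$x, C.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "KDE of C vs. fitted normal")
curve(dnorm(x, mean = mean(C), sd = sd(C)), add = TRUE, lty = 2)
```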

[Figure 2.1: Estimated density plots of main variables of interest. Panels: mb, X1, C, MASS.]

According to Figure 2.2, C and X1 may follow a distribution similar to the Gaussian distribution, but mb and MASS have tails that are too heavy for this to be a reasonable belief.

2.2 HOMOSKEDASTICITY

I repeat the analysis done in the prior section, only on the variables that measure the uncertainty (e.g. standard error) of the measurements of the variables of interest, which are mberr, x1err, Cerr, and masserr. Here, I want to see small variation in these numbers. Large variation may suggest heteroskedasticity in the estimates, which may complicate future analysis of this dataset.

mberr.pdf <- density(mberr)
x1err.pdf <- density(x1err)
Cerr.pdf <- density(Cerr)

[Figure 2.2: Gaussian Q-Q plots of main variables of interest. Panels: mb, X1, C, MASS.]

masserr.pdf <- density(masserr)

# Now to plot the estimated densities
par(plot4.par)  # Makes multiple plots
plot(mberr.pdf$x, mberr.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of mberr")
plot(x1err.pdf$x, x1err.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of x1err")
plot(Cerr.pdf$x, Cerr.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of Cerr")
plot(masserr.pdf$x, masserr.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of masserr")
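The visual comparison of spreads can be supplemented with a simple scale-free number, such as the coefficient of variation (standard deviation over mean) of each error column. This is a sketch of my own, not part of the original analysis; it assumes the error columns are accessible by name:

```r
# Coefficient of variation (sd/mean) of each measurement-error column,
# as a rough numeric check on how much the per-observation errors vary;
# a notably larger value for x1err would echo the visual impression.
errs <- list(mberr = mberr, x1err = x1err, Cerr = Cerr, masserr = masserr)
sapply(errs, function(e) sd(e) / mean(e))
```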

Table 2.2: Summary statistics for variables describing measurement error.

Statistic  Mean   St. Dev.  Min    Pctl(25)  Median  Pctl(75)  Max
mberr      0.073  0.026     0.024  0.053     0.069   0.089     0.248
x1err      0.771  0.420     0.077  0.446     0.702   1.035     2.348
Cerr       0.058  0.023     0.007  0.040     0.055   0.074     0.143
masserr    0.109  0.052     0.007  0.075     0.100   0.130     0.445

Table 2.3: Summary statistics for variables describing covariation between variable estimates.

Statistic  Mean   St. Dev.  Min     Pctl(25)  Median  Pctl(75)  Max
covmbx     0.043  0.041     -0.020  0.012     0.031   0.062     0.236
covmbc     0.002  0.002     -0.001  0.001     0.002   0.003     0.032
covx1c     0.020  0.024     -0.024  0.005     0.011   0.028     0.200

Table 2.2 shows some summary statistics of these variables and Figure 2.3 shows their estimated density functions. mberr, Cerr, and masserr appear to have small spreads and very peaked distributions, so their corresponding variables (namely, mb, C, and MASS) may be homoskedastic; there isn't much variation in the estimates' errors. This is not the case for x1err, which may suggest we need to deal with heteroskedasticity for X1.

2.3 COVARIATION AND INDEPENDENCE

As stated in the introduction, zero covariance implies independence only if the two variables are jointly Gaussian. The only variables for which a Gaussian distribution seems plausible are X1 and C, so nowhere will finding zero covariance imply independence. However, zero covariance may still make our analysis easier, so finding it is still desirable. The variables in our dataset that describe covariation in the estimates of our variables of interest are covmbx (the covariance between mb and X1), covmbc (the covariance between mb and C), and covx1c (the covariance between X1 and C). I would like to see these variables centered at zero and with narrow spreads. I repeat the above analysis here.
covmbx.pdf <- density(covmbx)
covmbc.pdf <- density(covmbc)
covx1c.pdf <- density(covx1c)

# Now to plot the estimated densities
par(plot4.par)  # Makes multiple plots
plot(covmbx.pdf$x, covmbx.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of covmbx")

[Figure 2.3: Estimated density plots of variables describing measurement error. Panels: mberr, x1err, Cerr, masserr.]

plot(covmbc.pdf$x, covmbc.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of covmbc")
plot(covx1c.pdf$x, covx1c.pdf$y, type = 'l', xlab = "x", ylab = "y",
     main = "Estimated PDF of covx1c")

Table 2.3 shows summary statistics from the data for covmbx, covmbc, and covx1c. Figure 2.4 shows the plots of the estimated PDFs. Here we do see that the covariances tend to be centered at zero and don't tend to vary much; strangely, though, the distributions appear skewed to the right (meaning that positive covariance is more likely than negative covariance). I have no comment as to what this means.
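The right-skew noted above can be quantified with a moment-based sample skewness. The following is a minimal sketch using base R only; the helper name sample.skewness is my own, and the sketch assumes the covariance columns are accessible by name:

```r
# Moment-based sample skewness: m3 / m2^(3/2). Positive values indicate
# right-skew, matching the visual impression from the density plots.
# sample.skewness is a hypothetical helper, not part of the original analysis.
sample.skewness <- function(x) {
  x <- x[!is.na(x)]
  m2 <- mean((x - mean(x))^2)  # second central moment
  m3 <- mean((x - mean(x))^3)  # third central moment
  m3 / m2^(3/2)
}

sapply(list(covmbx = covmbx, covmbc = covmbc, covx1c = covx1c),
       sample.skewness)
```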

[Figure 2.4: Estimated density plots of variables describing estimate covariation between variables. Panels: covmbx, covmbc, covx1c.]

3 MODEL ESTIMATION AND ERROR ANALYSIS

Equation (1.2) represents the model I will try to estimate in this paper. I estimate the model's coefficients with both least squares and least absolute deviation estimators. Each time, I display residuals and provide an error analysis.

3.1 LEAST SQUARES ESTIMATION

Least squares estimation in R is trivial:

modelform <- mb ~ X1 + C
lsestimate <- lm(modelform)
stargazer(lsestimate, label = "tab:lmest",

          title = "Least squares estimates for model parameters",
          table.placement = 't')

Table 3.1: Least squares estimates for model parameters

                      Dependent variable: mb
X1                    -0.102 (0.008)
C                     2.177 (0.100)
Constant              -19.373 (0.010)
Observations          553
R2                    0.515
Adjusted R2           0.514
Residual Std. Error   0.220 (df = 550)
F Statistic           292.337 (df = 2; 550)
Note: *p<0.1; **p<0.05; ***p<0.01

Table 3.1 shows the least squares estimates of the coefficients in the model. Standard errors are shown in parentheses. All coefficients in the model are significantly different from zero, and the model's F statistic indicates that at least one of the included variables helps explain variation in m_B.

3.1.1 RESIDUALS

I would like to analyze the residuals and their relationship with the variables. The easiest way to do so is to plot the residuals against the variables. The following code does so:

par(mfrow = c(1, 3))
res <- lsestimate$residuals
plot(mb, res, pch = 20, cex = .5); abline(lm(res ~ mb), col = "blue")
plot(X1, res, pch = 20, cex = .5); abline(lm(res ~ X1), col = "blue")
plot(C, res, pch = 20, cex = .5); abline(lm(res ~ C), col = "blue")

Figure 3.1 shows the residuals' relationship to the variables mb, X1, and C. The residuals show no strong pattern with X1 or C. However, there is a slight linear relationship with the

residuals and mb.

[Figure 3.1: Analysis of residuals for the least squares estimated model. Panels: residuals vs. mb, X1, and C.]

I consider this a concerning finding; there should be no relationship between the dependent variable and the residuals. A relationship like the linear one observed may jeopardize the validity of inference, since it suggests that the underlying assumptions used for inference do not hold. Something in this model is amiss.

3.1.2 ERROR ANALYSIS

As seen in Table 3.1, all of the parameters in our model are statistically significant, with relatively small standard errors. The residual standard error is also displayed. The sum of squared errors is SSE = 26.5484. The estimated variance of the residuals is σ̂² = 0.0481. There is evidence that the model does provide a fit, but, as mentioned in the previous subsection, there are undesired patterns in the residuals that call the underlying assumptions of these methods into question.

3.2 LEAST ABSOLUTE DEVIATION

The R package that performs least absolute deviation (LAD) regression is quantreg (Koenker 2015). The function rq (which performs quantile regression) computes the LAD estimate of the coefficients by default.

library(quantreg)
ladestimate <- rq(modelform)
stargazer(ladestimate, label = "tab:rqest",
          title = "Least absolute deviation estimates for model parameters",
          table.placement = 't')

Table 3.2 shows the LAD estimates. There is not considerable evidence that these models differ, so I expect to see results similar to what I observed for the least squares

estimate.

Table 3.2: Least absolute deviation estimates for model parameters

              Dependent variable: mb
X1            -0.107 (0.007)
C             2.149 (0.087)
Constant      -19.390 (0.009)
Observations  553
Note: *p<0.1; **p<0.05; ***p<0.01

3.2.1 RESIDUALS

I display the residuals using a method similar to how I created Figure 3.1:

par(mfrow = c(1, 3))
res <- ladestimate$residuals
plot(mb, res, pch = 20, cex = .5); abline(lm(res ~ mb), col = "blue")
plot(X1, res, pch = 20, cex = .5); abline(lm(res ~ X1), col = "blue")
plot(C, res, pch = 20, cex = .5); abline(lm(res ~ C), col = "blue")

Figure 3.2 shows the residuals produced by the model when compared to the variables. I see a pattern similar to the one I observed with the least squares model.

3.2.2 ERROR ANALYSIS

The standard errors of the LAD estimated coefficients do not differ greatly from those of the least squares estimated model, although they tend to be slightly smaller. The sum of squared errors is SSE = 26.7035. It appears that the LAD estimated model does not represent a major improvement over the least squares estimated model.

4 WEIGHTED LEAST SQUARES ESTIMATION

Nowhere in this analysis have we accounted for the error in the estimation of the variables that are used as regressors in the model described by Equation (1.2). mberr, x1err, Cerr,

[Figure 3.2: Analysis of residuals for the least absolute deviation estimated model. Panels: residuals vs. mb, X1, and C.]

covmbx, covmbc, and covx1c have so far been ignored. I now try to account for these errors when estimating the coefficients in the model. Equation (2) in Olmstead (2015) describes the function we want to minimize:

    χ² = Σ_i (m_B,i + M_B + α x_1,i - β c_i - model_i)² /
             (σ²_int + Var(m_B,i) + α² Var(x_1,i) + β² Var(c_i)
              + 2α Cov(m_B,i, x_1,i) - 2β Cov(m_B,i, c_i) - 2β Cov(x_1,i, c_i))    (4.1)

However, Olmstead (ibid.) says we can safely assume that model_i = 0. In R, I use the optim function to minimize this quantity.

estimate.model <- function(d) {
  sse <- function(param) {
    sigma <- param[1]; b0 <- param[2]; b1 <- param[3]; b2 <- param[4]
    error <- with(d, sum((mb + b0 + b1*X1 - b2*C)^2 /
                         (sigma^2 + (b1*x1err)^2 + (b2*Cerr)^2 + mberr^2 -
                          2*b2*covx1c + 2*b1*covmbx - 2*b2*covmbc)))
    return(error)
  }
  param <- c(0.09, 19.3, 0.14, 3.1)  # starting values for optim
  return(optim(param, fn = sse)$par)
}

estimate.model(galaxy_data)[-1] -> optimfit
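Because estimate.model returns only optim's $par component, the optimizer's convergence status is discarded. A sketch of a check worth running first; estimate.model.full is a hypothetical variant (my own, not in the original analysis) that would differ only in returning the full optim result rather than $par:

```r
# Sketch: inspect optim's full return value, not just $par, so the
# convergence code and minimized objective can be checked.
# estimate.model.full is a hypothetical variant of estimate.model above
# whose last line is return(optim(param, fn = sse)) instead of ...$par.
fit <- estimate.model.full(galaxy_data)
fit$convergence  # 0 indicates the default Nelder-Mead search converged
fit$value        # the minimized chi-squared objective
fit$par          # c(sigma, M_B, alpha, beta)
```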

Table 4.1: Weighted least squares estimate of model coefficients

           Coefficients
M_B        19.30 (6.20)
α          0.14 (0.10)
β          3.10 (0.89)
Num. obs.  553
Note: ***p < 0.001; **p < 0.01; *p < 0.05

estimate.model(galaxy_data)[1] -> sigmahat

# I want estimates of standard errors; I use the jackknife
sapply(1:nrow(galaxy_data), function(i)
       (estimate.model(galaxy_data[-i, ])[-1] - optimfit)^2) -> temp
sqrt(sapply(1:3, function(i) sum(temp[i, ])) *
     (1 - 1/nrow(galaxy_data))) -> jack.se

res <- mb + optimfit[1] + optimfit[2] * X1 - optimfit[3] * C
df <- nrow(galaxy_data) - 3
p.vals <- pt(optimfit / jack.se, df = df, lower.tail = FALSE)

texreg(lsestimate, custom.model.names = c("Coefficients"),
       custom.coef.names = c("$M_B$", "$\\alpha$", "$\\beta$"),
       override.coef = list(optimfit), override.se = list(jack.se),
       override.pval = list(p.vals), label = "tab:optimest",
       caption = "Weighted least squares estimate of model coefficients",
       include.rsquared = FALSE, include.adjrs = FALSE,
       include.nobs = TRUE, include.fstatistic = FALSE,
       include.rmse = FALSE)

The results of the weighted least squares estimation of the coefficients are shown in Table 4.1. The standard errors were estimated using the jackknife method. These estimates have larger standard errors (which is more appropriate), and as a result we see that the estimate

for α is not statistically significantly different from zero. The estimated error standard deviation, also found by the weighted least squares estimator, is σ̂ = 0.0911. All of these results agree with the typical results described in Olmstead (2015).

par(mfrow = c(1, 3))
plot(mb, res, pch = 20, cex = .5); abline(lm(res ~ mb), col = "blue")
plot(X1, res, pch = 20, cex = .5); abline(lm(res ~ X1), col = "blue")
plot(C, res, pch = 20, cex = .5); abline(lm(res ~ C), col = "blue")

[Figure 4.1: Analysis of residuals for the weighted least squares estimated model. Panels: residuals vs. mb, X1, and C.]

Figure 4.1 shows that the linear pattern observed before is weaker in this model than in the others, but a relationship still exists. In fact, a relationship appears with C as well. This model appears to be an improvement, though not a perfect one.

5 CONCLUSION

The model described in Equation (1.1) is not perfect. There is an undesirable relationship between m_B and the error term when the model is actually estimated. This relationship appears regardless of whether least squares, weighted least squares, or LAD estimation is used. This suggests that the model may need to be revised.

REFERENCES

Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2. Harvard University, Cambridge, USA. URL: http://cran.r-project.org/package=stargazer.

Koenker, Roger (2015). quantreg: Quantile Regression. R package version 5.19. URL: http://CRAN.R-project.org/package=quantreg.

Olmstead, Matt D. (2015). Host Galaxy Properties and Hubble Residuals from the SDSS SN Survey. Draft version provided to the Fall 2015 MATH 6010 class.

R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL: http://www.r-project.org/.

Xie, Yihui (2015). knitr: A General-Purpose Package for Dynamic Report Generation in R. R package version 1.11.