Multiple regression is arguably the single most important method in all of statistics.

Size: px
Start display at page:

Download "Multiple regression is arguably the single most important method in all of statistics."

Transcription

1 Multiple Regressio: How Much Is Your Car Worth? 3 Essetially, all models are wrog; some are useful. George E. P. Box 1 Multiple regressio is arguably the sigle most importat method i all of statistics. Regressio models are widely used i may disciplies. I additio, a good uderstadig of regressio is all but essetial for uderstadig may other, more sophisticated statistical methods. This chapter cosists of a set of activities that will eable you to build a multivariate regressio model. The model will be used to describe the relatioship betwee the retail price of 005 used GM cars ad various car characteristics, such as mileage, make, model, presece or absece of cruise cotrol, ad egie size. The set of activities i this chapter allows you to work through the etire process of model buildig ad assessmet, icludig Applyig variable selectio techiques Usig residual plots to check for violatios of model assumptios, such as heteroskedasticity, outliers, autocorrelatio, ad oormality distributed errors Trasformig data to better fit model assumptios Uderstadig the impact of correlated explaatory variables Icorporatig categorical explaatory variables ito a regressio model Applyig F-tests i multiple regressio 67

2 68 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? 3.1 Ivestigatio: How Ca We Build a Model to Estimate Used Car Prices? Have you ever browsed through a car dealership ad observed the sticker prices o the vehicles? If you have ever seriously cosidered purchasig a vehicle, you ca probably relate to the difficulty of determiig whether that vehicle is a good deal or ot. Most dealerships are willig to egotiate o the sale price, so how ca you kow how much to egotiate? For ovices (like this author), it is very helpful to refer to a outside pricig source, such as the Kelley Blue Book, before agreeig o a purchase price. For over 80 years, Kelley Blue Book has bee a resource for accurate vehicle pricig. The compay s Website, provides a free olie resource where ayoe ca iput several car characteristics (such as age, mileage, make, model, ad coditio) ad quickly receive a good estimate of the retail price. I this chapter, you will use a relatively small subset of the Kelley Blue Book database to describe the associatio of several explaatory variables (car characteristics) with the retail value of a car. Before developig a complex multiple regressio model with several variables, let s start with a quick review of the simple liear regressio model by askig a questio: Are cars with lower mileage worth more? It seems reasoable to expect to see a relatioship betwee mileage (umber of miles the car has bee drive) ad retail value. The data set Cars cotais the make, model, equipmet, mileage, ad Kelley Blue Book suggested retail price of several used 005 GM cars. A Simple Liear Regressio Model 1. Produce a scatterplot from the Cars data set to display the relatioship betwee mileage (Mileage) ad suggested retail price (Price). Does the scatterplot show a strog relatioship betwee Mileage ad Price?. Calculate the least squares regressio lie, Price = b 0 + b 1 (Mileage). Report the regressio model, the R value, the correlatio coefficiet, the t-statistics, ad p-values for the estimated model coefficiets (the itercept ad slope). Based o these statistics, ca you coclude that Mileage is a strog idicator of Price? Explai your reasoig i a few seteces. 3. The first car i this data set is a Buick Cetury with 81 miles. Calculate the residual value for this car (the observed retail price mius the expected price calculated from the regressio lie). Mathematical Note For ay regressio equatio of the form y i = b 0 + b 1 x i + e i, the hypothesis test for the slope of the regressio equatio (b 1 ) is similar to other t-tests discussed i itroductory textbooks. (Mathematical details for this hypothesis test are described i Chapter.) To test the ull hypothesis that the slope coefficiet is zero (H 0 : b 1 = 0 versus H a : b 1 0), calculate the followig test statistic: t = b 1 - b 1 s b1 (3.1) where b 1 is the estimated slope calculated from the data ad s b1 is a estimate of the stadard deviatio of b 1. Probability theory ca be used to prove that if the regressio model assumptios are true, the t-statistic i Equatio (3.1) follows a t-distributio with - degrees of freedom. If the sample statistic, b 1, is far away from b 1 = 0 relative to the estimated stadard deviatio, the t-statistic will be large ad the correspodig p-value will be small. The t-statistic for the slope coefficiet idicates that Mileage is a importat variable. However, the R value (the percetage of variatio explaied by the regressio lie) idicates that the regressio lie is ot very useful i predictig retail price. (A review of the R value is give i the exteded activities.) As is always the case with statistics, we eed to visualize the data rather tha focus solely o a p-value. Figure 3.1 shows that the expected price decreases as mileage icreases, but the observed poits do ot appear to be close to the regressio lie. Thus, it seems reasoable that icludig additioal explaatory variables i the regressio model might help to better explai the variatio i retail price.

3 3. Goals of Multiple Regressio 69 Figure 3.1 Scatterplot ad least squares regressio model: Price = 4, (Mileage). The regressio lie shows that for each additioal mile a car is drive, the expected price of the car decreases by about 17 cets. However, may poits are ot close to the regressio lie, idicatig that the expected price is ot a accurate estimate of the actual observed price. I this chapter, you will build a liear combiatio of explaatory variables that explais the respose variable, retail price. As you work through the chapter, you will fid that there is ot oe techique, or recipe, that will give the best model. I fact, you will come to see that there is t just oe best model for these data. Ulike i most mathematics classes, where every studet is expected to submit the oe right aswer to a assigmet, here it is expected that the fial regressio models submitted by various studets will be at least slightly differet. While a sigle best model may ot exist for these data, there are certaily may bad models that should be avoided. This chapter focuses o uderstadig the process of developig a statistical model. It does t matter if you are developig a regressio model i ecoomics, psychology, sociology, or egieerig there are commo key questios ad processes that should be evaluated before a fial model is submitted. 3. Goals of Multiple Regressio It is importat to ote that multiple regressio aalysis ca be used to serve differet goals. The goals will ifluece the type of aalysis that is coducted. The most commo goals of multiple regressio are to describe, predict, or cofirm. Describe: A model may be developed to describe the relatioship betwee multiple explaatory variables ad the respose variable. Predict: A regressio model may be used to geeralize to observatios outside the sample. Just as i simple liear regressio, explaatory variables should be withi the rage of the sample data to predict future resposes. Cofirm: Theories are ofte developed about which variables or combiatio of variables should be icluded i a model. For example, is mileage useful i predictig retail price? Iferetial techiques ca be used to test if the associatio betwee the explaatory variables ad the respose could just be due to chace.

4 70 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? Theory may also predict the type of relatioship that exists, such as cars with lower mileage are worth more. More specific theories ca also be tested, such as retail price decreases liearly with mileage. Whe the goal of developig a multiple regressio model is descriptio or predictio, the primary issue is ofte determiig which variables to iclude i the model (ad which to leave out). All potetial explaatory variables ca be icluded i a regressio model, but that ofte results i a cumbersome model that is difficult to uderstad. O the other had, a model that icludes oly oe or two of the explaatory variables, such as the model i Figure 3.1, may be much less accurate tha a more complex model. This tesio betwee fidig a simple model ad fidig the model that best explais the respose is what makes it difficult to fid a best model. The process of fidig the most reasoable mix, which provides a relatively simple liear combiatio of explaatory variables, ofte resembles a exploratory artistic process much more tha a formulaic recipe. Icludig redudat or uecessary variables ot oly creates a uwieldy model but also ca lead to test statistics (ad coclusios from correspodig hypothesis tests) that are less reliable. If explaatory variables are highly correlated, the their effects i the model will be estimated with more imprecisio. This imprecisio leads to larger stadard errors ad ca lead to isigificat test results for idividual variables that ca be importat i the model. Failig to iclude a relevat variable ca result i biased estimates of the regressio coefficiets ad ivalid t-statistics, especially whe the excluded variable is highly sigificat or whe the excluded variable is correlated with other variables also i the model. 3.3 Variable Selectio Techiques to Describe or Predict a Respose If your objective is to describe a relatioship or predict ew respose variables, variable selectio techiques are useful for determiig which explaatory variables should be i the model. For this ivestigatio, we will cosider the respose to be the suggested retail price from Kelley Blue Book (the Price variable i the data). We may iitially believe the followig are relevat potetial explaatory variables: Make (Buick, Cadillac, Chevrolet, Potiac, SAAB, Satur) Model (specific car for each previously listed Make) Trim (specific type of Model) Type (Seda, Coupe, Hatchback, Covertible, or Wago) Cyl (umber of cyliders: 4, 6, or 8) Liter (a measure of egie size) Doors (umber of doors:, 4) Cruise (1 = cruise cotrol, 0 = o cruise cotrol) Soud (1 = upgraded speakers, 0 = stadard speakers) Leather (1 = leather seats, 0 = ot leather seats) Mileage (umber of miles the car has bee drive) Stepwise Regressio Whe a large umber of variables are available, stepwise regressio is a iterative techique that has historically bee used to idetify key variables to iclude i a regressio model. For example, forward stepwise regressio begis by fittig several sigle-predictor regressio models for the respose; oe regressio model is developed for each idividual explaatory variable. The sigle explaatory variable (call it X 1 ) that best explais the respose (has the highest R value) is selected to be i the model. * I the ext step, all possible regressio models usig X 1 ad exactly oe other explaatory variable are calculated. From amog all these two-variable models, the regressio model that best explais the respose is used to idetify X. After the first ad secod explaatory variables, X 1 ad X, have bee selected, the * A F-test is coducted o each of the models. The size of the F-statistic (ad correspodig p-value) is used to evaluate the fit of each model. Whe models with the same umber of predictors are compared, the model with the largest F-statistic will also have the largest R value.

5 3.3 Variable Selectio Techiques to Describe or Predict a Respose 71 process is repeated to fid X 3. This cotiues util icludig additioal variables i the model o loger greatly improves the model s ability to describe the respose. * Backward stepwise regressio is similar to forward stepwise regressio except that it starts with all potetial explaatory variables i the model. Oe by oe, this techique removes variables that make the smallest cotributio to the model fit util a best model is foud. While sequetial techiques are easy to implemet, there are usually better approaches to fidig a regressio model. Sequetial techiques have a tedecy to iclude too may variables ad at the same time sometimes elimiate importat variables. 3 With improvemets i techology, most statisticias prefer to use more global techiques (such as best subset methods), which compare all possible subsets of the explaatory variables. Note Later sectios will show that whe explaatory variables are highly correlated, sequetial procedures ofte leave out variables that explai (are highly correlated with) the respose. I additio, sequetial procedures ivolve umerous iteratios ad each iteratio ivolves hypothesis tests about the sigificace of coefficiets. Remember that with ay multiple compariso problem, a a@level of 0.05 meas there is a 5% chace that each irrelevat variable will be foud sigificat ad may iappropriately be determied importat for the model. Selectig the Best Subset of Predictors A researcher must balace icreasig R agaist keepig the model simple. Whe models with the same umber of parameters are compared, the model with the highest R value should typically be selected. A larger R value idicates that more of the variatio i the respose variable is explaied by the model. However, R ever decreases whe aother explaatory variable is added. Thus, other techiques are suggested for comparig models with differet umbers of explaatory variables. Statistics such as the adjusted R, Mallows C p, ad Akaike s ad Bayes iformatio criteria are used to determie a best model. Each of these statistics icludes a pealty for icludig too may terms. I other words, whe two models have equal R values, each of these statistics will select the model with fewer terms. Best subsets techiques use several statistics to simultaeously compare several regressio models with the same umber of predictors. Mathematical Note R, called the coefficiet of determiatio, is the percetage of variatio i the respose variable that is explaied by the regressio lie. a (y i - y) a (y i - y i ) R = = 1 - a (y i - y) (3.) a (y i - y) Whe the sum of the squared residuals (y i - y i ) are small compared to the total spread of the resposes ( a (y i - y) ), R is close to oe. R = 1 idicates that the regressio model perfectly fits the data. Calculatios for adjusted R ad Mallows C p are * A a@level is ofte used to determie if ay of the explaatory variables ot curretly i the model should be added to the model. If the p-value of all additioal explaatory variables is greater tha the a@level, o more variables will be etered ito the model. Larger a@levels (such as a = 0.) will iclude more terms while smaller a@levels (such as a = 0.05) will iclude fewer terms. R adj = a (y i - y i ) - p (3.3) a (y i - y)

6 7 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? C p = ( - p) s + (p - ) (3.4) s Full where is the sample size, p is the umber of coefficiets i the model (icludig the costat term), s estimates the variace of the residuals i the model, ad s Full estimates the variace of the residuals i the full model (i.e., the model with all possible terms). R adj is similar to R, except that R adj icludes a pealty whe uecessary terms are icluded i the model. Specifically, whe p is larger, ( - 1)/( - p) is larger, ad thus R adj is smaller. C p also adjusts for the umber of terms i the model. Notice that whe the curret model explais the data as well as the full model, s /s Full = 1. The C p = ( - p)(1) + p - = p. Thus, the objective is to fid the smallest C p value that is close to the umber of coefficiets i the model. While ay of these are appropriate for some models, adjusted R still teds to select models with too may terms. Mathematical details for these calculatios are provided i the exteded activities. Comparig Variable Selectio Techiques 4. Use the Cars data to coduct a stepwise regressio aalysis. a. Calculate seve regressio models, each with oe of the followig explaatory variables: Cy1, Liter, Doors, Cruise, Soud, Leather, ad Mileage. Idetify the explaatory variable that correspods to the model with the largest R value. Call this variable X 1. b. Calculate six regressio models. Each model should have two explaatory variables, X 1 ad oe of the other six explaatory variables. Fid the two-variable model that has the highest R value. How much did R improve whe this secod variable was icluded? c. Istead of cotiuig this process to idetify more variables, use the software istructios provided to coduct a stepwise regressio aalysis. List each of the explaatory variables i the model suggested by the stepwise regressio procedure. 5. Use the software istructios provided to develop a model usig best subsets techiques. Notice that stepwise regressio simply states which model to use, while best subsets provides much more iformatio ad requires the user to choose how may variables to iclude i the model. I geeral, statisticias select models that have a relatively low C p, a large R, ad a relatively small umber of explaatory variables. (It is rare for these statistics to all suggest the same model. Thus, the researcher much choose a model based o his or her goals. The exteded activities provide additioal details about each of these statistics.) Based o the output from best subsets, which explaatory variables should be icluded i a regressio model? 6. Compare the regressio models i Questios 4 ad 5. a. Are differet explaatory variables cosidered importat? b. Did the stepwise regressio i Questio 4 provide ay idicatio that Liter could be useful i predictig Price? Did the best subsets output i Questio 5 provide ay idicatio that Liter might be useful i predictig Price? Explai why best subsets techiques ca be more iformative tha sequetial techiques. Neither sequetial or best subsets techiques guaratee a best model. Arbitrarily usig slightly differet criteria will produce differet models. Best subset methods allow us to compare models with a specific umber of predictors, but models with more predictors do ot always iclude the same terms as smaller models. Thus, it is ofte difficult to iterpret the importace of ay coefficiets i the model. Variable selectio techiques are useful i providig a high R value while limitig the umber of variables. Whe our goal is to develop a model to describe or predict a respose, we are cocered ot about the sigificace of each explaatory variable, but about how well the overall model fits. If our goal ivolves cofirmig a theory, iterative techiques are ot recommeded. Cofirmig a theory is similar to hypothesis testig. Iterative variable selectio techiques test each variable or combiatio of variables several times, ad thus the p-values are ot reliable. The stated sigificace level for a t-statistic is valid oly if the data are used for a sigle test. If multiple tests are coducted to fid the best equatio, the actual sigificace level for each test for a idividual compoet is ivalid.

7 3.4 Checkig Model Assumptios 73 Key Cocept If variables are selected by iterative techiques, hypothesis tests should ot be used to determie the sigificace of these same terms. 3.4 Checkig Model Assumptios The simple liear regressio model discussed i itroductory statistics courses typically has the followig form: y i = b 0 + b 1 x i + e i for i = 1,, c, where e i N(0, s ) (3.5) For this liear regressio model, the mea respose (b 0 + b 1 x i ) is a liear fuctio of the explaatory variable, x. The multiple liear regressio model has a very similar form. The key differece is that ow more terms are icluded i the model. y i = b 0 + b 1 x 1,i + c + b p - x p -,i + b p -1 x p -1,i + e i for i = 1,, c, where e i N(0, s ) (3.6) I this chapter, p represets the umber of parameters i the regressio model (b 0, b 1, b, c, b p -1 ) ad is the total umber of observatios i the data. I this chapter, we make the followig assumptios about the regressio model: The model parameters b 0, b 1, c, b p -1 ad s are costat. Each term i the model is additive. The error terms i the regressio model are idepedet ad have bee sampled from a sigle populatio (idetically distributed). This is ofte abbreviated as iid. The error terms follow a ormal probability distributio cetered at zero with a fixed variace, s. This assumptio is deoted as e i N(0, s ) for i = 1, c,. Regressio assumptios about the error terms are geerally checked by lookig at the residuals from the data: y i - y i. Here, y i are the observed resposes ad y i are the estimated resposes calculated by the regressio model. Istead of formal hypothesis tests, plots will be used to visually assess whether the assumptios hold. The theory ad methods are simplest whe ay scatterplot of residuals resembles a sigle, horizotal, oval balloo, but real data may ot cooperate by coformig to the ideal patter. A orery plot may show a wedge, a curve, or multiple clusters. Figure 3. shows examples of each of these types of residual plots. Suppose you held a cup full of cois ad dropped them all at oce. We hope to fid residual plots that resemble the radom patter that would likely result from dropped cois (like the oval-shaped plot). The other three plots show patters that would be very ulikely to occur by radom chace. Ay plot patters that are ot ice oval shapes suggest that the error terms are violatig at least oe model assumptio, ad thus it is likely that we have ureliable estimates of our model coefficiets. The followig sectio illustrates strategies for dealig with oe of these uwated shapes: a wedge-shaped patter. Note that i sigle-variable regressio models, residual plots show the same iformatio as the iitial fitted lie plot. However, the residual plots ofte emphasize violatios of model assumptios better tha the fitted lie plot. I additio, multivariate regressio lies are very difficult to visualize. Thus, residual plots are essetial whe multiple explaatory variables are used. Heteroskedasticity Heteroskedasticity is a term used to describe the situatio where the variace of the error term is ot costat for all levels of the explaatory variables. For example, i the regressio equatio Price = 4, (Mileage), the spread of the suggested retail price values aroud the regressio lie should be about the same whether mileage is 0 or mileage is 50,000. If heteroskedasticity exists i the model, the most commo remedy is to trasform either the explaatory variable, the respose variable, or both i the hope that the trasformed relatioship will exhibit homoskedasticity (equal variaces aroud the regressio lie) i the error terms.

8 74 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? Figure 3. Commo shapes of residual plots. Ideally, residual plots should look like a radomly scattered set of dropped cois, as see i the oval-shaped plot. If a patter exists, it is usually best to try other regressio models. 7. Usig the regressio equatio calculated i Questio 5, create plots of the residuals versus each explaatory variable i the model. Also create a plot of the residuals versus the predicted retail price (ofte called a residual versus fit plot). a. Does the size of the residuals ted to chage as mileage chages? b. Does the size of the residuals ted to chage as the predicted retail price chages? You should see patters idicatig heteroskedasticity (ocostat variace). c. Aother patter that may ot be immediately obvious from these residual plots is the right skewess see i the residual versus mileage plot. Ofte ecoomic data, such as price, are right skewed. To see the patter, look at just oe vertical slice of this plot. With a pecil, draw a vertical lie correspodig to mileage equal to Are the poits i the residual plots balaced aroud the lie Y = 0? d. Describe ay patters see i the other residual plots. 8. Trasform the suggested retail price to log (Price) ad Price. Trasformig data usig roots, logarithms, or reciprocals ca ofte reduce heteroskedasticity ad right skewess. (Trasformatios are discussed i Chapter.) Create regressio models ad residual plots for these trasformed respose variables usig the explaatory variables selected i Questio 5. a. Which trasformatio did the best job of reducig the heteroskedasticity ad skewess i the residual plots? Give the R values of both ew models. b. Do the best residual plots correspod to the best R values? Explai. While other trasformatios could be tried, throughout this ivestigatio we will refer to the log-trasformed respose variable as TPrice. Figure 3.3 shows residual plots that were created to aswer Questios 7 ad 8. Notice that whe the respose variable is Price, the residual versus fit plot has a clear wedge-shaped patter. The residuals have

9 3.4 Checkig Model Assumptios 75 much more spread whe the fitted value is large (i.e., expected retail price is close to $40,000) tha whe the fitted value is ear $10,000. Usig TPrice as a respose did improve the residual versus fit plot. Although there is still a fait wedge shape, the variability of the residuals is much more cosistet as the fitted value chages. Figure 3.3 reveals aother patter i the residuals. The followig sectio will address why poits i both plots appear i clusters. Figure 3.3 Residual versus fit plots usig Price ad TPrice (the log 10 trasformatio), as resposes. The residual plot with Price as the respose has a much stroger wedge-shaped patter tha the oe with TPrice. Examiig Residual Plots Across Time/Order Autocorrelatio exists whe cosecutive error terms are related. If autocorrelatio exists, the assumptio about the idepedece of the error terms is violated. To idetify autocorrelatio, we plot the residuals versus the order of the data etries. If the ordered plot shows a patter, the we coclude that autocorrelatio exists. Whe autocorrelatio exists, the variable resposible for creatig the patter i the ordered residual plot should be icluded i the model. Note Time (order i which the data were collected) is perhaps the most commo source of autocorrelatio, but other forms, such as spatial autocorrelatios, ca also be preset. If time is ideed a variable that should be icluded i the model, a specific type of regressio model, called a time series model, should be used. 9. Create a residual versus order plot from the TPrice versus Mileage regressio lie. Describe ay patter you see i the ordered residual plot. Apparetly somethig i our data is affectig the residuals based o the order of the data. Clearly, time is ot the ifluetial factor i our data set (all of the data are from 005). Ca you suggest a variable i the Cars data set that may be causig this patter? 10. Create a secod residual versus order plot usig TPrice as the respose ad usig the explaatory variables selected i Questio 5. Describe ay patters that you see i these plots. While ordered plots make sese i model checkig oly whe there is a meaigful order to the data, the residual versus order plots could demostrate the eed to iclude additioal explaatory variables i the regressio model. Figure 3.4 shows the two residual plots created i Questios 9 ad 10. Both plots show that the data poits are clearly clustered by order. However, there is less clusterig whe the six explaatory variables (Mileage, Cyl, Doors, Cruise, Soud, ad Leather) are i the model. Also otice that the residuals ted

10 76 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? Figure 3.4 Residual versus order plots usig TPrice as the respose. The first graph uses Mileage as the explaatory variable, ad the secod graph uses Mileage, Cyl, Doors, Cruise, Soud, ad Leather as explaatory variables. to be closer to zero i the secod graph. Thus, the secod graph (with six explaatory variables) teds to have estimates that are closer to the actual observed values. We do ot have a time variable i this data set, so reorderig the data would ot chage the meaig of the data. Reorderig the data could elimiate the patter; however, the clear patter see i the residual versus order plots should ot be igored because it idicates that we could create a model with a much higher R value if we could accout for this patter i our model. This type of autocorrelatio is called taxoomic autocorrelatio, meaig that the relatioship see i this residual plot is due to how the items i the data set are classified. Suggestios o how to address this issue are give i later sectios. Outliers ad Ifluetial Observatios 11. Calculate a regressio equatio usig the explaatory variables suggested i Questio 5 ad Price as the respose. Idetify ay residuals (or cluster of residuals) that do t seem to fit the overall patter i the residual versus fit ad residual versus mileage plots. Ay data values that do t seem to fit the geeral patter of the data set are called outliers. a. Idetify the specific rows of data that represet these poits. Are there ay cosistecies that you ca fid? b. Is this cluster of outliers helpful i idetifyig the patters that were foud i the ordered residual plots? Why or why ot? 1. Ru the aalysis with ad without the largest cluster of potetial outliers (the cluster of outliers correspods to the Cadillac covertibles). Use Price as the respose. Does the cluster of outliers ifluece the coefficiets i the regressio lie? If the coefficiets chage dramatically betwee the regressio models, these poits are cosidered ifluetial. If ay observatios are ifluetial, great care should be take to verify their accuracy. I additio to reducig heterskedasticity, trasformatios ca ofte reduce the effect of outliers. Figure 3.5 shows the residual versus fit plots usig Price ad TPrice, respectively. The cluster of outliers correspodig to the Cadillac covertibles is much more visible i the plot with the utrasformed (Price) respose variable. Eve though there is still clusterig i the trasformed data, the residuals correspodig to the Cadillac covertibles are o loger uusually large.

11 3.4 Checkig Model Assumptios 77 Figure 3.5 Residual versus fit plots usig Price ad TPrice as the respose. The circled observatios i the plot usig Price are o loger clear outliers i the plot usig TPrice. I some situatios, clearly uderstadig outliers ca be more time cosumig (ad possibly more iterestig) tha workig with the rest of the data. It ca be quite difficult to determie if a outlier was accurately recorded or whether the outliers should be icluded i the aalysis. If the outliers were accurately recorded ad trasformatios are ot useful i elimiatig them, it ca be difficult to kow what to do with them. The simplest approach is to ru the aalysis twice: oce with the outliers icluded ad oce without. If the results are similar, the it does t matter if the outliers are icluded or ot. If the results do chage, it is much more difficult to kow what to do. Outliers should ever automatically be removed because they do t fit the overall patter of the data. Most statisticias ted to err o the side of keepig the outliers i the sample data set uless there is clear evidece that they were mistakely recorded. Whatever fial model is selected, it is importat to clearly state if you are aware that your results are sesitive to outliers. Normally Distributed Residuals Eve though the calculatios of the regressio model ad R do ot deped o the ormality assumptio, idetifyig patters i residual plots ca ofte lead to aother model that better explais the respose variable. To determie if the residuals are ormally distributed, two graphs are ofte created: a histogram of the residuals ad a ormal probability plot. Normal probability plots are created by sortig the data (the residuals i this case) from smallest to largest. The the sorted residuals are plotted agaist a theoretical ormal distributio. If the plot forms a straight lie, the actual data ad the theoretical data have the same shape (i.e., the same distributio). (Normal probability plots are discussed i more detail i Chapter.) 13. Create a regressio lie to predict TPrice from Mileage. Create a histogram ad a ormal probability plot of the residuals. a. Do the residuals appear to follow the ormal distributio? b. Are the te outliers visible o the ormal probability plot ad the histogram? Figure 3.6 shows the ormal probability plot usig six explaatory variables to estimate TPrice. While the outliers are ot visible, both plots still show evidece of lack of ormality.

12 78 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? Figure 3.6 Normal probability plot ad histogram of residuals from the model usig TPrice as the respose ad Mileage, Cyl, Doors, Cruise, Soud, ad Leather as the explaatory variables. At this time, it should be clear that simply pluggig data ito a software package ad usig a iterative variable selectio techique will ot reliably create a best model. Key Cocept Before a fial model is selected, the residuals should be plotted agaist fitted (estimated) values, observatio order, the theoretical ormal distributio, ad each explaatory variable i the model. Table 3.1 shows how each residual plot is used to check model assumptios. If a patter exists i ay of the residual plots, the R value is likely to improve if differet explaatory variables or trasformatios are icluded i the model. Table 3.1 Plots that ca be used to evaluate model assumptios about the residuals. Assumptio Normality Zero mea Equal variaces Idepedece Idetically distributed Plot Histogram or ormal probability plot No plot (errors will always sum to zero uder these models) Plot of the residuals versus fits ad each explaatory variable Residuals versus order No plot (esure each subject was sampled from the same populatio withi each group) Correlatio Betwee Explaatory Variables Multicolliearity exists whe two or more explaatory variables i a multiple regressio model are highly correlated with each other. If two explaatory variables X 1 ad X are highly correlated, it ca be very difficult to idetify whether X 1, X, or both variables are actually resposible for ifluecig the respose variable, Y. 14. Create three regressio models usig Price as the respose variable. I all three cases, provide the regressio model, R value, t-statistic for the slope coefficiets, ad correspodig p-values.

13 3.4 Checkig Model Assumptios 79 a. I the first model, use oly Mileage ad Liter as the explaatory variables. Is Liter a importat explaatory variable i this model? b. I the secod model, use oly Mileage ad umber of cyliders (Cyl) as the explaatory variables. Is Cyl a importat explaatory variable i this model? c. I the third model, use Mileage, Liter, ad umber of cyliders (Cyl) as the explaatory variables. How did the test statistics ad p-values chage whe all three explaatory variables were icluded i the model? 15. Note that the R values are essetially the same i all three models i Questio 14. The coefficiets for Mileage also stay roughly the same for all three models the iclusio of Liter or Cyl i the model does ot appear to ifluece the Mileage coefficiet. Depedig o which model is used, we state that for each additioal mile o the car, Price is reduced somewhere betwee $0.15 to $0.16. Describe how the coefficiet for Liter depeds o whether Cyl is i the model. 16. Plot Cyl versus Liter ad calculate the correlatio betwee these two variables. Is there a strog correlatio betwee these two variables? Explai. Recall that Questio 4 suggested deletig Liter from the model. The goal i stepwise regressio is to fid the best model based o the R value. If two explaatory variables both impact the respose i the same way, stepwise regressio will rather arbitrarily pick oe variable ad igore the other. Key Cocept Stepwise regressio ca ofte completely miss importat explaatory variables whe there is multicolliearity. Mathematical Note Most software packages ca create a matrix plot of the correlatios ad correspodig scatterplots of all explaatory variables. This matrix plot is helpful i idetifyig ay patters of iterdepedece amog the explaatory variables. A easy-to-apply guidelie for determiig if multicolliearity eeds to be dealt with is to use the variace iflatio factor (VIF). VIF coducts a regressio of each explaatory variable (X i ) o the remaiig explaatory variables, calculates the correspodig R value (R i ), ad the calculates the followig fuctio for each variable X i : 1/(1 - R i ). If the R i value is zero, VIF is oe, ad X i is ucorrelated with all other explaatory variables. Motgomery, Peck, ad Viig state, Practical experiece idicates that if ay of the VIFs exceed 5 or 10, it is a idicatio that the associated regressio coefficiets are poorly estimated because of multicolliearity. 4 Figure 3.7 shows that Liter ad Cyl are highly correlated. Withi the cotext of this problem, it does t make physical sese to cosider holdig oe variable costat while the other variable icreases. I geeral, it may ot be possible to fix a multicolliearity problem. If the goal is simply to describe or predict retail prices, multicolliearity is ot a critical issue. Redudat variables should be elimiated from the model, but highly correlated variables that both cotribute to the model are acceptable if you are ot iterpretig the coefficiets. However, if your goal is to cofirm whether a explaatory variable is associated with a respose (test a theory), the it is essetial to idetify the presece of multicolliearity ad to recogize that the coefficiets are ureliable whe it exists. Key Cocept If your goal is to create a model to describe or predict, the multicolliearity really is ot a problem. Note that multicolliearity has very little impact o the R value. However, if your goal is to uderstad how a specific explaatory variable iflueces the respose, as is ofte doe whe cofirmig a theory, the multicolliearity ca cause coefficiets (ad their correspodig p-values whe testig their sigificace) to be ureliable.

14 80 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? Figure 3.7 Scatterplot showig a clear associatio betwee Liter ad Cyl. The followig approaches are commoly used to address multicolliearity: Get more iformatio: If it is possible, expadig the data collectio may lead to samples where the variables are ot so correlated. Cosider whether a greater rage of data could be collected or whether data could be measured differetly so that the variables are ot correlated. For example, the data here are oly for GM cars. Perhaps the relatioship betwee egie size i liters ad umber of cyliders is ot so strog for data from a wider variety of maufacturers. Re-evaluate the model: Whe two explaatory variables are highly correlated, deletig oe variable will ot sigificatly impact the R value. However, if there are theoretical reasos to iclude both variables i the model, keep both terms. I our example, Liter ad umber of cyliders (Cyl) are measurig essetially the same quatity. Liter represets the volume displaced durig oe complete egie cycle; umber of cyliders (Cyl) also is a measure of the volume that ca be displaced. Combie the variables: Usig other statistical techiques such as pricipal compoets, it is possible to combie the correlated variables optimally ito a sigle variable that ca be used i the model. There may be theoretical reasos to combie variables i a certai way. For example, the volume (size) ad weight of a car are likely highly positively correlated. Perhaps a ew variable defied as desity = weight/volume could be used i a model predictig price rather tha either of these idividual variables. I this ivestigatio, we are simply attemptig to develop a model that ca be used to estimate price, so multicolliearity will ot have much impact o our results. If we did re-evaluate the model i light of the fact that Liter ad umber of cyliders (Cyl) both measure displacemet (egie size), we might ote that Liter is a more specific variable, takig o several values, while Cyl has oly three values i the data set. Thus, we might choose to keep Liter ad remove Cyl i the model. 3.5 Iterpretig Model Coefficiets While multiple regressio is very useful i uderstadig the impacts of various explaatory variables o a respose, there are importat limitatios. Whe predictors i the model are highly correlated, the size ad meaig of the coefficiets ca be difficult to iterpret. I Questio 14, the followig three models were developed: Price = (Mileage) (Liter) Price = (Mileage) + 408(Cyl) Price = (Mileage) (Liter) + 848(Cyl)

15 3.6 Categorical Explaatory Variables 81 The iterpretatio of model coefficiets is more complex i multiple liear regressio tha i simple liear regressio. It ca be misleadig to try to iterpret a coefficiet without cosiderig other terms i the model. For example, whe Mileage ad Liter are the two predictors i a regressio model, the Liter coefficiet might seem to idicate that a icrease of oe i Liter will icrease the expected price by $4968. However, whe Cyl is also icluded i the model, the Liter coefficiet seems to idicate that a icrease of oe i Liter will icrease the expected price by $1545. The size of a regressio coefficiet ad eve the directio ca chage depedig o which other terms are i the model. I this ivestigatio, we have show that Liter ad Cyl are highly correlated. Thus, it is ureasoable to believe that Liter would chage by oe uit but Cyl would stay costat. The multiple liear regressio coefficiets caot be cosidered i isolatio. Istead, the Liter coefficiet shows how the expected price will chage whe Liter icreases by oe uit, after accoutig for correspodig chages i all the other explaatory variables i the model. Key Cocept I multiple liear regressio, the coefficiets tell us how much the expected respose will chage whe the explaatory variable icreases by oe uit, after accoutig for correspodig chages i all other terms i the model. 3.6 Categorical Explaatory Variables As we saw i Questio 9, there is a clear patter i the residual versus order plot for the Kelley Blue Book car pricig data. It is likely that oe of the categorical variables (Make, Model, Trim, or Type) could explai this patter. If ay of these categorical variables are related to the respose variable, the we wat to add these variables to our regressio model. A commo procedure used to icorporate categorical explaatory variables ito a regressio model is to defie idicator variables, also called dummy variables. Creatig idicator variables is a process of mappig the oe colum (variable) of categorical data ito several colums (idicator variables) of 0 ad 1 data. Let s take the variable Make as a example. The six possible values (Buick, Cadillac, Chevrolet, Potiac, SAAB, Satur) ca be recoded usig six idicator variables, oe for each of the six makes of car. For example, the idicator variable for Buick will have the value 1 for every car that is a Buick ad 0 for each car that is ot a Buick. Most statistical software packages have a commad for creatig the idicator variables automatically. Creatig Idicator Variables 17. Create boxplots or idividual value plots of the respose variable TPrice versus the categorical variables Make, Model, Trim, ad Type. Describe ay patters you see. 18. Create idicator variables for Make. Name the colums, i order, Buick, Cadillac, Chevrolet, Potiac, SAAB, ad Satur. Look at the ew data colums ad describe how the idicator variables are defied. For example, list all possible outcomes for the Cadillac idicator variable ad explai what each outcome represets. Ay of the idicator variables i Questio 18 ca be icorporated ito a regressio model. However, if you wat to iclude Make i its etirety i the model, do ot iclude all six idicator variables. Five will suffice because there is complete redudacy i the sixth idicator variable. If the first five idicator variables are all 0 for a particular car, we automatically kow that this car belogs to the sixth category. Below, we will leave the Buick idicator variable out of our model. The coefficiet for a idicator variable is a estimate of the average amout by which the respose variable will chage. For example, the estimated coefficiet for the Satur variable is a estimate of the average differece i TPrice whe the car is a Satur rather tha a Buick (after adjustig for correspodig chages i all other terms i the model).

16 8 Chapter 3 Multiple Regressio: How Much Is Your Car Worth? Mathematical Note For ay categorical explaatory variable with g groups, oly g - 1 terms should be icluded i the regressio model. Most software packages use matrix algebra to develop multiple regressio models. If all g terms are i the model, explaatory variables will be 100% correlated (you ca exactly predict the value of oe variable if you kow the other variables) ad the eeded matrix iversio caot be doe. If the researcher chooses to leave all g terms i the model, most software packages will arbitrarily remove oe term so that the eeded matrix calculatios ca be completed. It may be temptig to simplify the model by icludig oly a few of the most sigificat idicator variables. For example, istead of icludig five idicator variables for Make, you might cosider oly usig Cadillac ad SAAB. Most statisticias would recommed agaist this. By limitig the model to oly idicator variables that are sigificat i the sample data set, we ca overfit the model. Models are overfit whe researchers overwork a data set i order to icrease the R value. For example, a researcher could sped a sigificat amout of time pickig a few idicator variables from Make, Model, Trim, ad Type i order to fid the best R value. While the model would likely estimate the mea respose well, it ulikely to accurately predict ew values of the respose variable. This is a fairly uaced poit. The purpose of variable selectio techiques is to select the variables that best explai the respose variable. However, overfittig may occur if we break up categorical variables ito smaller uits ad the pick ad choose amog the best parts of those variables. (The Model Validatio sectio i the exteded activities discusses this topic i more detail.) Buildig Regressio Models with Idicator Variables 19. Build a ew regressio model usig TPrice as the respose ad Mileage, Liter, Satur, Cadillac, Chevrolet, Potiac, ad SAAB as the explaatory variables. Explai why you expect the R value to icrease whe you add terms for Make. 0. Create idicator variables for Type. Iclude the Make ad Type idicator variables, plus the variables Liter, Doors, Cruise, Soud, Leather, ad Mileage, i a model to predict TPrice. Remember to leave at least oe category out of the model for the Make ad Type idicator variables (e.g., leave Buick ad Hatchback out of the model). Compare this regressio model to the other models that you have fit to the data i this ivestigatio. Does the ormal probability plot suggest that the residuals could follow a ormal distributio? Describe whether the residual versus fit, the residual versus order, ad the residual versus each explaatory variable plots look more radom tha they did i earlier problems. The additioal categorical variables are importat i improvig the regressio model. Figure 3.8 shows that whe Make ad Type are ot i the model, the residuals are clustered. Whe Make ad Type are icluded i the model, the residuals appear to be more radomly distributed. By icorporatig Make ad Type, we have bee able to explai some of variability that was causig the clusterig. I additio, the sizes of the residuals are much smaller. Smaller residuals idicate a better fittig model ad a higher R value. Eve though Make ad Type improved the residual plots, there is still clusterig that might be improved by icorporatig Model ito the regressio equatio. However, if the goal is simply to develop a model that accurately estimates retail prices, the R value i Questio 0 is already fairly high. Are there a few more terms that ca be added to the model that would dramatically improve the R value? To determie a fial model, you should attempt to maximize the R value while simultaeously keepig the model relatively simple. For your fial model, you should commet o the residual versus fit, residual versus order, ad ay other residual plots that previously showed patters. If ay patter exists i the residual plots, it may be worth attemptig a ew regressio model that will accout for these patters. If the regressio model ca be modified to address the patters i the residuals, the R value will improve. However, ote that the R value is already fairly high. It may ot be worth makig the model more complex for oly a slight icrease i the R value. 1. Create a regressio model that is simple (i.e., does ot have too may terms) ad still accurately predicts retail price. Validate the model assumptios. Look at residual plots ad check for heteroskedasticity, multicolliearity, autocorrelatio, ad outliers. Your fial model should ot have sigificat clusters, skewess, outliers, or heteroskedasticity appearig i the residual plots. Submit your suggested least squares regressio formula alog with a limited umber of appropriate graphs that provide justificatio for your model. Describe why you believe this model is best.

17 3.8 F-Tests for Multiple Regressio 83 Figure 3.8 Residual versus order plots show that icorporatig the idicator variables ito the regressio model improves the radom behavior ad reduces the sizes of the residuals. 3.7 What Ca We Coclude from the 005 GM Car Study? The data are from a observatioal study, ot a experimet. Therefore, eve though the R value reveals a strog relatioship betwee our explaatory variables ad the respose, a sigificat correlatio (ad thus a sigificat coefficiet) does ot imply a causal lik betwee the explaatory variable ad the respose. There may be theoretical or practical reasos to believe that mileage (or ay of the other explaatory variables) causes lower prices, but the fial model ca be used oly to show that there is a associatio. Best subsets ad residual graphs were used to develop a model that is useful for describig or predictig the retail price based o a fuctio of the explaatory variables. However, sice iterative techiques were used, the p-values correspodig to the sigificace of each idividual coefficiet are ot reliable. For this data set, cars were radomly selected withi each make, model, ad type of 005 GM car produced, ad the suggested retail prices were determied from Kelley Blue Book. While this is ot a simple radom sample of all 005 GM cars actually o the road, there is still reaso to believe that your fial model will provide a accurate descriptio or predictio of retail price for used 005 GM cars. Of course, as time goes by, the value of these cars will be reduced ad updated models will eed to be developed. Multiple Regressio 3.8 F-Tests for Multiple Regressio Decompositio of Sum of Squares May of the calculatios ivolved i multiple regressio are very closely related to those for the simple liear regressio model. Simple liear regressio models have the advatage of beig easily visualized with scatterplots. Thus, we start with a simple liear regressio model to derive several key equatios used i multiple regressio. Figure 3.9 shows a scatterplot ad regressio lie for a subset of the used 005 Kelly Blue Book Cars data. The data set is restricted to just Chevrolet Cavalier Sedas. I this scatterplot, oe specific observatio is highlighted: the poit where Mileage is 11,488 ad the observed Price is y i = 14, I Figure 3.9, we see that for ay idividual observatio the total deviatio (y i - y) is decomposed ito two parts: y i - y = ( y i - y i ) + ( y i - y) (3.7)

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation

II. Descriptive Statistics D. Linear Correlation and Regression. 1. Linear Correlation II. Descriptive Statistics D. Liear Correlatio ad Regressio I this sectio Liear Correlatio Cause ad Effect Liear Regressio 1. Liear Correlatio Quatifyig Liear Correlatio The Pearso product-momet correlatio

More information

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable

Response Variable denoted by y it is the variable that is to be predicted measure of the outcome of an experiment also called the dependent variable Statistics Chapter 4 Correlatio ad Regressio If we have two (or more) variables we are usually iterested i the relatioship betwee the variables. Associatio betwee Variables Two variables are associated

More information

1 Inferential Methods for Correlation and Regression Analysis

1 Inferential Methods for Correlation and Regression Analysis 1 Iferetial Methods for Correlatio ad Regressio Aalysis I the chapter o Correlatio ad Regressio Aalysis tools for describig bivariate cotiuous data were itroduced. The sample Pearso Correlatio Coefficiet

More information

Properties and Hypothesis Testing

Properties and Hypothesis Testing Chapter 3 Properties ad Hypothesis Testig 3.1 Types of data The regressio techiques developed i previous chapters ca be applied to three differet kids of data. 1. Cross-sectioal data. 2. Time series data.

More information

Statistics 511 Additional Materials

Statistics 511 Additional Materials Cofidece Itervals o mu Statistics 511 Additioal Materials This topic officially moves us from probability to statistics. We begi to discuss makig ifereces about the populatio. Oe way to differetiate probability

More information

11 Correlation and Regression

11 Correlation and Regression 11 Correlatio Regressio 11.1 Multivariate Data Ofte we look at data where several variables are recorded for the same idividuals or samplig uits. For example, at a coastal weather statio, we might record

More information

Read through these prior to coming to the test and follow them when you take your test.

Read through these prior to coming to the test and follow them when you take your test. Math 143 Sprig 2012 Test 2 Iformatio 1 Test 2 will be give i class o Thursday April 5. Material Covered The test is cummulative, but will emphasize the recet material (Chapters 6 8, 10 11, ad Sectios 12.1

More information

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference EXST30 Backgroud material Page From the textbook The Statistical Sleuth Mea [0]: I your text the word mea deotes a populatio mea (µ) while the work average deotes a sample average ( ). Variace [0]: The

More information

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y.

Recall the study where we estimated the difference between mean systolic blood pressure levels of users of oral contraceptives and non-users, x - y. Testig Statistical Hypotheses Recall the study where we estimated the differece betwee mea systolic blood pressure levels of users of oral cotraceptives ad o-users, x - y. Such studies are sometimes viewed

More information

Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers

Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers Chapter 4 4-1 orth Seattle Commuity College BUS10 Busiess Statistics Chapter 4 Descriptive Statistics Summary Defiitios Cetral tedecy: The extet to which the data values group aroud a cetral value. Variatio:

More information

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010, 2007, 2004 Pearso Educatio, Ic. Comparig Two Proportios Read the first two paragraphs of pg 504. Comparisos betwee two percetages are much more commo

More information

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram.

Summary: CORRELATION & LINEAR REGRESSION. GC. Students are advised to refer to lecture notes for the GC operations to obtain scatter diagram. Key Cocepts: 1) Sketchig of scatter diagram The scatter diagram of bivariate (i.e. cotaiig two variables) data ca be easily obtaied usig GC. Studets are advised to refer to lecture otes for the GC operatios

More information

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators. IE 330 Seat # Ope book ad otes 120 miutes Cover page ad six pages of exam No calculators Score Fial Exam (example) Schmeiser Ope book ad otes No calculator 120 miutes 1 True or false (for each, 2 poits

More information

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity

LINEAR REGRESSION ANALYSIS. MODULE IX Lecture Multicollinearity LINEAR REGRESSION ANALYSIS MODULE IX Lecture - 9 Multicolliearity Dr Shalabh Departmet of Mathematics ad Statistics Idia Istitute of Techology Kapur Multicolliearity diagostics A importat questio that

More information

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines)

Dr. Maddah ENMG 617 EM Statistics 11/26/12. Multiple Regression (2) (Chapter 15, Hines) Dr Maddah NMG 617 M Statistics 11/6/1 Multiple egressio () (Chapter 15, Hies) Test for sigificace of regressio This is a test to determie whether there is a liear relatioship betwee the depedet variable

More information

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes

Statistics Lecture 27. Final review. Administrative Notes. Outline. Experiments. Sampling and Surveys. Administrative Notes Admiistrative Notes s - Lecture 7 Fial review Fial Exam is Tuesday, May 0th (3-5pm Covers Chapters -8 ad 0 i textbook Brig ID cards to fial! Allowed: Calculators, double-sided 8.5 x cheat sheet Exam Rooms:

More information

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}. 1 (*) If a lot of the data is far from the mea, the may of the (x j x) 2 terms will be quite large, so the mea of these terms will be large ad the SD of the data will be large. (*) I particular, outliers

More information

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n.

ST 305: Exam 3 ( ) = P(A)P(B A) ( ) = P(A) + P(B) ( ) = 1 P( A) ( ) = P(A) P(B) σ X 2 = σ a+bx. σ ˆp. σ X +Y. σ X Y. σ X. σ Y. σ n. ST 305: Exam 3 By hadig i this completed exam, I state that I have either give or received assistace from aother perso durig the exam period. I have used o resources other tha the exam itself ad the basic

More information

Power and Type II Error

Power and Type II Error Statistical Methods I (EXST 7005) Page 57 Power ad Type II Error Sice we do't actually kow the value of the true mea (or we would't be hypothesizig somethig else), we caot kow i practice the type II error

More information

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc.

Chapter 22. Comparing Two Proportions. Copyright 2010 Pearson Education, Inc. Chapter 22 Comparig Two Proportios Copyright 2010 Pearso Educatio, Ic. Comparig Two Proportios Comparisos betwee two percetages are much more commo tha questios about isolated percetages. Ad they are more

More information

Stat 139 Homework 7 Solutions, Fall 2015

Stat 139 Homework 7 Solutions, Fall 2015 Stat 139 Homework 7 Solutios, Fall 2015 Problem 1. I class we leared that the classical simple liear regressio model assumes the followig distributio of resposes: Y i = β 0 + β 1 X i + ɛ i, i = 1,...,,

More information

This is an introductory course in Analysis of Variance and Design of Experiments.

This is an introductory course in Analysis of Variance and Design of Experiments. 1 Notes for M 384E, Wedesday, Jauary 21, 2009 (Please ote: I will ot pass out hard-copy class otes i future classes. If there are writte class otes, they will be posted o the web by the ight before class

More information

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01 ENGI 44 Cofidece Itervals (Two Samples) Page -0 Two Sample Cofidece Iterval for a Differece i Populatio Meas [Navidi sectios 5.4-5.7; Devore chapter 9] From the cetral limit theorem, we kow that, for sufficietly

More information

Infinite Sequences and Series

Infinite Sequences and Series Chapter 6 Ifiite Sequeces ad Series 6.1 Ifiite Sequeces 6.1.1 Elemetary Cocepts Simply speakig, a sequece is a ordered list of umbers writte: {a 1, a 2, a 3,...a, a +1,...} where the elemets a i represet

More information

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ

If, for instance, we were required to test whether the population mean μ could be equal to a certain value μ STATISTICAL INFERENCE INTRODUCTION Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I oesample testig, we essetially

More information

A statistical method to determine sample size to estimate characteristic value of soil parameters

A statistical method to determine sample size to estimate characteristic value of soil parameters A statistical method to determie sample size to estimate characteristic value of soil parameters Y. Hojo, B. Setiawa 2 ad M. Suzuki 3 Abstract Sample size is a importat factor to be cosidered i determiig

More information

ANALYSIS OF EXPERIMENTAL ERRORS

ANALYSIS OF EXPERIMENTAL ERRORS ANALYSIS OF EXPERIMENTAL ERRORS All physical measuremets ecoutered i the verificatio of physics theories ad cocepts are subject to ucertaities that deped o the measurig istrumets used ad the coditios uder

More information

Random Variables, Sampling and Estimation

Random Variables, Sampling and Estimation Chapter 1 Radom Variables, Samplig ad Estimatio 1.1 Itroductio This chapter will cover the most importat basic statistical theory you eed i order to uderstad the ecoometric material that will be comig

More information

Final Examination Solutions 17/6/2010

Final Examination Solutions 17/6/2010 The Islamic Uiversity of Gaza Faculty of Commerce epartmet of Ecoomics ad Political Scieces A Itroductio to Statistics Course (ECOE 30) Sprig Semester 009-00 Fial Eamiatio Solutios 7/6/00 Name: I: Istructor:

More information

Correlation Regression

Correlation Regression Correlatio Regressio While correlatio methods measure the stregth of a liear relatioship betwee two variables, we might wish to go a little further: How much does oe variable chage for a give chage i aother

More information

ECON 3150/4150, Spring term Lecture 3

ECON 3150/4150, Spring term Lecture 3 Itroductio Fidig the best fit by regressio Residuals ad R-sq Regressio ad causality Summary ad ext step ECON 3150/4150, Sprig term 2014. Lecture 3 Ragar Nymoe Uiversity of Oslo 21 Jauary 2014 1 / 30 Itroductio

More information

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise)

Lecture 22: Review for Exam 2. 1 Basic Model Assumptions (without Gaussian Noise) Lecture 22: Review for Exam 2 Basic Model Assumptios (without Gaussia Noise) We model oe cotiuous respose variable Y, as a liear fuctio of p umerical predictors, plus oise: Y = β 0 + β X +... β p X p +

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Explorig Data: Distributios Look for overall patter (shape, ceter, spread) ad deviatios (outliers). Mea (use a calculator): x = x 1 + x 2 + +

More information

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So,

First, note that the LS residuals are orthogonal to the regressors. X Xb X y = 0 ( normal equations ; (k 1) ) So, 0 2. OLS Part II The OLS residuals are orthogoal to the regressors. If the model icludes a itercept, the orthogoality of the residuals ad regressors gives rise to three results, which have limited practical

More information

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9

PSYCHOLOGICAL RESEARCH (PYC 304-C) Lecture 9 Hypothesis testig PSYCHOLOGICAL RESEARCH (PYC 34-C Lecture 9 Statistical iferece is that brach of Statistics i which oe typically makes a statemet about a populatio based upo the results of a sample. I

More information

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics. None at all!

ENGI 4421 Probability and Statistics Faculty of Engineering and Applied Science Problem Set 1 Solutions Descriptive Statistics. None at all! ENGI 44 Probability ad Statistics Faculty of Egieerig ad Applied Sciece Problem Set Solutios Descriptive Statistics. If, i the set of values {,, 3, 4, 5, 6, 7 } a error causes the value 5 to be replaced

More information

Parameter, Statistic and Random Samples

Parameter, Statistic and Random Samples Parameter, Statistic ad Radom Samples A parameter is a umber that describes the populatio. It is a fixed umber, but i practice we do ot kow its value. A statistic is a fuctio of the sample data, i.e.,

More information

Statistical Properties of OLS estimators

Statistical Properties of OLS estimators 1 Statistical Properties of OLS estimators Liear Model: Y i = β 0 + β 1 X i + u i OLS estimators: β 0 = Y β 1X β 1 = Best Liear Ubiased Estimator (BLUE) Liear Estimator: β 0 ad β 1 are liear fuctio of

More information

6.3 Testing Series With Positive Terms

6.3 Testing Series With Positive Terms 6.3. TESTING SERIES WITH POSITIVE TERMS 307 6.3 Testig Series With Positive Terms 6.3. Review of what is kow up to ow I theory, testig a series a i for covergece amouts to fidig the i= sequece of partial

More information

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering CEE 5 Autum 005 Ucertaity Cocepts for Geotechical Egieerig Basic Termiology Set A set is a collectio of (mutually exclusive) objects or evets. The sample space is the (collectively exhaustive) collectio

More information

Correlation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph

Correlation. Two variables: Which test? Relationship Between Two Numerical Variables. Two variables: Which test? Contingency table Grouped bar graph Correlatio Y Two variables: Which test? X Explaatory variable Respose variable Categorical Numerical Categorical Cotigecy table Cotigecy Logistic Grouped bar graph aalysis regressio Mosaic plot Numerical

More information

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions Chapter 9 Slide Ifereces from Two Samples 9- Overview 9- Ifereces about Two Proportios 9- Ifereces about Two Meas: Idepedet Samples 9-4 Ifereces about Matched Pairs 9-5 Comparig Variatio i Two Samples

More information

Linear Regression Models

Linear Regression Models Liear Regressio Models Dr. Joh Mellor-Crummey Departmet of Computer Sciece Rice Uiversity johmc@cs.rice.edu COMP 528 Lecture 9 15 February 2005 Goals for Today Uderstad how to Use scatter diagrams to ispect

More information

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample. Statistical Iferece (Chapter 10) Statistical iferece = lear about a populatio based o the iformatio provided by a sample. Populatio: The set of all values of a radom variable X of iterest. Characterized

More information

MA238 Assignment 4 Solutions (part a)

MA238 Assignment 4 Solutions (part a) (i) Sigle sample tests. Questio. MA38 Assigmet 4 Solutios (part a) (a) (b) (c) H 0 : = 50 sq. ft H A : < 50 sq. ft H 0 : = 3 mpg H A : > 3 mpg H 0 : = 5 mm H A : 5mm Questio. (i) What are the ull ad alterative

More information

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised

Continuous Data that can take on any real number (time/length) based on sample data. Categorical data can only be named or categorised Questio 1. (Topics 1-3) A populatio cosists of all the members of a group about which you wat to draw a coclusio (Greek letters (μ, σ, Ν) are used) A sample is the portio of the populatio selected for

More information

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test.

Because it tests for differences between multiple pairs of means in one test, it is called an omnibus test. Math 308 Sprig 018 Classes 19 ad 0: Aalysis of Variace (ANOVA) Page 1 of 6 Itroductio ANOVA is a statistical procedure for determiig whether three or more sample meas were draw from populatios with equal

More information

Chapter 3: Other Issues in Multiple regression (Part 1)

Chapter 3: Other Issues in Multiple regression (Part 1) Chapter 3: Other Issues i Multiple regressio (Part 1) 1 Model (variable) selectio The difficulty with model selectio: for p predictors, there are 2 p differet cadidate models. Whe we have may predictors

More information

Polynomial Functions and Their Graphs

Polynomial Functions and Their Graphs Polyomial Fuctios ad Their Graphs I this sectio we begi the study of fuctios defied by polyomial expressios. Polyomial ad ratioal fuctios are the most commo fuctios used to model data, ad are used extesively

More information

1 Review of Probability & Statistics

1 Review of Probability & Statistics 1 Review of Probability & Statistics a. I a group of 000 people, it has bee reported that there are: 61 smokers 670 over 5 960 people who imbibe (drik alcohol) 86 smokers who imbibe 90 imbibers over 5

More information

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Discrete Mathematics for CS Spring 2008 David Wagner Note 22 CS 70 Discrete Mathematics for CS Sprig 2008 David Wager Note 22 I.I.D. Radom Variables Estimatig the bias of a coi Questio: We wat to estimate the proportio p of Democrats i the US populatio, by takig

More information

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1

PH 425 Quantum Measurement and Spin Winter SPINS Lab 1 PH 425 Quatum Measuremet ad Spi Witer 23 SPIS Lab Measure the spi projectio S z alog the z-axis This is the experimet that is ready to go whe you start the program, as show below Each atom is measured

More information

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen) Goodess-of-Fit Tests ad Categorical Data Aalysis (Devore Chapter Fourtee) MATH-252-01: Probability ad Statistics II Sprig 2019 Cotets 1 Chi-Squared Tests with Kow Probabilities 1 1.1 Chi-Squared Testig................

More information

MEASURES OF DISPERSION (VARIABILITY)

MEASURES OF DISPERSION (VARIABILITY) POLI 300 Hadout #7 N. R. Miller MEASURES OF DISPERSION (VARIABILITY) While measures of cetral tedecy idicate what value of a variable is (i oe sese or other, e.g., mode, media, mea), average or cetral

More information

Chapter 6 Sampling Distributions

Chapter 6 Sampling Distributions Chapter 6 Samplig Distributios 1 I most experimets, we have more tha oe measuremet for ay give variable, each measuremet beig associated with oe radomly selected a member of a populatio. Hece we eed to

More information

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to:

STA Learning Objectives. Population Proportions. Module 10 Comparing Two Proportions. Upon completing this module, you should be able to: STA 2023 Module 10 Comparig Two Proportios Learig Objectives Upo completig this module, you should be able to: 1. Perform large-sample ifereces (hypothesis test ad cofidece itervals) to compare two populatio

More information

Understanding Samples

Understanding Samples 1 Will Moroe CS 109 Samplig ad Bootstrappig Lecture Notes #17 August 2, 2017 Based o a hadout by Chris Piech I this chapter we are goig to talk about statistics calculated o samples from a populatio. We

More information

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals 7-1 Chapter 4 Part I. Samplig Distributios ad Cofidece Itervals 1 7- Sectio 1. Samplig Distributio 7-3 Usig Statistics Statistical Iferece: Predict ad forecast values of populatio parameters... Test hypotheses

More information

Frequentist Inference

Frequentist Inference Frequetist Iferece The topics of the ext three sectios are useful applicatios of the Cetral Limit Theorem. Without kowig aythig about the uderlyig distributio of a sequece of radom variables {X i }, for

More information

Confidence Intervals for the Population Proportion p

Confidence Intervals for the Population Proportion p Cofidece Itervals for the Populatio Proportio p The cocept of cofidece itervals for the populatio proportio p is the same as the oe for, the samplig distributio of the mea, x. The structure is idetical:

More information

AP Statistics Review Ch. 8

AP Statistics Review Ch. 8 AP Statistics Review Ch. 8 Name 1. Each figure below displays the samplig distributio of a statistic used to estimate a parameter. The true value of the populatio parameter is marked o each samplig distributio.

More information

Math 140 Introductory Statistics

Math 140 Introductory Statistics 8.2 Testig a Proportio Math 1 Itroductory Statistics Professor B. Abrego Lecture 15 Sectios 8.2 People ofte make decisios with data by comparig the results from a sample to some predetermied stadard. These

More information

(3) If you replace row i of A by its sum with a multiple of another row, then the determinant is unchanged! Expand across the i th row:

(3) If you replace row i of A by its sum with a multiple of another row, then the determinant is unchanged! Expand across the i th row: Math 5-4 Tue Feb 4 Cotiue with sectio 36 Determiats The effective way to compute determiats for larger-sized matrices without lots of zeroes is to ot use the defiitio, but rather to use the followig facts,

More information

Data Analysis and Statistical Methods Statistics 651

Data Analysis and Statistical Methods Statistics 651 Data Aalysis ad Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasii/teachig.html Suhasii Subba Rao Review of testig: Example The admistrator of a ursig home wats to do a time ad motio

More information

Analysis of Experimental Data

Analysis of Experimental Data Aalysis of Experimetal Data 6544597.0479 ± 0.000005 g Quatitative Ucertaity Accuracy vs. Precisio Whe we make a measuremet i the laboratory, we eed to kow how good it is. We wat our measuremets to be both

More information

Median and IQR The median is the value which divides the ordered data values in half.

Median and IQR The median is the value which divides the ordered data values in half. STA 666 Fall 2007 Web-based Course Notes 4: Describig Distributios Numerically Numerical summaries for quatitative variables media ad iterquartile rage (IQR) 5-umber summary mea ad stadard deviatio Media

More information

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence

A sequence of numbers is a function whose domain is the positive integers. We can see that the sequence Sequeces A sequece of umbers is a fuctio whose domai is the positive itegers. We ca see that the sequece,, 2, 2, 3, 3,... is a fuctio from the positive itegers whe we write the first sequece elemet as

More information

6 Sample Size Calculations

6 Sample Size Calculations 6 Sample Size Calculatios Oe of the major resposibilities of a cliical trial statisticia is to aid the ivestigators i determiig the sample size required to coduct a study The most commo procedure for determiig

More information

Stat 421-SP2012 Interval Estimation Section

Stat 421-SP2012 Interval Estimation Section Stat 41-SP01 Iterval Estimatio Sectio 11.1-11. We ow uderstad (Chapter 10) how to fid poit estimators of a ukow parameter. o However, a poit estimate does ot provide ay iformatio about the ucertaity (possible

More information

Module 1 Fundamentals in statistics

Module 1 Fundamentals in statistics Normal Distributio Repeated observatios that differ because of experimetal error ofte vary about some cetral value i a roughly symmetrical distributio i which small deviatios occur much more frequetly

More information

Chapter 23: Inferences About Means

Chapter 23: Inferences About Means Chapter 23: Ifereces About Meas Eough Proportios! We ve spet the last two uits workig with proportios (or qualitative variables, at least) ow it s time to tur our attetios to quatitative variables. For

More information

Regression and correlation

Regression and correlation Cotets 43 Regressio ad correlatio 1. Regressio. Correlatio Learig outcomes You will lear how to explore relatioships betwee variables ad how to measure the stregth of such relatioships. You should ote

More information

Chapter 12 Correlation

Chapter 12 Correlation Chapter Correlatio Correlatio is very similar to regressio with oe very importat differece. Regressio is used to explore the relatioship betwee a idepedet variable ad a depedet variable, whereas correlatio

More information

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara Poit Estimator Eco 325 Notes o Poit Estimator ad Cofidece Iterval 1 By Hiro Kasahara Parameter, Estimator, ad Estimate The ormal probability desity fuctio is fully characterized by two costats: populatio

More information

Ismor Fischer, 1/11/

Ismor Fischer, 1/11/ Ismor Fischer, //04 7.4-7.4 Problems. I Problem 4.4/9, it was show that importat relatios exist betwee populatio meas, variaces, ad covariace. Specifically, we have the formulas that appear below left.

More information

multiplies all measures of center and the standard deviation and range by k, while the variance is multiplied by k 2.

multiplies all measures of center and the standard deviation and range by k, while the variance is multiplied by k 2. Lesso 3- Lesso 3- Scale Chages of Data Vocabulary scale chage of a data set scale factor scale image BIG IDEA Multiplyig every umber i a data set by k multiplies all measures of ceter ad the stadard deviatio

More information

(X i X)(Y i Y ) = 1 n

(X i X)(Y i Y ) = 1 n L I N E A R R E G R E S S I O N 10 I Chapter 6 we discussed the cocepts of covariace ad correlatio two ways of measurig the extet to which two radom variables, X ad Y were related to each other. I may

More information

Chapter 4 - Summarizing Numerical Data

Chapter 4 - Summarizing Numerical Data Chapter 4 - Summarizig Numerical Data 15.075 Cythia Rudi Here are some ways we ca summarize data umerically. Sample Mea: i=1 x i x :=. Note: i this class we will work with both the populatio mea µ ad the

More information

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random

10. Comparative Tests among Spatial Regression Models. Here we revisit the example in Section 8.1 of estimating the mean of a normal random Part III. Areal Data Aalysis 0. Comparative Tests amog Spatial Regressio Models While the otio of relative likelihood values for differet models is somewhat difficult to iterpret directly (as metioed above),

More information

Regression and Correlation

Regression and Correlation 43 Cotets Regressio ad Correlatio 43.1 Regressio 43. Correlatio 17 Learig outcomes You will lear how to explore relatioships betwee variables ad how to measure the stregth of such relatioships. You should

More information

Principle Of Superposition

Principle Of Superposition ecture 5: PREIMINRY CONCEP O RUCUR NYI Priciple Of uperpositio Mathematically, the priciple of superpositio is stated as ( a ) G( a ) G( ) G a a or for a liear structural system, the respose at a give

More information

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY

EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE IN STATISTICS, 017 MODULE 4 : Liear models Time allowed: Oe ad a half hours Cadidates should aswer THREE questios. Each questio carries

More information

The Sample Variance Formula: A Detailed Study of an Old Controversy

The Sample Variance Formula: A Detailed Study of an Old Controversy The Sample Variace Formula: A Detailed Study of a Old Cotroversy Ky M. Vu PhD. AuLac Techologies Ic. c 00 Email: kymvu@aulactechologies.com Abstract The two biased ad ubiased formulae for the sample variace

More information

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING Lectures MODULE 5 STATISTICS II. Mea ad stadard error of sample data. Biomial distributio. Normal distributio 4. Samplig 5. Cofidece itervals

More information

Linear Regression Demystified

Linear Regression Demystified Liear Regressio Demystified Liear regressio is a importat subject i statistics. I elemetary statistics courses, formulae related to liear regressio are ofte stated without derivatio. This ote iteds to

More information

Circle the single best answer for each multiple choice question. Your choice should be made clearly.

Circle the single best answer for each multiple choice question. Your choice should be made clearly. TEST #1 STA 4853 March 6, 2017 Name: Please read the followig directios. DO NOT TURN THE PAGE UNTIL INSTRUCTED TO DO SO Directios This exam is closed book ad closed otes. There are 32 multiple choice questios.

More information

SNAP Centre Workshop. Basic Algebraic Manipulation

SNAP Centre Workshop. Basic Algebraic Manipulation SNAP Cetre Workshop Basic Algebraic Maipulatio 8 Simplifyig Algebraic Expressios Whe a expressio is writte i the most compact maer possible, it is cosidered to be simplified. Not Simplified: x(x + 4x)

More information

Economics Spring 2015

Economics Spring 2015 1 Ecoomics 400 -- Sprig 015 /17/015 pp. 30-38; Ch. 7.1.4-7. New Stata Assigmet ad ew MyStatlab assigmet, both due Feb 4th Midterm Exam Thursday Feb 6th, Chapters 1-7 of Groeber text ad all relevat lectures

More information

Estimation of a population proportion March 23,

Estimation of a population proportion March 23, 1 Social Studies 201 Notes for March 23, 2005 Estimatio of a populatio proportio Sectio 8.5, p. 521. For the most part, we have dealt with meas ad stadard deviatios this semester. This sectio of the otes

More information

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading

Comparing Two Populations. Topic 15 - Two Sample Inference I. Comparing Two Means. Comparing Two Pop Means. Background Reading Topic 15 - Two Sample Iferece I STAT 511 Professor Bruce Craig Comparig Two Populatios Research ofte ivolves the compariso of two or more samples from differet populatios Graphical summaries provide visual

More information

Measures of Spread: Standard Deviation

Measures of Spread: Standard Deviation Measures of Spread: Stadard Deviatio So far i our study of umerical measures used to describe data sets, we have focused o the mea ad the media. These measures of ceter tell us the most typical value of

More information

Chapter 10: Power Series

Chapter 10: Power Series Chapter : Power Series 57 Chapter Overview: Power Series The reaso series are part of a Calculus course is that there are fuctios which caot be itegrated. All power series, though, ca be itegrated because

More information

INTEGRATION BY PARTS (TABLE METHOD)

INTEGRATION BY PARTS (TABLE METHOD) INTEGRATION BY PARTS (TABLE METHOD) Suppose you wat to evaluate cos d usig itegratio by parts. Usig the u dv otatio, we get So, u dv d cos du d v si cos d si si d or si si d We see that it is ecessary

More information

Posted-Price, Sealed-Bid Auctions

Posted-Price, Sealed-Bid Auctions Posted-Price, Sealed-Bid Auctios Professors Greewald ad Oyakawa 207-02-08 We itroduce the posted-price, sealed-bid auctio. This auctio format itroduces the idea of approximatios. We describe how well this

More information

Chapter 2 Descriptive Statistics

Chapter 2 Descriptive Statistics Chapter 2 Descriptive Statistics Statistics Most commoly, statistics refers to umerical data. Statistics may also refer to the process of collectig, orgaizig, presetig, aalyzig ad iterpretig umerical data

More information

Simple Linear Regression

Simple Linear Regression Chapter 2 Simple Liear Regressio 2.1 Simple liear model The simple liear regressio model shows how oe kow depedet variable is determied by a sigle explaatory variable (regressor). Is is writte as: Y i

More information

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6)

STAT 350 Handout 19 Sampling Distribution, Central Limit Theorem (6.6) STAT 350 Hadout 9 Samplig Distributio, Cetral Limit Theorem (6.6) A radom sample is a sequece of radom variables X, X 2,, X that are idepedet ad idetically distributed. o This property is ofte abbreviated

More information

Introduction There are two really interesting things to do in statistics.

Introduction There are two really interesting things to do in statistics. ECON 497 Lecture Notes E Page 1 of 1 Metropolita State Uiversity ECON 497: Research ad Forecastig Lecture Notes E: Samplig Distributios Itroductio There are two really iterestig thigs to do i statistics.

More information

Chapter 22: What is a Test of Significance?

Chapter 22: What is a Test of Significance? Chapter 22: What is a Test of Sigificace? Thought Questio Assume that the statemet If it s Saturday, the it s the weeked is true. followig statemets will also be true? Which of the If it s the weeked,

More information

Economics 250 Assignment 1 Suggested Answers. 1. We have the following data set on the lengths (in minutes) of a sample of long-distance phone calls

Economics 250 Assignment 1 Suggested Answers. 1. We have the following data set on the lengths (in minutes) of a sample of long-distance phone calls Ecoomics 250 Assigmet 1 Suggested Aswers 1. We have the followig data set o the legths (i miutes) of a sample of log-distace phoe calls 1 20 10 20 13 23 3 7 18 7 4 5 15 7 29 10 18 10 10 23 4 12 8 6 (1)

More information