Take-home Final. The questions you are expected to answer for this project are numbered and italicized. There is one bonus question. Good luck!
- Jennifer Sherman
- 5 years ago
The data for our final project come from a study initiated by the Tasmanian Aquaculture and Fisheries Institute to investigate the growth patterns of abalone living along the Tasmanian coastline. The harvest of abalone is subject to quotas that restrict both the number of abalone that can be caught and their size. The population of abalone can be regulated effectively if there is a simple way to tell the age of abalone based solely on their appearance. Hence, researchers are interested in relating abalone age to variables like length, height and weight of the animal (measurements that a diver can take before harvesting the animal).

Determining the actual age of an abalone is a bit like estimating the age of a tree. Rings are formed in the shell of the abalone as it grows, usually at the rate of one ring per year. Getting access to the rings of an abalone involves cutting the shell. After polishing and staining, a lab technician examines a shell sample under a microscope and counts the rings. Because some rings are hard to make out using this method, these researchers believed that adding 1.5 to the ring count gives a reasonable estimate of the abalone's age.

The relationship between age and ring count, however, is somewhat controversial. Under certain conditions, abalone can grow more than one ring per year. These conditions relate to weather patterns and other environmental variables. These effects suggest that any relationship we find between ring count and size measurements taken from the animal is likely to involve a lot of error; there are plenty of important variables that have been left out of the study.

With these caveats, we will use data collected on 4,177 abalone by the team of researchers from Tasmania. I was not able to find an electronic version of their actual report on these data, but I am making a related report by the same organization available on the course Web site.
1. Getting started

We begin by downloading the dataset of abalone measurements.

    source(" cocteau/stat13/data/final.r")
    ls()

You should see (possibly among other things) the dataset ab.

    dim(ab)

The output of this command indicates that ab is a matrix with 4,177 rows and 9 columns. Each row refers to a different abalone caught off the coast of Tasmania. For each of the 4,177 abalone caught, the researchers recorded a number of measurements. We describe these measurements in detail in Table 1. The following command lists the values of the variables for the first three abalone in the dataset.

    ab[1:3,]

Recall from Lab 2 that R stores its datasets as matrices. With this command, we are specifying just the first three rows of ab. You should see something like the following.

    rings length diameter height whole shucked viscera shell infant

The first column in this output is just the row number of the specimen in our dataset; in this case, we are just looking at rows 1, 2 and 3. The number of rings for these three abalone ranged from 7 to 15. We can approximately determine the age of an abalone in this study by adding 1.5 to the number of rings. This means that our three specimens were 16.5, 8.5 and 10.5 years old, respectively. Looking at the output a bit more, we see that their lengths varied from 70mm to 106mm, and their whole weights ranged from 45.1g to 135.4g. Looking at the last column in the output, we see that these three specimens were also mature (that is, they were past their infant stage).

Variable name   Units   Description
rings           count   number of rings
length          mm      longest shell measurement
diameter        mm      measured perpendicular to the length
height          mm      height of abalone with meat in the shell
whole           grams   weight of the whole abalone
shucked         grams   weight of just the meat
viscera         grams   gut weight after bleeding
shell           grams   weight of shell after drying
infant          0/1     1 if the abalone is an infant, 0 otherwise

Table 1: Descriptions for the variables in the dataset ab. The first variable (column 1 in ab) is our response, the quantity we'd like to predict. It represents the number of rings in an abalone's shell. The remaining 8 variables (columns 2 through 9 in ab) are potential predictors; we want to form a model that will allow us to predict ring count from these 8 measurements.

The 4,177 abalone in this dataset all had between 1 and 29 rings. If we use the approximate dating rule, that means they are between 2.5 and 30.5 years old. Examine the data for the oldest and youngest abalone in the dataset with the following two commands.

    ab[ab$rings==1,]
    ab[ab$rings==29,]

In the first command, the == relation is asking for all the data in ab for abalone having just one ring; in the second command we are asking for the data for those abalone with 29 rings.

1. Do the sets of measurements make sense? That is, what aspects of the data agree with the fact that one of these abalone is very young and the other is very old?

2. A first look at the data

We begin by looking at simple summaries of the separate predictor variables. But first we need a graphics window.
    quartz()

In Lab 2 we introduced the commands boxplot, hist and summary. Apply these tools to the separate variables to get a sense of their distributions. Take, for example, the variable length, the widest point of the abalone.

    summary(ab$length)
    hist(ab$length)

2. Describe the distribution of length. Is it skewed? Bell-shaped? Repeat these summaries for a few of the variables to get a feel for the data. Report on at least one variable other than length.

Next, consider the four variables related to the weight of the abalone: whole, shucked, viscera, and shell. These are stored in columns 5 through 8 of ab. Look at a simple summary and a histogram for each variable. To compare the values for each weight measurement directly, we can look at a boxplot.

    boxplot(ab[,5:8])

Again, since this dataset is basically a matrix in R, the index 5:8 in the command above refers to just columns 5 through 8. What do you see in the plot? Do these relationships agree with what you would expect given their description in Table 1? To dig a bit deeper, we will make a simple scatterplot of shucked versus whole. (The second command adds a reference line with intercept 0 and slope 1.)

    plot(ab$shucked,ab$whole)
    abline(0,1)

3. What do you see from this distribution? Is there a trend? How does the spread vary as the shucked weight (the weight of just the meat of the abalone) increases? Since whole is meant to describe the weight of the entire animal, and shucked just one part of that weight, what can we say about the four points that are below our reference line? Do these make sense? Make the same plot for ab$viscera and ab$shucked (in this case, viscera is the weight of the meat after bleeding and should be less than shucked, the weight of the meat before bleeding). Do you see any questionable points in this plot?

In R, we can make scatterplots between several pairs of variables at one time. For example, we can consider all possible pairs of scatterplots from the 4 variables related to an abalone's weight with the following command.

    pairs(ab[,5:8])

Note that in this plot, each cell is a scatterplot between two variables. The graph in the second row and first column is a scatterplot of whole and shucked, while the graph in the fourth row and second column is a plot of shucked and shell. What do you see?

Finally, we examine rings, our response variable. You can look at the five-number summary as well as a histogram of this variable. Because this variable represents the number of rings, it is discrete. We can see a tabulation of all the values with the following command.

    table(ab$rings)

4. What ring count is most prevalent in our dataset? What approximate age does this represent?

3. Correlation

We now consider more formal descriptions of the relationships between two variables. In your text, the authors introduce the correlation coefficient. This quantity measures the degree to which the data in a simple scatterplot lie on a straight line. See your text for details on how you would compute the correlation coefficient from a sample of data. In R, we can use the command cor.

    cor(ab$length,ab$height)

What value do you get? Use the description in your text on page 541, together with the graphs in the accompanying figure, to guess what the scatterplot of length and height might look like. Now, let's make this plot and see if your guess was correct.

    plot(ab$length,ab$height)

What do you notice? There seems to be a strong trend, but there also appear to be two outliers. Technically, outliers are points that don't seem to agree with the rest of the data in some way. Depending on their size, they can have a big impact on some of our analyses. In this case, the two points have extremely large values of height. These points correspond to rows 1418 and 2052. We can remove them and create a new dataset ab2 with the following command.

    ab2 = ab[-c(1418,2052),]

Again, we are indexing rows in this command, but the minus sign tells R that we want to leave these rows out. Hence ab2 has 4,175 rows and 9 columns.

    dim(ab2)

To see how things have changed, we can again compute the correlation coefficient and make a scatterplot of height and length, but this time using ab2.

    cor(ab2$length,ab2$height)
    plot(ab2$length,ab2$height)

5. What do you notice? What has happened to the correlation coefficient? Does this agree with what you see in the new scatterplot?

Just like the command pairs that we used above, the command cor can also return a matrix of correlation coefficients between all the pairs in a dataset. Try the following two commands.

    cor(ab2[,1:3])
    pairs(ab2[,1:3])

(Note that plots involving rings will have stripes because the ring count from abalone shells is a discrete variable. It has a large number of levels, but it's still discrete.)

6. Do the relationships in these plots agree with your intuition about the correlation coefficients? Explain using a pair with high correlation and one with relatively low correlation. The diagonal elements in cor(ab2[,1:3]) are all 1. Does that make sense?

From this point on, we will work with ab2 rather than ab. If at some point you exit from R and don't save your work, you will have to retype the command creating ab2.
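To get a feel for how much a couple of extreme points can move a correlation coefficient, here is a small sketch on made-up data. Nothing in it comes from the abalone dataset: the data frame toy, its two columns, and the two planted rows (50 and 120) are assumptions chosen purely for illustration. The only piece borrowed from the project is the negative row index used to drop rows.

```r
# Synthetic stand-in for two size measurements; NOT the abalone data.
set.seed(1)
n <- 200
toy <- data.frame(length = runif(n, 20, 160))
toy$height <- 0.25 * toy$length + rnorm(n, sd = 3)

# Plant two unusually large heights in (arbitrarily chosen) rows 50 and 120.
toy$height[c(50, 120)] <- c(220, 100)

# Correlation with the two extreme points included.
r_all <- cor(toy$length, toy$height)

# Negative indices drop rows, just as in ab[-c(1418,2052),].
toy2 <- toy[-c(50, 120), ]
r_trim <- cor(toy2$length, toy2$height)

print(dim(toy2))
print(c(with_outliers = r_all, without_outliers = r_trim))
```

Running the two cor calls side by side like this is the same comparison you make between ab and ab2; the sketch only shows the mechanics, so you still need to interpret the change for the real data yourself.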
4. Modeling ring count

At the end of the last section, we saw plots of rings against the first two predictor variables. There seemed to be a lot of spread in these data. As noted in the introduction, the investigators at the Marine Resources Division acknowledge that several environmental factors can influence the growth of rings in abalone. That means there are variables relating to the condition of the water off the coast of Tasmania, as well as the weather patterns for the last 30 years, that might help explain ring counts. None of these data are available to us at this point, and hence we expect a certain amount of error in any model we build with our 8 predictor variables.

To begin the modeling process, we will just look at a simple linear model with only one predictor. In mathematical terms we consider the following description of the data:

$$\text{rings} = \beta_0 + \beta_1\,\text{length} + U, \tag{0.1}$$

where $U$ is a random error. Here, the error also accounts for all the variables that we can't measure (like weather patterns). We can compute least squares estimates for $\beta_0$ and $\beta_1$ in R with this command.

    fit = lm(rings ~ 1+length, data=ab2)

(The symbol between rings and 1 is a tilde; on my keyboard it is above the single quote, just next to the 1.) The formula specifies the same relationship given in (0.1), where 1 represents the intercept. The summary table we studied in class is obtained with the next command.

    summary(fit)

7. What are the least squares estimates $\hat\beta_0$ and $\hat\beta_1$? What are their standard errors? Give a 95% confidence interval for $\beta_1$ (since the number of samples here is large, use the approximation involving plus or minus two standard errors).

In the text, we see that for the simple linear model

$$Y = \beta_0 + \beta_1 X + U, \tag{0.2}$$

we can write down the least squares estimates explicitly. They are given by

$$\hat\beta_1 = r\,\frac{s_Y}{s_X} \quad\text{and}\quad \hat\beta_0 = \bar{y} - \hat\beta_1\,\bar{x}, \tag{0.3}$$

where $\bar{y}$ and $\bar{x}$ are the sample means of the response and predictor variables, respectively; and $s_Y$ and $s_X$ are the sample standard deviations of the response and predictor variables, respectively. The sample correlation coefficient is denoted $r$. Taking rings as the response and length as the predictor variable, we can form the necessary ingredients in (0.3) with calls to mean(ab2$rings), mean(ab2$length), sd(ab2$rings), sd(ab2$length), and cor(ab2$length,ab2$rings).

8. Execute these commands and verify that the coefficients you get in summary(fit) agree with those in equation (0.3).

Next, we will consider a simple linear model involving height as the predictor rather than length.

    fit2 = lm(rings ~ 1+height, data=ab2)
    summary(fit2)

What expression of the form (0.2) does this represent? Now, let's fit this model using the dataset that includes the outliers in height. Notice that we are now using data=ab and not data=ab2 as before.

    fit3 = lm(rings ~ 1+height, data=ab)
    summary(fit3)

9. Do the least squares estimates change? By how much?

Bonus. Create a new dataset ab3 by removing two points from ab2. Use a command like the one in the previous section that we used to create ab2 from ab. Instead of 1418 and 2052, pick any two other rows. Refit the linear model using the option data=ab3 and look at the summary. How much did the coefficients change from fit2? Compare that to the change you observed by removing the two outliers in height.

Finally, we will consider a full model using all of our 8 predictor variables.

    bigfit = lm(rings ~ 1+length+diameter+height+whole+shucked+
                viscera+shell+infant, data=ab2)
    summary(bigfit)

10. The P-values in the last column represent a series of hypothesis tests. What null hypothesis is being tested in each case? What is the alternative? Which would we reject at the 0.05 level? (That is, which have P-values less than 0.05?)

5. Evaluating the fit

Continuing with the summary table from the full bigfit, we have established statistical significance of many of the variables. But what about the practical significance? That is, how much of an effect do these variables have on ring count? To study this, consider the variable whole. In Section 2 you probably observed the median value of this variable. Three abalone have whole weights of 159.9g, and we can see them as follows.

    test = ab2[ab2$whole == 159.9,]
    test

With the data in each row of test, we can use our fitted model to make a prediction about these abalone's ages.

11. Using the coefficients in the summary table for bigfit and the values in the first row of test, compute the prediction from our model. Compare it to the true value for rings for this abalone.

In R we can compute the predictions using the following command.

    predict(bigfit,newdata=test)

Make sure the first value matches what you computed above. Now, let's change the value of whole for these abalone by 50g and see how our predictions change.

    test$whole = test$whole + 50
    predict(bigfit,newdata=test)

12. By how much have our predictions changed? Interpret this in terms of the coefficients in the summary table for bigfit. Comment on the change you might expect by varying a predictor other than whole weight.
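If you want to check your hand computation against R before working with bigfit, the mechanics can be sketched on made-up data. Everything here is an assumption for illustration: the data frame toy, its two predictors, and the coefficients used to simulate it have nothing to do with the abalone measurements. The sketch only shows how coef and predict relate to each other in a linear model.

```r
# Synthetic data with invented coefficients; NOT the abalone measurements.
set.seed(2)
n <- 300
toy <- data.frame(length = runif(n, 20, 160), whole = runif(n, 5, 500))
toy$rings <- 3 + 0.04 * toy$length + 0.005 * toy$whole + rnorm(n)

toyfit <- lm(rings ~ 1 + length + whole, data = toy)
b <- coef(toyfit)   # named vector: (Intercept), length, whole

# Prediction for one row, computed two ways.
new <- toy[1, ]
by_hand <- unname(b["(Intercept)"] + b["length"] * new$length +
                  b["whole"] * new$whole)
by_predict <- unname(predict(toyfit, newdata = new))

# Raise whole by 50 and see how far the prediction moves.
shifted <- new
shifted$whole <- shifted$whole + 50
change <- unname(predict(toyfit, newdata = shifted)) - by_predict

print(c(by_hand = by_hand, by_predict = by_predict))
print(c(change = change, fifty_times_coef = unname(50 * b["whole"])))
```

The two numbers on each printed line should agree up to rounding. For bigfit the sum simply has more terms, one per predictor.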
Finally, we consider an elaboration of the linear model represented by bigfit. Notice that the coefficient of length is not significant in bigfit. This means the data do not contain strong evidence that our linear model depends on length, or, rather, that the coefficient on length is not zero. We can explore this fact in more detail by making a scatterplot of the variable length and the residuals from the fit.

    plot(ab2$length,residuals(bigfit))

Here, the command residuals(bigfit) constructs the residuals from our fit involving all the variables. Give yourself a fairly big graphics window to see the relationship between the variables. What do you notice? To give you a hint, let us try a slightly larger model. This time, we will use a quadratic in length. In R, this is done by using a call to the function poly(length,2), where poly specifies that we want a polynomial in length and the degree of that polynomial is 2, that is, a quadratic.

    bigfit2 = lm(rings ~ 1+poly(length,2)+diameter+height+whole+shucked+
                 viscera+shell+infant, data=ab2)
    summary(bigfit2)

In the summary you will now see two entries involving length. The first row is labeled poly(length,2)1 and is just length, while the second, poly(length,2)2, involves length^2 (see footnote 1). To see how we have changed the fit, look at the plot of length against the residuals of this new fit, and compare it to the same plot for the previous fit.

    plot(ab2$length,residuals(bigfit2))

13. What has changed?

Admittedly, the effects here are somewhat subtle. There is still a lot of error in the data that we have not accounted for with this simple model. If we were to continue with this exercise, we would be led to transform the response variable rings and consider log(rings + 1.5) instead. This reduces some of the deficiencies in the final fit. Even after all of this work, however, the relationship is still noisy.
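For the curious, the way poly transforms its argument can be explored with a short sketch on made-up data. The variables x and y and the simulated curve are assumptions for illustration only. The sketch compares the default poly(x, 2) with the plain version poly(x, 2, raw = TRUE), which uses x and x^2 directly, and checks that both describe the same fitted quadratic.

```r
# Synthetic data following a gentle quadratic; NOT the abalone data.
set.seed(3)
toy <- data.frame(x = runif(150, 20, 160))
toy$y <- 2 + 0.1 * toy$x - 0.0004 * toy$x^2 + rnorm(150)

# Default poly() uses orthogonal polynomial terms.
fit_orth <- lm(y ~ 1 + poly(x, 2), data = toy)
# raw = TRUE uses x and x^2 as they are.
fit_raw <- lm(y ~ 1 + poly(x, 2, raw = TRUE), data = toy)

# Both parameterizations give the same fitted curve.
same_fit <- isTRUE(all.equal(unname(fitted(fit_orth)),
                             unname(fitted(fit_raw))))
print(same_fit)

# The default columns are uncorrelated with each other and scaled to
# unit length, so their cross-product is the 2 x 2 identity matrix.
P <- poly(toy$x, 2)
print(round(crossprod(P), 10))
```

The individual coefficients differ between the two fits because the columns are scaled differently, but the residuals, and therefore the residual plots, are the same.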
As noted earlier, in this case, what we are calling noise can probably be explained by variables we do not have at our disposal, like weather patterns and ocean conditions. The point of this project was not to come up with the best explanation of ring growth, but rather to give you some experience working with linear models. Have a good summer!

Footnote 1: Actually, a bit more is being done to transform length, but all you need to know is that the first term is basically a straight line, while the second is a bowl-shaped quadratic.
Lab 2 Worksheet Problems Problem : Geometry and Linear Equations Linear algebra is, first and foremost, the study of systems of linear equations. You are going to encounter linear systems frequently in
More informationIntroduction to Algebra: The First Week
Introduction to Algebra: The First Week Background: According to the thermostat on the wall, the temperature in the classroom right now is 72 degrees Fahrenheit. I want to write to my friend in Europe,
More informationInference with Simple Regression
1 Introduction Inference with Simple Regression Alan B. Gelder 06E:071, The University of Iowa 1 Moving to infinite means: In this course we have seen one-mean problems, twomean problems, and problems
More informationBusiness Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee
Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)
More informationSTEP Support Programme. Hints and Partial Solutions for Assignment 17
STEP Support Programme Hints and Partial Solutions for Assignment 7 Warm-up You need to be quite careful with these proofs to ensure that you are not assuming something that should not be assumed. For
More informationData Analysis and Statistical Methods Statistics 651
y 1 2 3 4 5 6 7 x Data Analysis and Statistical Methods Statistics 651 http://www.stat.tamu.edu/~suhasini/teaching.html Lecture 32 Suhasini Subba Rao Previous lecture We are interested in whether a dependent
More informationLet the x-axis have the following intervals:
1 & 2. For the following sets of data calculate the mean and standard deviation. Then graph the data as a frequency histogram on the corresponding set of axes. Set 1: Length of bass caught in Conesus Lake
More information[Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty.]
Math 43 Review Notes [Disclaimer: This is not a complete list of everything you need to know, just some of the topics that gave people difficulty Dot Product If v (v, v, v 3 and w (w, w, w 3, then the