Forestry 430 Advanced Biometrics and FRST 533 Problems in Statistical Methods Course Materials 2010


Instructor: Dr. Valerie LeMay, Forest Sciences 039

Course Objectives and Overview: The objectives of this course are:
1. To be able to use simple linear and multiple linear regression to fit models using sample data;
2. To be able to design and analyze lab and field experiments;
3. To be able to interpret results of model fitting and experimental analysis; and
4. To be aware of other analysis methods not explicitly covered in this course.

In order to meet these objectives, background theory and examples will be used. A statistical package called SAS will be used in examples, and used to help in analyzing data in exercises. Texts are also important, both to increase understanding while taking the course, and as a reference for future applied and research work.

Course Content Materials: These cover most of the course materials. However, changes will be made from year to year, including additional examples. Any additional course materials will be given as in-class handouts. NOTE: Items given in italics are only described briefly in this course. These course materials will be presented in class and are essential for the course. They are not published and should not be used as citations for papers. Recommendations for some published reference materials, including the textbook for the course, will be listed in the course outline handed out in class.

I. Short Review of Probability and Statistics
- Descriptive statistics
- Inferential statistics using known probability distributions: normal, t, F, Chi-square, binomial, Poisson

II. Fitting Equations
- Dependent variable and predictor variables
- Purpose: prediction and examination
- General examples
- Simple linear, multiple linear, and nonlinear regression
- Objectives in fitting: least squared error or maximum likelihood

Simple Linear Regression (SLR)
- Definition, notation, and example uses
- Dependent variable (y) and predictor variable (x); intercept, slope, and error
- Least squares solution to finding an estimated intercept and slope: derivation, normal equations, examples
- Assumptions of simple linear regression and properties when assumptions are met
- Residual plots to visually check the assumptions that:
  1. The relationship is linear (MOST IMPORTANT!!)
  2. Equal variance of y around x (equal spread of errors around the line)
  3. Observations are independent (not correlated in space nor time)
- Normality plots to check the assumption that:
  4. Normal distribution of y around x (normal distribution of errors around the line)
- Sampling and measurement assumptions:
  5. x values are fixed
  6. Random sampling of y occurs for every x

Transformations and other measures to meet assumptions
- Common transformations: for nonlinear trends, unequal variances, percents; rank transformation
- Outliers: unusual observations
- Other methods: nonlinear least squares, weighted least squares, general least squares, general linear models

Measures of goodness-of-fit
- Graphs
- Coefficient of determination (r2) [and Fit Index, I2]
- Standard error of the estimate (SEE) [and SEE in original units]

Estimated variances, confidence intervals and hypothesis tests
- For the equation
- For the intercept and slope
- For the mean of the dependent variable given a value for x
- For a single or group of values of the predicted dependent variable given a value for x

Selecting among alternative models; process to fit an equation using least squares regression
- Meeting assumptions
- Measures of goodness-of-fit: graphs, coefficient of determination (r2) or I2, and standard error of the estimate (SEE)
- Significance of the regression
- Biological or logical basis and cost

Multiple Linear Regression
- Definition, notation, and example uses
- Dependent variable (y) and predictor variables (x's); intercept, slopes, and error
- Least squares solution to finding an estimated intercept and slopes
  - Least squares and comparison to maximum likelihood estimation
  - Derivation
  - Linear algebra to obtain normal equations; matrix algebra
  - Examples: calculations and SAS outputs
- Assumptions of multiple linear regression
- Residual plots to visually check the assumptions that:
  1. The relationship is linear (with ALL x's, not each x, necessarily); MOST IMPORTANT!!
  2. Equal variance of y around the x's (equal spread of errors around the surface)
  3. Observations are independent (not correlated in space nor time)
- Normality plots to check the assumption that:
  4. Normal distribution of y around the x's (normal distribution of errors around the surface)
- Sampling and measurement assumptions:
  5. x values are fixed
  6. Random sampling of y occurs for every combination of x values
- Properties when all assumptions are met versus some are not met
- Transformations and other measures to meet assumptions: same as for SLR, but more difficult to select correct transformations

Measures of goodness-of-fit
- Graphs
- Coefficient of multiple determination (R2) [and Fit Index, I2]
- Standard error of the estimate (SEE)

Estimated variances, confidence intervals and hypothesis tests: calculations and SAS outputs
- For the regression surface
- For the intercept and slopes
- For the mean of the dependent variable given a particular value for each of the x variables
- For a single or group of values of the predicted dependent variable given a particular value for each of the x variables

Adding class variables as predictors
- Dummy variables to represent a class variable
- Interactions to change slopes for different classes
- Comparing two regressions for different class levels
- More than one class variable (class variables as the dependent variable covered in FRST 530, under generalized linear models)

Methods to aid in selecting predictor (x) variables
- All possible regressions; R2 criterion in SAS
- Stepwise methods

Selecting and comparing alternative models
- Meeting assumptions
- Parsimony and cost
- Biological nature of the system modeled
- Measures of goodness-of-fit: graphs, coefficient of determination (R2) [or Fit Index, I2], and standard error of the estimate (SEE)
- Comparing models when some models have a transformed dependent variable
- Other methods using maximum likelihood criteria

III. Experimental Design and Analysis
- Sampling versus experiments
- Definitions of terms: experimental unit, response variable, factors, treatments, replications, crossed factors, randomization, sum of squares, degrees of freedom, confounding
- Variations in designs: number of factors, fixed versus random effects, blocking, split-plot, nested factors, subsampling, covariates
- Designs in use
- Main questions in experiments

Completely Randomized Design (CRD)
- Definition: no blocking and no splitting of experimental units

One Factor Experiment, Fixed Effects
- Main questions of interest
- Notation and example: observed response, overall (grand) mean, treatment effect, treatment means
- Data organization and preliminary calculations: means and sums of squares
- Test for differences among treatment means: error variance, treatment effect, mean squares, F-test
- Assumptions regarding the error term: independence, equal variance, normality; expected values under the assumptions
- Differences among particular treatment means
- Confidence intervals for treatment means
- Power of the test
- Transformations if assumptions are not met
- SAS code

Two Factor Experiment, Fixed Effects
- Introduction: separating treatment effects into factor 1, factor 2, and the interaction between these
- Example layout
- Notation, means, and sums of squares calculations
- Assumptions, and transformations
- Test for interactions and main effects: ANOVA table, expected mean squares, hypotheses and tests, interpretation
- Differences among particular treatment means
- Confidence intervals for treatment means
- SAS analysis for example

One Factor Experiment, Random Effects
- Definition and example
- Notation and assumptions
- Least squares versus maximum likelihood solution

Two Factor Experiment, One Fixed and One Random Effect
- Introduction
- Example layout
- Notation, means, and sums of squares calculations
- Assumptions, and transformations
- Test for interactions and main effects: ANOVA table, expected mean squares, hypotheses and tests, interpretation
- SAS code
- Orthogonal polynomials (not covered)

Restrictions on Randomization

Randomized Block Design (RCB) with one fixed factor
- Introduction, example layout, data organization, and main questions
- Notation, means, and sums of squares calculations
- Assumptions, and transformations
- Differences among treatments: ANOVA table, expected mean squares, hypotheses and tests, interpretation
- Differences among particular treatment means
- Confidence intervals for treatment means
- SAS code

Randomized Block Design with other experiments
- RCB with replicates in each block
- Two fixed factors
- One fixed, one random factor

Incomplete Block Design
- Definition
- Examples

Latin Square Design: restrictions in two directions
- Definition and examples
- Notation and assumptions
- Expected mean squares
- Hypotheses and confidence intervals for main questions if assumptions are met

Split Plot and Split-Split Plot Design
- Definition and examples
- Notation and assumptions
- Expected mean squares
- Hypotheses and confidence intervals for main questions if assumptions are met

Nested and hierarchical designs

CRD: Two Factor Experiment, Both Fixed Effects, with Second Factor Nested in the First Factor
- Introduction using an example
- Notation
- Analysis methods: averages, least squares, maximum likelihood
- Data organization and preliminary calculations: means and sums of squares
- Example using SAS

CRD: One Factor Experiment, Fixed Effects, with sub-sampling
- Introduction using an example
- Notation
- Analysis methods: averages, least squares, maximum likelihood
- Data organization and preliminary calculations: means and sums of squares
- Example using SAS

RCB: One Factor Experiment, Fixed Effects, with sub-sampling
- Introduction using an example
- Example using SAS

Adding Covariates (continuous variables)
- Analysis of covariance: definition and examples
- Notation and assumptions
- Expected mean squares
- Hypotheses and confidence intervals for main questions if assumptions are met
- Allowing for inequality of slopes

Expected Mean Squares: Method to Calculate These
- Method and examples

Power Analysis
- Concept and an example

Use of Linear Mixed Models for Experimental Design
- Concept and examples

Summary

Probability and Statistics Review

Population vs. sample:
- N = number of observations in the population
- n = number of observations in the sample

Experimental vs. observational studies: In experiments, we manipulate the conditions, whereas in observational studies we simply measure what is already there. Therefore, in experiments, we try to assign cause and effect.

- Variable of interest / dependent variable / response variable / outcome: y
- Auxiliary variables / explanatory variables / predictor variables / independent variables / covariates: x
- Observations: measure y's and x's for a census (all N) or on a sample (n out of the N)
- x and y can be: 1) continuous (ratio or interval scale); or 2) discrete (nominal or ordinal scale)

Descriptive statistics: summarize the sample data as means, variances, ranges, etc.
Inferential statistics: use the sample statistics to estimate the parameters of the population

Parameters for populations:
1. Mean, mu. E.g., for N = 4 and y1 = 5; y2 = 6; y3 = 7; y4 = 6: mu = 6
2. Range: maximum value minus minimum value
3. Standard deviation, sigma, and variance, sigma^2:
   sigma^2 = sum_{i=1..N} (y_i - mu)^2 / N
4. Covariance between x and y:
   sigma_xy = sum_{i=1..N} (y_i - mu_y)(x_i - mu_x) / N
5. Correlation (Pearson's) between two variables, y and x:
   rho = rho_xy = sigma_xy / (sigma_x sigma_y)
   Ranges from -1 to +1, with strong negative correlations near -1 and strong positive correlations near +1
6. Distribution for y: frequency of each value of y or x (may be divided into classes)
7. Probability distribution of y or x: probability associated with each value
8. Mode: most common value of y or x
9. Median: y-value or x-value which divides the distribution (50% of the N observations are above and 50% are below)

Example: 50 aspen trees of Alberta

Descriptive statistics for age:
- N = 50 trees
- Mean: 7 years; Median: 73 years
- 25% percentile: 55; 75% percentile: 8
- Minimum: 4; Maximum: 60
- Variance: 54.7; Standard deviation: .69

Compare the mean versus the median. Normal distribution?
Pearson correlation of age and dbh for the population of N = 50 trees.

Statistics from the sample:
1. Mean, y-bar. E.g., for n = 3 and y1 = 5; y2 = 6; y3 = 7: y-bar = 6
2. Range: maximum value minus minimum value
3. Standard deviation, s, and variance, s^2:
   s^2 = sum_{i=1..n} (y_i - y-bar)^2 / (n - 1)
4. Standard deviation of the sample means (also called the standard error, short for standard error of the mean), and its square, the variance of the sample means, are estimated by:
   s_ybar = s / sqrt(n)  and  s^2_ybar = s^2 / n
5. Coefficient of variation (CV): the standard deviation from the sample, divided by the sample mean. May be multiplied by 100 to get CV in percent.
6. Covariance between x and y:
   s_xy = sum_{i=1..n} (y_i - y-bar)(x_i - x-bar) / (n - 1)
7. Correlation (Pearson's) between two variables, y and x:
   r = r_xy = s_xy / (s_x s_y)
   Ranges from -1 to +1, with strong negative correlations near -1 and strong positive correlations near +1
8. Distribution for y: frequency of each value of y or x (may be divided into classes)
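The sample statistics above can be sketched in a few lines of code. The course uses SAS; Python is used here purely as an illustration, and the data values are made up:

```python
import math

# Hypothetical sample data: y = age (years), x = dbh (cm)
y = [55, 61, 73, 68, 80, 47]
x = [20.1, 22.5, 30.2, 26.0, 33.4, 18.7]
n = len(y)

mean_y = sum(y) / n
# Sample variance uses n - 1 in the denominator
var_y = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
sd_y = math.sqrt(var_y)
se_mean = sd_y / math.sqrt(n)          # standard error of the mean
cv_pct = 100 * sd_y / mean_y           # coefficient of variation, percent

mean_x = sum(x) / n
cov_xy = sum((yi - mean_y) * (xi - mean_x)
             for yi, xi in zip(y, x)) / (n - 1)
sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
r_xy = cov_xy / (sd_x * sd_y)          # Pearson correlation, -1 to +1
```

Note that the correlation is unitless, while the covariance carries the units of both variables.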

9. Estimated probability distribution of y or x: probability associated with each value, based on the n observations. Example: n = 50
10. Mode: most common value of y or x
11. Median: y-value or x-value which divides the estimated probability distribution (50% of the n observations are above and 50% are below)

Sample of n = 50 trees:
- Mean: 69 years; Median: 68 years
- 25% percentile: 48; 75% percentile: 8
- Minimum: 4; Maximum: 60
- Variance; Standard deviation: 5.69 years; Standard error of the mean: . years

Good estimate of the population values?
Pearson correlation of age and dbh: 0.66, with a p-value of , for the sample of n = 50 trees from a population of 50 trees.
Null and alternative hypothesis for the p-value? What is a p-value?

Sample statistics to estimate population parameters:
If simple random sampling (every observation has the same chance of being selected) is used to select n from N, then:
- Sample estimates are unbiased estimates of their counterparts (e.g., the sample mean estimates the population mean), meaning that over all possible samples the sample statistics, averaged, would equal the population statistic.
- A particular sample value (e.g., a sample mean) is called a point estimate; it does not necessarily equal the population parameter for a given sample.
- We can calculate an interval where the true population parameter is likely to be, with a certain probability. This is a confidence interval, and can be obtained for any population parameter, IF the distribution of the sample statistic is known.

Common continuous distributions:

Normal:
- Symmetric distribution around mu
- Defined by mu and sigma^2. If we know that a variable has a normal distribution, and we know these parameters, then we know the probability of getting any particular value for the variable.
- Probability tables are for mu = 0 and sigma^2 = 1, and are often called z-tables. Examples: P(-1 < z < +1) is about 0.68; P(-1.96 < z < 1.96) = 0.95.
- Notation example: for alpha = 0.05, z_{alpha/2} = z_{0.025} = 1.96
- z-scores: scale the values for y by subtracting the mean and dividing by the standard deviation:
  z_i = (y_i - mu) / sigma
  E.g., for a mean of 20 and a standard deviation of 2, y = 10 gives z = -5.0 (an extreme value)
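Standardizing can be sketched directly from the z-score formula. This is a Python illustration (not part of the course's SAS material); the numeric example follows my reading of the garbled handout figures (mean 20, standard deviation 2), so treat the values as illustrative:

```python
def z_score(y, mu, sigma):
    """Standardize y: subtract the mean, divide by the standard deviation."""
    return (y - mu) / sigma

# Illustrative values: mean 20, standard deviation 2
z = z_score(10, mu=20, sigma=2)
# About 68% of a normal distribution lies within |z| < 1,
# and 95% within |z| < 1.96, so |z| = 5 is extreme.
```

A value at the mean always standardizes to z = 0, regardless of the spread.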

t-distribution:
- Symmetric distribution, centred at 0 in the tables. The spread varies with the degrees of freedom. As the sample size increases, the df increase, the spread decreases, and the distribution approaches the normal distribution.
- Used for a normally distributed variable whenever the variance of that variable is not known.
- Notation example: t_{n-1, 1-alpha/2}, where n - 1 is the degrees of freedom and 1 - alpha/2 is the percentile. For example, for n = 5 and alpha = 0.05, we are looking for t with 4 degrees of freedom at the 0.975 percentile (a value around 2.8).

Chi-square distribution:
- Starts at zero, and is not symmetric
- Is the square of a normally distributed variable; e.g., sample variances have a Chi-square distribution if the variable is normally distributed
- Need the degrees of freedom and the percentile, as with the t-distribution

F-distribution:
- Is the ratio of variables that each have a Chi-square distribution, e.g., the ratio of sample variances for variables that are each normally distributed
- Need the percentile and two degrees of freedom (one for the numerator and one for the denominator)

Central Limit Theorem: As n increases, the distribution of sample means will approach a normal distribution, even if the distribution of the variable itself is something else (e.g., non-symmetric).

Tables in the textbook: Some tables give the values of the probability distribution for the degrees of freedom and for the percentile. Others give this for the degrees of freedom and for the alpha level (or sometimes alpha/2). Be careful in reading probability tables.

Confidence intervals for a single mean:
- Collect data and get point estimates:
  - The sample mean, y-bar, to estimate the population mean mu (will be unbiased)
  - The sample variance, s^2, to estimate the population variance sigma^2 (will be unbiased)
- Can calculate interval estimates for each point estimate, e.g., a 95% confidence interval for the true mean, IF:
  - the y's are normally distributed, OR
  - the sample size is large enough that the Central Limit Theorem holds, so y-bar will be normally distributed

Computational forms:

s^2 = [ sum_{i=1..n} y_i^2 - (sum_{i=1..n} y_i)^2 / n ] / (n - 1)
(sum over all n items; square each value, then add them)

s_ybar = sqrt( s^2 / n * (1 - n/N) )
(without replacement; n items measured out of N possible items, where N is sometimes infinite)

s_ybar = sqrt( s^2 / n )
(with replacement, or when N is very large)

Coefficient of Variation (%): CV = 100 * s / y-bar

95% confidence interval for the true mean of the population:
y-bar +/- t_{n-1, 1-alpha/2} * s_ybar

Examples: n = 4 plots; variables: volume, ba/ha, average dbh
- mean:
- variance:
- std. dev.:
- std. dev. of mean:
- t should be: 3.18 (3 df, 0.975 percentile)
- Actual 95% CI (+/-):
NOTE: the EXCEL 95% (+/-) value is not correct here: it does not use the t-distribution.

Hypothesis tests:
- We can hypothesize what the true value of any population parameter might be, and state this as the null hypothesis (H0:). We also state an alternative hypothesis (H1: or Ha:): that the parameter is a) not equal to this value; b) greater than this value; or c) less than this value.
- Collect sample data to test this hypothesis. From the sample data, we calculate a sample statistic as a point estimate of this population parameter, and an estimated variance of the sample statistic.
- We calculate a test statistic using the sample estimates. Under H0, this test statistic will follow a known distribution.
- If the test statistic is very unusual compared to the tabular values for the known distribution, then H0 is very unlikely, and we conclude H1.
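The confidence-interval recipe above can be sketched as follows. This is a Python illustration (the course itself uses SAS and EXCEL); the plot volumes, the population size N = 100, and the hard-coded t-table value are all assumed for the example:

```python
import math

# Hypothetical plot volumes (m3/ha) from n = 4 plots out of N = 100
y = [120.0, 135.5, 110.2, 150.3]
n, N = len(y), 100

mean_y = sum(y) / n
s2 = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)

# Standard error of the mean with the finite population correction
se_mean = math.sqrt(s2 / n * (1 - n / N))

# t-table value for n - 1 = 3 df at the 0.975 percentile
t_crit = 3.182
half_width = t_crit * se_mean
ci = (mean_y - half_width, mean_y + half_width)
```

Note that using a z-value (1.96) instead of t = 3.182 would give a much narrower, and incorrect, interval for such a small sample; this is the point of the EXCEL warning above.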

Example for a single mean:

We believe that the average weight of ravens in the Yukon is 1 kg.

H0: mu = 1 kg
H1: mu != 1 kg

A sample of 20 birds is taken (HOW??) and each bird is weighed and released. The average bird weight is 0.8 kg, and the standard deviation was 0.20 kg. Assuming the bird weights follow a normal distribution, we can use a t-test (why not a z-test?).

Mean: y-bar = 0.8 kg
Variance: s^2
Standard error of the mean: s_ybar = s / sqrt(n)
(Aside: what is the CV?)

Test statistic:
t = (y-bar - mu_0) / s_ybar
Under H0, this will follow a t-distribution with df = n - 1. Find the value from the t-table and compare.
Conclude?
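The raven test can be sketched as below. Python stands in for the course's hand/SAS calculations, and the numbers (hypothesized mean 1 kg, sample mean 0.8 kg, standard deviation 0.2 kg, n = 20, t-table value 2.093) are my reconstruction of figures garbled in the transcription, so treat them as assumptions:

```python
import math

# Assumed values for illustration
mu_0 = 1.0          # hypothesized mean weight, kg
ybar, s, n = 0.8, 0.2, 20

se = s / math.sqrt(n)              # standard error of the mean
t_stat = (ybar - mu_0) / se        # follows t with n - 1 df under H0

t_crit = 2.093                     # t-table, 19 df, 0.975 percentile
reject_h0 = abs(t_stat) > t_crit   # two-sided test at alpha = 0.05
```

With these numbers the statistic is far out in the tail, so H0 (mean weight 1 kg) would be rejected.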

The p-value: the probability, under H0, of obtaining a test statistic as extreme as, or more extreme than, the one observed. NOTE: in EXCEL use tdist(x, df, tails).

Example: comparing two means:

We believe that the average weight of male ravens differs from that of female ravens.

H0: mu_1 = mu_2 (i.e., mu_1 - mu_2 = 0)
H1: mu_1 != mu_2 (i.e., mu_1 - mu_2 != 0)

A sample of 20 birds is taken and each bird is weighed and released. 12 birds were males, with an average weight of 1.2 kg and a standard deviation of 0.20 kg. 8 birds were females, with an average weight of 0.8 kg and a standard deviation of 0.20 kg.

Means? Sample variances?

Test statistic (pooled variances):

t = [ (ybar_1 - ybar_2) - 0 ] / sqrt( s_p^2 (1/n_1 + 1/n_2) )
where s_p^2 = [ (n_1 - 1) s_1^2 + (n_2 - 1) s_2^2 ] / (n_1 + n_2 - 2)

Under H0, this will follow a t-distribution with df = n_1 + n_2 - 2. Find the t-value from the tables and compare, or use the p-value.
Conclude?

Errors for hypothesis tests:

             H0 True              H0 False
Accept       1 - alpha            beta (Type II error)
Reject       alpha (Type I error) 1 - beta

- Type I error: reject H0 when it is true. The probability of this happening is alpha.
- Type II error: accept H0 when it is false. The probability of this happening is beta.
- Power of the test: reject H0 when it is false. The probability of this is 1 - beta.
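The pooled two-sample test can be sketched as below, again in Python as an illustration; the male/female raven figures (12 males at 1.2 kg, 8 females at 0.8 kg, both with standard deviation 0.2 kg, and the 18-df t-table value 2.101) are reconstructed from garbled text, so treat them as assumptions:

```python
import math

# Assumed values for illustration: males (group 1) vs females (group 2)
n1, ybar1, s1 = 12, 1.2, 0.2
n2, ybar2, s2 = 8, 0.8, 0.2

# Pooled variance: df-weighted average of the two sample variances
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_diff = math.sqrt(sp2 * (1 / n1 + 1 / n2))

t_stat = (ybar1 - ybar2 - 0) / se_diff   # H0: mu1 - mu2 = 0
t_crit = 2.101                           # t-table, 18 df, 0.975 percentile
reject_h0 = abs(t_stat) > t_crit
```

The pooling step assumes equal variances in the two groups, which is why both sample variances feed into a single s_p^2.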

What increases power?
- Increased sample sizes, resulting in lower standard errors
- A larger difference between the mean under H0 and the mean under H1
- Increasing alpha (this will decrease beta)

Fitting Equations

REF:

The idea is:
- y: the variable of interest (dependent variable), y_i; hard to measure
- easy-to-measure variables (predictor/independent) that are related to the variable of interest, labelled x_1i, x_2i, ..., x_mi
- measure y_i, x_1i, ..., x_mi for a sample of n items
- use this sample to estimate an equation that relates y_i (dependent variable) to x_1i, ..., x_mi (independent or predictor variables)
- once the equation is fitted, one can then just measure the x's and get an estimate of y without measuring it; one can also examine relationships between variables

Examples:
1. Percent decay_i; x_1i = log10(dbh)
2. log10(volume)_i; x_1i = log10(dbh), x_2i = log10(height)
3. Branch length_i; x_1i = relative height above ground, x_2i = dbh, x_3i = height

Objective: find estimates of beta_0, beta_1, beta_2, ..., beta_m such that the sum of squared differences between the measured y_i and the predicted y_i (usually labelled y-hat_i: values on the line or surface) is the smallest (minimize the sum of squared errors: least squared error);
OR find estimates of beta_0, beta_1, beta_2, ..., beta_m such that the likelihood (probability) of getting these y values is the largest (maximize the likelihood).

Finding the minimum of the sum of squared errors is often easier. In some cases, the two criteria lead to the same estimates of the parameters.

Types of equations:
- Simple linear equation: y_i = beta_0 + beta_1 x_i + epsilon_i
- Multiple linear equation: y_i = beta_0 + beta_1 x_1i + beta_2 x_2i + ... + beta_m x_mi + epsilon_i
- Nonlinear equation: takes many forms, for example:
  y_i = beta_0 + beta_1 x_1i^beta_2 x_2i^beta_3 + epsilon_i

Simple Linear Regression (SLR)

Population: y_i = beta_0 + beta_1 x_i + epsilon_i, with mu_{y|x} = beta_0 + beta_1 x
Sample: y_i = b_0 + b_1 x_i + e_i, with y-hat_i = b_0 + b_1 x_i and e_i = y_i - y-hat_i
- b_0 is an estimate of beta_0 [intercept]
- b_1 is an estimate of beta_1 [slope]
- y-hat_i is the predicted y: an estimate of the average y for a particular x value
- e_i is an estimate of epsilon_i, called the error or the residual; it represents the variation in the dependent variable (the y) which is not accounted for by the predictor variable (the x)

Find b_0 (the intercept: y_i when x_i = 0) and b_1 (the slope) so that SSE = sum of e_i^2 (the sum of squared errors over all n sample observations) is the smallest (least squares solution).

The variables do not have to be in the same units, and the coefficients will change with different units of measure. Given estimates of b_0 and b_1, we can get an estimate of the dependent variable (the y-hat) for ANY value of x within the range of x's represented in the original data.

Example: tree height (m), hard to measure, versus dbh (diameter at 1.3 m above ground, in cm), easy to measure; use dbh squared for a linear equation. [Figure: scatterplot of height (y) against dbh squared (x) with the fitted line.]

For each observation, the difference between the measured y and the mean of y can be split:
(y_i - y-bar) = (y_i - y-hat_i) + (y-hat_i - y-bar)
- (y_i - y-hat_i): difference between the measured and predicted y
- (y-hat_i - y-bar): difference between the predicted y and the mean of y

Least Squares Solution: Finding the Set of Coefficients that Minimizes the Sum of Squared Errors

To find the estimated coefficients that minimize SSE for a particular set of sample data and a particular equation (form and variables):
1. Define the sum of squared errors (SSE) in terms of the measured minus the predicted y's (the errors);
2. Take partial derivatives of the SSE equation with respect to each coefficient;
3. Set these equal to zero (for the minimum) and solve the resulting set of equations (using algebra or linear algebra).

For linear models (simple or multiple linear), there will be one solution: we can mathematically solve the set of partial derivative equations. The fitted line WILL ALWAYS GO THROUGH THE POINT DEFINED BY (x-bar, y-bar), and will always result in the sum of the e_i equal to 0.

For nonlinear models, this is not possible, and we must search to find a solution (covered in FRST 530). If we used the criterion of finding the maximum likelihood (probability) rather than the minimum SSE, we would need to search for a solution even for linear models (covered in FRST 530).

Least Squares Solution for SLR: find the set of estimated parameters (coefficients) that minimize the sum of squared errors:

min SSE = min sum_{i=1..n} e_i^2 = min sum_{i=1..n} ( y_i - (b_0 + b_1 x_i) )^2

Take partial derivatives with respect to b_0 and b_1, set them equal to zero, and solve:

dSSE/db_0 = -2 sum_{i=1..n} (y_i - b_0 - b_1 x_i) = 0
  => sum y_i = n b_0 + b_1 sum x_i
  => b_0 = y-bar - b_1 x-bar

dSSE/db_1 = -2 sum_{i=1..n} x_i (y_i - b_0 - b_1 x_i) = 0
  => sum x_i y_i = b_0 sum x_i + b_1 sum x_i^2

With some further manipulation:

b_1 = sum (x_i - x-bar)(y_i - y-bar) / sum (x_i - x-bar)^2 = SPxy / SSx = s_xy / s_x^2

where SPxy refers to the corrected sum of cross products for x and y, and SSx refers to the corrected sum of squares for x.

[Class example]
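The normal-equation solution above can be sketched directly in code. Python is used here instead of the course's SAS, and the height/dbh-squared values are made up for the illustration:

```python
# Least squares estimates for SLR via the corrected sums SPxy and SSx.
# Hypothetical data: y = height (m), x = dbh squared (cm^2).
x = [400.0, 506.25, 912.04, 676.0, 1115.56, 349.69]
y = [18.0, 20.5, 28.1, 24.0, 31.2, 16.9]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Corrected sum of cross products and corrected sum of squares for x
sp_xy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ss_x = sum((xi - xbar) ** 2 for xi in x)

b1 = sp_xy / ss_x          # slope
b0 = ybar - b1 * xbar      # intercept: line passes through (xbar, ybar)

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
# With an intercept in the model, the residuals sum to zero
```

The two properties noted earlier (line through the point of means, residuals summing to zero) fall directly out of the normal equations, so they make good sanity checks on any hand calculation.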

Properties of b_0 and b_1

b_0 and b_1 are least squares estimates of beta_0 and beta_1. Under the assumptions concerning the error term and the sampling/measurements, these are:
- Unbiased estimates: given many estimates of the slope and intercept over all possible samples, the average of the sample estimates will equal the true values
- The variability of these estimates from sample to sample can be estimated from the single sample; these estimated variances will be unbiased estimates of the true variances (and standard errors)
- The estimated intercept and slope will be the most precise (most efficient, with the lowest variances) estimates possible (called "Best")
- These will also be the maximum likelihood estimates of the intercept and slope

Assumptions of SLR

Once coefficients are obtained, we must check the assumptions of SLR. Assumptions must be met to:
- obtain the desired characteristics
- assess goodness of fit (i.e., how well the regression line fits the sample data)
- test significance of the regression and other hypotheses
- calculate confidence intervals and test hypotheses for the true (population) coefficients
- calculate confidence intervals for the mean predicted value given a set of x values (i.e., for the predicted y given a particular value of x)

We need good estimates (unbiased, or at least consistent) of the standard errors of the coefficients, and a known probability distribution, in order to test hypotheses and calculate confidence intervals.

Checking assumptions using residual plots

The assumptions of:
1. a linear relationship between the y and the x;
2. equal variance of errors; and
3. independence of errors (independent observations)
can be visually checked by using RESIDUAL PLOTS.

A residual plot shows the residual (i.e., y_i - y-hat_i) on the y-axis and the predicted value (y-hat_i) on the x-axis. In a residual plot that meets the assumptions of a linear relationship and equal variance of the observations, the data points are evenly distributed about zero and there are no outliers (very unusual points that may be a measurement or entry error).

Residual plots can also indicate unusual points (outliers) that may be measurement errors, transcription errors, etc.

Examples of residual plots indicating failures to meet assumptions:

1. The relationship between the x's and y must be linear. If not met, the residual plot and the plot of y vs. x will show a curved line. [Figure: plot of ht against dbhsq showing a curved trend, and the corresponding residual plot against the predicted value of ht, showing a curved band of residuals.]

Result: if this assumption is not met, the regression line does not fit the data well; biased estimates of the coefficients and of the standard errors of the coefficients will occur.

2. The variance of the y values must be the same for every one of the x values. If not met, the spread around the line will not be even.

Result: if this assumption is not met, the estimated coefficients (slopes and intercept) will be unbiased, but the estimates of the standard deviations of these coefficients will be biased, so we cannot calculate CIs nor test the significance of the x variable. However, estimates of the coefficients of the regression line and of goodness of fit are still unbiased.

3. Each observation (i.e., x_i and y_i) must be independent of all other observations. In this case, we produce a different residual plot, where the residuals are on the y-axis as before, but the x-axis is the variable that is thought to produce the dependencies (e.g., time). If the assumption is not met, this revised residual plot will show a trend, indicating the residuals are not independent.

Result: we cannot calculate CIs nor test the significance of the x variable. However, estimates of the coefficients of the regression line and of goodness of fit are still unbiased.

Normality histogram or plot

A fourth assumption of SLR is:
4. The y values must be normally distributed for each of the x values. A histogram of the errors, and/or a normality plot, can be used to check this, as well as tests of normality. [Figure: histogram and boxplot of the residuals.]

Tests for normality:
H0: data are normal
H1: data are not normal

[SAS output: tests for normality on the residuals]

Test                  Statistic   p Value
Shapiro-Wilk          W           Pr < W
Kolmogorov-Smirnov    D           Pr > D
Cramer-von Mises      W-Sq        Pr > W-Sq
Anderson-Darling      A-Sq        Pr > A-Sq

[Figure: normal probability plot of the residuals.]

Result: if the normality assumption is not met, we cannot calculate CIs nor test the significance of the x variable, since we do not know what probabilities to use. Also, the estimated coefficients are no longer equal to the maximum likelihood solution.

Example: [Figures for the regression of volume versus dbh: normal plot of residuals, histogram of residuals, I chart of residuals (with UCL and LCL), and residuals vs. fits.]
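The idea behind the normal probability plot can be sketched without SAS: sort the residuals and correlate them with approximate expected normal order statistics (Blom positions). This is a standard-library Python sketch of the plotting idea only, not the Shapiro-Wilk or other formal tests listed above; the residual values are made up:

```python
from statistics import NormalDist

def normal_scores_corr(data):
    """Rough normality check: correlation between the sorted data and
    approximate normal quantiles (the idea behind a normal probability
    plot). Values near 1 suggest the data are close to normal."""
    n = len(data)
    xs = sorted(data)
    # Blom plotting positions for expected normal order statistics
    qs = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
          for i in range(1, n + 1)]
    mx, mq = sum(xs) / n, sum(qs) / n
    num = sum((a - mx) * (b - mq) for a, b in zip(xs, qs))
    den = (sum((a - mx) ** 2 for a in xs) ** 0.5 *
           sum((b - mq) ** 2 for b in qs) ** 0.5)
    return num / den

# Residuals from a fitted line (made-up values, roughly symmetric)
resid = [-2.1, -1.4, -0.6, -0.2, 0.1, 0.4, 0.9, 1.5, 2.3]
r_plot = normal_scores_corr(resid)
```

A strongly non-normal set of residuals (e.g., heavily skewed) would bend away from the straight line and pull this correlation well below 1.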

Measurement and sampling assumptions

The remaining assumptions are based on the measurements and the collection of the sample data.

5. The x values are measured without error (i.e., the x values are fixed). This can only be known from the process of collecting the data. For example, if tree diameters are very precisely measured, there will be little error. If this assumption is not met, the estimated coefficients (slopes and intercept) and their variances will be biased, since the x values are varying.

6. The y values are randomly selected for each value of the x variables (i.e., for each x value, a list of all possible y values is made, and some are randomly selected). For many biological problems, the observations will be gathered using simple random sampling or systematic sampling (a grid across the land area); this does not strictly meet the assumption. Also, under more complex sampling designs, such as multistage sampling (sampling large units and then sampling smaller units within the large units), this assumption is not met. If the form of the equation is correct, then this does not cause problems; if not, the estimated equation will be biased.

Transformations

Common transformations:
- Powers: x^3, x^0.5, etc., for relationships that look nonlinear
- log10, loge: also for relationships that look nonlinear, or when the variances of y are not equal around the line
- Sin^-1 [arcsine]: when the dependent variable is a proportion
- Rank transformation, for non-normal data:
  - sort the y variable
  - assign a rank to each observation, from 1 to n
  - transform the rank to normal (e.g., Blom transformation)
  PROBLEM: we lose some of the information in the original data

Try to transform x first and leave y, the variable of interest, alone; however, this is not always possible. Use graphs to help choose transformations.

Outliers: unusual points

Check for points that are quite different from the others on:
- the graph of y versus x
- the residual plot

Do not simply delete the point, as it MAY BE VALID! Check:
- Is this a measurement error? E.g., a tree height of 100 m is very unlikely.
- Is it a transcription error? E.g., for an adult person, a weight of 20 lbs was entered rather than 200 lbs.
- Is there something very unusual about this point? E.g., a bird has a short beak because it was damaged.

Try to fix the observation. If it is very different from the others, or you know there is a measurement error that cannot be fixed, then delete it and indicate this in your research report.

On the residual plot, an outlier CAN occur if the model is not correct: a transformation of the variable(s) may be needed, or an important variable may be missing.

Other methods than SLR (and multiple linear regression), for when transformations do not work (some covered in FRST 530):
- Nonlinear least squares: a least squares solution for nonlinear models; uses a search algorithm to find the estimated coefficients; has good properties for large datasets; still assumes normality, equal variances, and independent observations.
- Weighted least squares: for unequal variances. Estimate the variances and use these to weight the least squares fit of the regression; assumes normality and independent observations.
- Generalized linear models: used for distributions other than the normal (e.g., binomial, Poisson, etc.), but with no correlation between observations; uses maximum likelihood.
- Generalized least squares and mixed models: use maximum likelihood to fit models with unequal variances, correlations over space, or correlations over time, but normally distributed errors.
- Generalized linear mixed models: allow for unequal variances, correlations over space and/or time, and non-normal distributions; use maximum likelihood.

Measures of Goodness of Fit

How well does the regression fit the sample data? For simple linear regression, a graph of the original data with the fitted line marked on it indicates how well the line fits the data [not possible with MLR]. Two measures are commonly used: the coefficient of determination (r²) and the standard error of the estimate (SE_E).

To calculate r² and SE_E, first calculate the SSE (this is what was minimized):

SSE = Σᵢ eᵢ² = Σᵢ (yᵢ − ŷᵢ)² = Σᵢ (yᵢ − (b0 + b1·xᵢ))²

the sum of squared differences between the measured and estimated y's.

Calculate the sum of squares for y:

SSy = Σᵢ (yᵢ − ȳ)² = Σᵢ yᵢ² − (Σᵢ yᵢ)²/n = s²ᵧ (n − 1)

the sum of squared differences between the measured y's and the mean of the y-measures. NOTE: in some texts, this is called the sum of squares total.

Calculate the sum of squares regression:

SSreg = Σᵢ (ŷᵢ − ȳ)² = b1·SPxy = SSy − SSE

the sum of squared differences between the mean of the y-measures and the predicted y's from the fitted equation. Equivalently, SSE = SSy − SSreg.

Then:

r² = SSreg/SSy = 1 − SSE/SSy

- r² is the coefficient of determination: the proportion of the variance of y accounted for by the regression using x
- it is the square of the correlation between x and y
- it ranges from 0 (very poor: a horizontal surface representing no relationship between y and the x's) to 1 (perfect fit: the surface passes through the data)
- SSE and SSy are based on the y's used in the equation, so r² will not be in original units if y was transformed

SE_E = √( SSE/(n − 2) )

- SE_E is the standard error of the estimate, in the same units as y
- SSE is based on the y's used in the equation, so SE_E will not be in original units if y was transformed
- under normality of the errors: ŷ ± 1·SE_E covers about 68% of the sample observations, and ŷ ± 2·SE_E about 95%
- want a low SE_E
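As a numerical check of these formulas, the following Python sketch (an illustrative helper, not the course's SAS code) computes SSE, SSy, SSreg, r², and SE_E for a small dataset:

```python
def slr_fit_stats(x, y):
    """Fit SLR by least squares and return b0, b1, r^2, and SE_E,
    following the SSE / SSy / SSreg decomposition in the notes."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    spxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = spxy / ssx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssy = sum((yi - ybar) ** 2 for yi in y)
    ssreg = ssy - sse                 # also equals b1 * SPxy
    r2 = ssreg / ssy                  # = 1 - SSE/SSy
    se_e = (sse / (n - 2)) ** 0.5     # standard error of the estimate
    return b0, b1, r2, se_e

# perfectly linear data, so the fit is exact: r^2 = 1 and SE_E = 0
b0, b1, r2, se_e = slr_fit_stats([1, 2, 3, 4], [3, 5, 7, 9])
```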

When the y-variable was transformed, we can calculate estimates of these measures in the original y-variable units, called I² (Fit Index) and the estimated standard error of the estimate (SE_E′), in order to compare with the r² and SE_E of other equations where y was not transformed.

I² = 1 − SSE/SSy, where SSE and SSy are in original units. NOTE: you must back-transform the predicted y's to calculate the SSE in original units. I² does not have the same properties as r², however:
o it can be less than 0
o it is not the square of the correlation between y (in original units) and the x used in the equation.

The estimated standard error of the estimate (SE_E′), when the dependent variable y has been transformed:

SE_E′ = √( SSE(original units)/(n − 2) )

- SE_E′ is the standard error of the estimate in the original units of the dependent variable
- want a low SE_E′

[Class example]

Estimated Variances, Confidence Intervals and Hypothesis Tests

Testing Whether the Regression is Significant

Does knowledge of x improve the estimate of the mean of y? Or is it a flat surface, which means we should just use the mean of y as the estimate of the mean of y for any x?

SSE/(n − 2): called the mean squared error (MSE), as it would be the average of the squared errors if we divided by n. Instead, we divide by n − 2. Why? The degrees of freedom are n − 2: n observations, with two statistics estimated from them, b0 and b1. Under the assumptions of SLR, MSE is an unbiased estimate of the true variance of the error terms (the error variance).

SSreg/1: called the mean square regression (MSreg). Degrees of freedom: 1 x-variable. Under the assumptions of SLR, this is an estimate of the error variance PLUS a term of variance explained by the regression using x.
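The Fit Index calculation can be sketched as follows, assuming y was modeled as log10(y) (an illustrative Python sketch with hypothetical names, not from the course materials):

```python
def fit_index(y_orig, yhat_log):
    """Fit Index I^2 and SE_E' when the model predicted log10(y):
    back-transform the predictions, then work in original units."""
    n = len(y_orig)
    yhat_orig = [10 ** v for v in yhat_log]   # back-transform each prediction
    ybar = sum(y_orig) / n
    sse = sum((yo - yh) ** 2 for yo, yh in zip(y_orig, yhat_orig))
    ssy = sum((yo - ybar) ** 2 for yo in y_orig)
    i2 = 1 - sse / ssy                        # Fit Index (can be < 0)
    se_e_prime = (sse / (n - 2)) ** 0.5       # SE_E' in original units
    return i2, se_e_prime

# a perfect fit in log10 units back-transforms to a perfect fit: I^2 = 1
i2, se_e_prime = fit_index([10.0, 100.0, 1000.0, 10000.0],
                           [1.0, 2.0, 3.0, 4.0])
```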

H0: Regression is not significant
H1: Regression is significant

Same as:
H0: β1 = 0 [the true slope is zero, meaning no relationship with x]
H1: β1 ≠ 0 [the slope is positive or negative, not zero]

This can be tested using an F-test, as it is the ratio of two variances, or with a t-test, since we are only testing one coefficient (more on this later).

The information for the F-test is often shown as an analysis of variance table:

Source      df     SS      MS                 F            p-value
Regression  1      SSreg   MSreg = SSreg/1    MSreg/MSE    Prob of a larger F, compared with F(1, n−2, 1−α)
Residual    n−2    SSE     MSE = SSE/(n−2)
Total       n−1    SSy

[Class example and explanation of the p-value]

Using an F test statistic:

F = (SSreg/1) / (SSE/(n − 2)) = MSreg/MSE

Under H0, this follows an F distribution for the 1 − α percentile with 1 and n − 2 degrees of freedom. If the F for the fitted equation is larger than the F from the table, we reject H0 (not likely true). The regression is significant, in that the true slope is likely not equal to zero.
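The F-test above can be sketched in Python (illustrative; the critical value f_crit must still come from an F table for (1, n−2) degrees of freedom at the chosen α):

```python
def regression_f_test(x, y, f_crit):
    """ANOVA F-test of H0: beta1 = 0 in SLR.
    Returns (F, reject H0?); reject when F exceeds the table value."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    ssy = sum((yi - ybar) ** 2 for yi in y)
    ssreg = ssy - sse
    msreg = ssreg / 1          # 1 df: the single x-variable
    mse = sse / (n - 2)        # n-2 df: the residual
    f = msreg / mse
    return f, f > f_crit

# n = 8, so df = (1, 6); F(1, 6, 0.95) is about 5.99 from an F table
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]
f, significant = regression_f_test(x, y, 5.99)   # strongly linear data
```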

Estimated Standard Errors for the Slope and Intercept

Under the assumptions, we can obtain unbiased estimates of the standard errors of the slope and of the intercept [a measure of how these would vary among different sample sets], using the one set of sample data:

s_b0 = √[ MSE (1/n + x̄²/SSx) ]
s_b1 = √( MSE/SSx )

where SSx = Σᵢ xᵢ² − (Σᵢ xᵢ)²/n.

Confidence Intervals for the True Slope and Intercept

Under the assumptions, confidence intervals can be calculated as:

For β0: b0 ± t(1 − α/2, n − 2) · s_b0
For β1: b1 ± t(1 − α/2, n − 2) · s_b1

[class example]

Hypothesis Tests for the True Slope and Intercept

H0: β1 = c [the true slope is equal to the constant c]
H1: β1 ≠ c [the true slope differs from the constant c]

Test statistic:

t = (b1 − c)/s_b1

Under H0, this is distributed as a t value; the critical value is t_c = t(n − 2, 1 − α/2). Reject H0 if |t| > t_c. The procedure is similar for testing the true intercept against a particular value.

It is possible to do one-sided hypotheses also, where the alternative is that the true parameter (slope or intercept) is greater than (or less than) a specified constant c. MUST be careful with the t_c, as this is different. [class example]
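These standard-error and t-statistic formulas can be sketched as follows (an illustrative Python helper with hypothetical names, not the course's SAS code):

```python
def slope_intercept_se(x, y):
    """Fit SLR and return (b0, b1, s_b0, s_b1) under the SLR assumptions."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    s_b0 = (mse * (1 / n + xbar ** 2 / ssx)) ** 0.5
    s_b1 = (mse / ssx) ** 0.5
    return b0, b1, s_b0, s_b1

def t_stat_slope(b1, s_b1, c=0.0):
    """t statistic for H0: beta1 = c; compare |t| with t(n-2, 1-alpha/2)."""
    return (b1 - c) / s_b1

b0, b1, s_b0, s_b1 = slope_intercept_se([1, 2, 3, 4], [3.1, 4.9, 7.2, 8.8])
t = t_stat_slope(b1, s_b1)   # test of H0: beta1 = 0
```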

Confidence Interval for the True Mean of y Given a Particular x Value

For the mean of all possible y-values given a particular value of x (μ_y|xh):

ŷ_xh ± t(n − 2, 1 − α/2) · s_ŷ(xh)

where

ŷ_xh = b0 + b1·xh
s_ŷ(xh) = √[ MSE (1/n + (xh − x̄)²/SSx) ]

Confidence Bands

A plot of the confidence intervals for the mean of y for several x-values. [Graph shown in class.]
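The confidence interval for the mean of y at a given xh can be sketched as follows (illustrative Python; the t-value must be supplied from a t table for n − 2 degrees of freedom):

```python
def mean_response_ci(x, y, xh, t_val):
    """CI for the mean of y at x = xh: yhat_xh +/- t * s_yhat(xh)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    yhat_h = b0 + b1 * xh
    s_yhat = (mse * (1 / n + (xh - xbar) ** 2 / ssx)) ** 0.5
    return yhat_h - t_val * s_yhat, yhat_h + t_val * s_yhat

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]
lo, hi = mean_response_ci(x, y, 4.5, 2.447)   # t(6, 0.975) from a t table
```

Because of the (xh − x̄)² term, the band is narrowest at xh = x̄ and widens as xh moves away from the mean, which is why the confidence bands flare outward.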

Confidence Interval for One or More New y-Values Given a Particular x Value

For one possible new y-value given a particular value of x:

ŷ(new)_xh ± t(n − 2, 1 − α/2) · s_ŷ(new)xh

where

ŷ(new)_xh = b0 + b1·xh
s_ŷ(new)xh = √[ MSE (1 + 1/n + (xh − x̄)²/SSx) ]

For the average of g new possible y-values given a particular value of x:

ŷ(new)_xh ± t(n − 2, 1 − α/2) · s_ŷ(new g)xh

where

ŷ(new)_xh = b0 + b1·xh
s_ŷ(new g)xh = √[ MSE (1/g + 1/n + (xh − x̄)²/SSx) ]

[class example]

Selecting Among Alternative Models

Process to Fit an Equation using Least Squares

Steps:
1. Sample data are needed, on which the dependent variable and all explanatory (independent) variables are measured.
2. Make any transformations that are needed to meet the most critical assumption: the relationship between y and x is linear. Example: volume = β0 + β1·dbh² may be linear whereas volume versus dbh is not. Use yᵢ = volume, xᵢ = dbh².
3. Fit the equation to minimize the sum of squared error.
4. Check the assumptions. If they are not met, go back to step 2.
5. If the assumptions are met, then interpret the results. Is the regression significant? What is the r²? What is the SE_E? Plot the fitted equation over the plot of y versus x.
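The two interval formulas differ only in the 1/g term, which the following sketch makes explicit (illustrative Python; g = 1 gives the single-new-observation case):

```python
def prediction_interval(x, y, xh, t_val, g=1):
    """Interval for the mean of g new y-values at x = xh.
    With g = 1 this is the interval for one new observation."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    ssx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / ssx
    b0 = ybar - b1 * xbar
    mse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    yhat_h = b0 + b1 * xh
    s_new = (mse * (1 / g + 1 / n + (xh - xbar) ** 2 / ssx)) ** 0.5
    return yhat_h - t_val * s_new, yhat_h + t_val * s_new

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.2]
lo1, hi1 = prediction_interval(x, y, 4.5, 2.447, g=1)     # one new y
lo100, hi100 = prediction_interval(x, y, 4.5, 2.447, g=100)  # mean of 100 new y's
```

As g grows, the 1/g term vanishes and the interval narrows toward the confidence interval for the true mean of y at xh; the g = 1 interval is always the widest.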

For a number of candidate models, select based on:
1. Meeting assumptions: if an equation does not meet the assumption of a linear relationship, it is not a candidate model.
2. Comparing the fit statistics: select the higher r² (or I²) and the lower SE_E (or SE_E′).
3. Reject any models where the regression is not significant, since such a model is no better than just using the mean of y as the predicted value.
4. Select a model that is biologically tractable. A simpler model is generally preferred, unless there are practical/biological reasons to select the more complex model.
5. Consider the cost of using the model.

[class example]

Simple Linear Regression Example

[Data: temperature (x) and weight (y) for each observation, with plots of weight against temperature]

39 weight versus temperature Obs. temp weight x-diff x-diff. sq Et cetera weight 30 0 mean SSX,8.5 SSY3,9.8 SPXY6, temperature SPx b b0 b x SSx b: b0: NOTE: calculate b first, since this is needed to calculate b

From these, the residuals (errors) for the equation and the sum of squared error (SSE) were calculated:

[Table: for each observation, weight, predicted weight, the residual (weight − predicted), and the squared residual; the total of the last column gives the SSE]

And SSreg = SSy − SSE.

ANOVA
Source  df   SS   MS
Model   1
Error   6
Total   7

F with p ≈ 0.00 (very small). In Excel, use FDIST(x, df1, df2) to obtain a p-value.

r²: 0.97
Root MSE, or SE_E: .57

BUT: before interpreting the ANOVA table, are the assumptions met?

[Residual plot: residuals (errors) versus predicted weight]

Linear? Equal variance? Independent observations?

Normality plot:
[Table: for each observation, the sorted residuals, standardized residuals, relative frequency, and the probability from the z-distribution]

[Probability plot: cumulative probability versus z-value; and the relative frequency compared with the z-distribution probabilities]

Questions:
1. Are the assumptions of simple linear regression met? Evidence?
2. If so, interpret whether this is a good equation, based on the goodness-of-fit measures.

3. Is the regression significant?

For 95% confidence intervals for b0 and b1, we would also need the estimated standard errors:

s_b0 = √[ MSE (1/n + x̄²/SSx) ]
s_b1 = √( MSE/SSx )

The t-value for 6 degrees of freedom and the 0.975 percentile is 2.45 (TINV(0.05,6) in Excel).

For β0: b0 ± t(1 − α/2, n − 2) · s_b0 = 5.85 ± ...
For β1: b1 ± t(1 − α/2, n − 2) · s_b1

[Table: the estimated coefficients and standard errors for b0 and b1, with t(0.975,6) and the resulting lower and upper confidence limits]

Question: could the real intercept β0 be equal to 0?

Given a particular temperature xh, what is the estimated average weight (predicted value), and a 95% confidence interval for this estimate?

ŷ_xh = b0 + b1·xh
s_ŷ(xh) = √[ MSE (1/n + (xh − 37.5)²/SSx) ]     (here x̄ = 37.5)
ŷ_xh ± t(n − 2, 1 − α/2) · s_ŷ(xh)

45 Given a temperature of, what is the estimated weight for an new observation, and a 95% confidence interval for this estimate? ˆ x ˆ ( h x h b 0 + b x h ) If assumptions were not met, we would have to make some transformations and start over again! s s ˆ x h ˆ x h MSE + + n ( x x) + h SSx ( 37.5) ˆ xh ± tn, sˆ α x h

SAS code:

* wttemp.sas ;
options ls=70 ps=50;
run;

DATA regdata;
  input temp weight;
  cards;
  [data lines]
;
run;

DATA regdata;
  set regdata;
  tempsq=temp**2;
  tempcub=temp**3;
  logtemp=log(temp);
run;

PROC PLOT data=regdata;
  plot weight*(temp tempsq logtemp)='*';
run;

* --------------------------------;
PROC REG data=regdata simple;
  model weight=temp;
  output out=out p=yhat r=resid;
run;

* --------------------------------;
PROC PLOT data=out;
  plot resid*yhat;
run;

* --------------------------------;
PROC UNIVARIATE data=out plot normal;
  var resid;
run;

SAS outputs:

1) Graphs: which appears more linear?
2) How many observations were there?
3) What is the mean weight?

The REG Procedure
Model: MODEL1
Dependent Variable: weight
Number of Observations Read: 8
Number of Observations Used: 8

Analysis of Variance
Source           DF   Sum of Squares   Mean Square   F Value   Pr > F
Model            1                                             <.0001
Error            6
Corrected Total  7

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept   1                                          5.4       <.0001
temp        1                                          3.98      <.0001

Root MSE: .5760 (= SE_E)    R-Square: 0.97
Dependent Mean: 7.          Adj R-Sq: 0.97
Coeff Var

[Plot of resid*yhat: residuals versus the predicted value of weight, plotted in SAS with letters: A = 1 obs, B = 2 obs, etc.]

Tests for Normality
Test                  Statistic     p Value
Shapiro-Wilk          W             Pr < W
Kolmogorov-Smirnov    D             Pr > D      > 0.1500
Cramer-von Mises      W-Sq          Pr > W-Sq   > 0.2500
Anderson-Darling      A-Sq          Pr > A-Sq   > 0.2500

The UNIVARIATE Procedure
Variable: resid (Residual)

[Normal probability plot: the sorted residuals (*) plotted against the normal reference line (+); points close to the line support the normality assumption]

Multiple Linear Regression (MLR)

Population: yᵢ = β0 + β1·x1ᵢ + β2·x2ᵢ + ... + βm·xmᵢ + εᵢ
Sample:     yᵢ = b0 + b1·x1ᵢ + b2·x2ᵢ + ... + bm·xmᵢ + eᵢ
            ŷᵢ = b0 + b1·x1ᵢ + b2·x2ᵢ + ... + bm·xmᵢ ;  eᵢ = yᵢ − ŷᵢ

- β0 is the intercept parameter
- β1, β2, β3, ..., βm are slope parameters
- x1ᵢ, x2ᵢ, x3ᵢ, ..., xmᵢ are the independent variables
- εᵢ is the error term or residual: the variation in the dependent variable (the y) which is not accounted for by the independent variables (the x's)

For any fitted equation (we have the estimated parameters), we can get the estimated average for the dependent variable for any set of x's. This will be the predicted value ŷᵢ: the estimated average of y, given the particular values of the x variables.

NOTE: in the text by Kutner et al., p = m + 1. This is not to be confused with the p-value indicating significance in hypothesis tests.

For example:

predicted log10(vol) = b0 + b1·log10(dbh) + b2·log10(height)

where b0, b1, and b2 are estimated by finding the least squared error solution. Using this equation for dbh = 30 cm and height = 28 m: log10(dbh) = 1.48 and log10(height) = 1.45, which give the predicted log10(vol) and, after back-transforming, the volume (m³). This represents the estimated average volume for trees with dbh = 30 cm and height = 28 m.

Note: this equation is originally a nonlinear equation:

vol = a · dbh^b · ht^c · ε

which was transformed to a linear equation using logarithms:

log10(vol) = log10(a) + b·log10(dbh) + c·log10(ht) + log10(ε)

and this was fitted using multiple linear regression.
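Using such a fitted equation can be sketched as follows. The fitted coefficient values in the notes are not fully legible in this transcription, so the values below are assumed purely for illustration:

```python
import math

# Hypothetical coefficients for log10(vol) = b0 + b1*log10(dbh) + b2*log10(ht);
# these are NOT the fitted values from the course example.
b0, b1, b2 = -4.2, 2.1, 1.1

def predict_volume(dbh_cm, height_m):
    """Predict log10(volume), then back-transform to volume in m^3."""
    log_vol = b0 + b1 * math.log10(dbh_cm) + b2 * math.log10(height_m)
    return 10 ** log_vol

v = predict_volume(30, 28)   # estimated average volume for dbh 30 cm, height 28 m
```

Note that back-transforming with 10** gives the prediction in original units; as the notes warn, the error in original units is not simply the antilog of the log-unit error.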

For the observations in the sample data used to fit the regression, we can also get an estimate of the error (we have the measured volume). For the fitted equation in log10 units:

errorᵢ = yᵢ − ŷᵢ

In original units, the estimated error is the measured volume minus the back-transformed predicted volume. NOTE: this is not simply the antilog of the error in log10 units.

Finding the Set of Coefficients that Minimizes the Sum of Squared Errors

Same process as for SLR: find the set of coefficients that results in the minimum SSE, just that there are more parameters, and therefore more partial derivative equations and more equations to solve.
o E.g., with 3 x-variables, there will be 4 coefficients (the intercept plus 3 slopes) and so four equations.

For linear models, there will be one unique mathematical solution. For nonlinear models, this is not possible and we must search to find a solution. Using the criterion of finding the maximum likelihood (probability) rather than the minimum SSE, we would need to search for a solution even for linear models (covered in other courses, e.g., FRST 530).
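The unique least squares solution for a linear model comes from solving the normal equations (X′X)b = X′y. A self-contained Python sketch (illustrative names, stdlib only) for an intercept plus two slopes:

```python
def mlr_normal_equations(X, y):
    """Least squares for MLR: build (X'X)b = X'y and solve by Gaussian
    elimination with partial pivoting. X is a list of rows of x-values
    (no intercept column); returns [b0, b1, ..., bm]."""
    n, m = len(X), len(X[0])
    Z = [[1.0] + list(row) for row in X]   # prepend 1 for the intercept
    p = m + 1
    xtx = [[sum(Z[i][a] * Z[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(Z[i][a] * y[i] for i in range(n)) for a in range(p)]
    A = [row[:] + [v] for row, v in zip(xtx, xty)]   # augmented matrix
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for k in range(c, p + 1):
                A[r][k] -= f * A[c][k]
    b = [0.0] * p
    for r in range(p - 1, -1, -1):           # back substitution
        b[r] = (A[r][p] - sum(A[r][k] * b[k] for k in range(r + 1, p))) / A[r][r]
    return b

# data lying exactly on the plane y = 1 + 2*x1 + 3*x2, so the fit is exact
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y = [1, 3, 4, 6, 8, 9]
b = mlr_normal_equations(X, y)
```

With three x-variables there would be four coefficients and four normal equations, exactly as the notes describe; the code generalizes because p = m + 1.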


More information

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont. TCELL 9/4/205 36-309/749 Experimental Design for Behavioral and Social Sciences Simple Regression Example Male black wheatear birds carry stones to the nest as a form of sexual display. Soler et al. wanted

More information

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002

STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 Time allowed: 3 HOURS. STATISTICS 174: APPLIED STATISTICS FINAL EXAM DECEMBER 10, 2002 This is an open book exam: all course notes and the text are allowed, and you are expected to use your own calculator.

More information

Business Statistics. Lecture 10: Course Review

Business Statistics. Lecture 10: Course Review Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,

More information

" M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2

 M A #M B. Standard deviation of the population (Greek lowercase letter sigma) σ 2 Notation and Equations for Final Exam Symbol Definition X The variable we measure in a scientific study n The size of the sample N The size of the population M The mean of the sample µ The mean of the

More information

STATISTICAL DATA ANALYSIS IN EXCEL

STATISTICAL DATA ANALYSIS IN EXCEL Microarra Center STATISTICAL DATA ANALYSIS IN EXCEL Lecture 5 Linear Regression dr. Petr Nazarov 14-1-213 petr.nazarov@crp-sante.lu Statistical data analsis in Ecel. 5. Linear regression OUTLINE Lecture

More information

Sociology 6Z03 Review II

Sociology 6Z03 Review II Sociology 6Z03 Review II John Fox McMaster University Fall 2016 John Fox (McMaster University) Sociology 6Z03 Review II Fall 2016 1 / 35 Outline: Review II Probability Part I Sampling Distributions Probability

More information

Basic Business Statistics 6 th Edition

Basic Business Statistics 6 th Edition Basic Business Statistics 6 th Edition Chapter 12 Simple Linear Regression Learning Objectives In this chapter, you learn: How to use regression analysis to predict the value of a dependent variable based

More information

STAT 3A03 Applied Regression With SAS Fall 2017

STAT 3A03 Applied Regression With SAS Fall 2017 STAT 3A03 Applied Regression With SAS Fall 2017 Assignment 2 Solution Set Q. 1 I will add subscripts relating to the question part to the parameters and their estimates as well as the errors and residuals.

More information

ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false.

ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false. ST 512-Practice Exam I - Osborne Directions: Answer questions as directed. For true/false questions, circle either true or false. 1. A study was carried out to examine the relationship between the number

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Glossary for the Triola Statistics Series

Glossary for the Triola Statistics Series Glossary for the Triola Statistics Series Absolute deviation The measure of variation equal to the sum of the deviations of each value from the mean, divided by the number of values Acceptance sampling

More information

Inference for Regression Simple Linear Regression

Inference for Regression Simple Linear Regression Inference for Regression Simple Linear Regression IPS Chapter 10.1 2009 W.H. Freeman and Company Objectives (IPS Chapter 10.1) Simple linear regression p Statistical model for linear regression p Estimating

More information

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6

STA 108 Applied Linear Models: Regression Analysis Spring Solution for Homework #6 STA 8 Applied Linear Models: Regression Analysis Spring 011 Solution for Homework #6 6. a) = 11 1 31 41 51 1 3 4 5 11 1 31 41 51 β = β1 β β 3 b) = 1 1 1 1 1 11 1 31 41 51 1 3 4 5 β = β 0 β1 β 6.15 a) Stem-and-leaf

More information

Statistical tables are attached Two Hours UNIVERSITY OF MANCHESTER. May 2007 Final Draft

Statistical tables are attached Two Hours UNIVERSITY OF MANCHESTER. May 2007 Final Draft Statistical tables are attached wo Hours UNIVERSIY OF MNHESER Ma 7 Final Draft Medical Statistics M377 Electronic calculators ma be used provided that the cannot store tet nswer LL si questions in SEION

More information

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression 36-309/749 Experimental Design for Behavioral and Social Sciences Sep. 22, 2015 Lecture 4: Linear Regression TCELL Simple Regression Example Male black wheatear birds carry stones to the nest as a form

More information

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

ST505/S697R: Fall Homework 2 Solution.

ST505/S697R: Fall Homework 2 Solution. ST505/S69R: Fall 2012. Homework 2 Solution. 1. 1a; problem 1.22 Below is the summary information (edited) from the regression (using R output); code at end of solution as is code and output for SAS. a)

More information

Odor attraction CRD Page 1

Odor attraction CRD Page 1 Odor attraction CRD Page 1 dm'log;clear;output;clear'; options ps=512 ls=99 nocenter nodate nonumber nolabel FORMCHAR=" ---- + ---+= -/\*"; ODS LISTING; *** Table 23.2 ********************************************;

More information

Correlation & Simple Regression

Correlation & Simple Regression Chapter 11 Correlation & Simple Regression The previous chapter dealt with inference for two categorical variables. In this chapter, we would like to examine the relationship between two quantitative variables.

More information

STAT 3A03 Applied Regression Analysis With SAS Fall 2017

STAT 3A03 Applied Regression Analysis With SAS Fall 2017 STAT 3A03 Applied Regression Analysis With SAS Fall 2017 Assignment 5 Solution Set Q. 1 a The code that I used and the output is as follows PROC GLM DataS3A3.Wool plotsnone; Class Amp Len Load; Model CyclesAmp

More information

Introduction to Regression

Introduction to Regression Introduction to Regression Using Mult Lin Regression Derived variables Many alternative models Which model to choose? Model Criticism Modelling Objective Model Details Data and Residuals Assumptions 1

More information

ECON3150/4150 Spring 2015

ECON3150/4150 Spring 2015 ECON3150/4150 Spring 2015 Lecture 3&4 - The linear regression model Siv-Elisabeth Skjelbred University of Oslo January 29, 2015 1 / 67 Chapter 4 in S&W Section 17.1 in S&W (extended OLS assumptions) 2

More information

1 A Review of Correlation and Regression

1 A Review of Correlation and Regression 1 A Review of Correlation and Regression SW, Chapter 12 Suppose we select n = 10 persons from the population of college seniors who plan to take the MCAT exam. Each takes the test, is coached, and then

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club

BE640 Intermediate Biostatistics 2. Regression and Correlation. Simple Linear Regression Software: SAS. Emergency Calls to the New York Auto Club BE640 Intermediate Biostatistics 2. Regression and Correlation Simple Linear Regression Software: SAS Emergency Calls to the New York Auto Club Source: Chatterjee, S; Handcock MS and Simonoff JS A Casebook

More information

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is

Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Lecture 15 Multiple regression I Chapter 6 Set 2 Least Square Estimation The quadratic form to be minimized is Q = (Y i β 0 β 1 X i1 β 2 X i2 β p 1 X i.p 1 ) 2, which in matrix notation is Q = (Y Xβ) (Y

More information

STATISTICS 479 Exam II (100 points)

STATISTICS 479 Exam II (100 points) Name STATISTICS 79 Exam II (1 points) 1. A SAS data set was created using the following input statement: Answer parts(a) to (e) below. input State $ City $ Pop199 Income Housing Electric; (a) () Give the

More information

Inference for Regression Inference about the Regression Model and Using the Regression Line

Inference for Regression Inference about the Regression Model and Using the Regression Line Inference for Regression Inference about the Regression Model and Using the Regression Line PBS Chapter 10.1 and 10.2 2009 W.H. Freeman and Company Objectives (PBS Chapter 10.1 and 10.2) Inference about

More information

Analysis of variance and regression. April 17, Contents Comparison of several groups One-way ANOVA. Two-way ANOVA Interaction Model checking

Analysis of variance and regression. April 17, Contents Comparison of several groups One-way ANOVA. Two-way ANOVA Interaction Model checking Analysis of variance and regression Contents Comparison of several groups One-way ANOVA April 7, 008 Two-way ANOVA Interaction Model checking ANOVA, April 008 Comparison of or more groups Julie Lyng Forman,

More information

Simple Linear Regression

Simple Linear Regression Chapter 2 Simple Linear Regression Linear Regression with One Independent Variable 2.1 Introduction In Chapter 1 we introduced the linear model as an alternative for making inferences on means of one or

More information

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns

Predict y from (possibly) many predictors x. Model Criticism Study the importance of columns Lecture Week Multiple Linear Regression Predict y from (possibly) many predictors x Including extra derived variables Model Criticism Study the importance of columns Draw on Scientific framework Experiment;

More information

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3

STA 303 H1S / 1002 HS Winter 2011 Test March 7, ab 1cde 2abcde 2fghij 3 STA 303 H1S / 1002 HS Winter 2011 Test March 7, 2011 LAST NAME: FIRST NAME: STUDENT NUMBER: ENROLLED IN: (circle one) STA 303 STA 1002 INSTRUCTIONS: Time: 90 minutes Aids allowed: calculator. Some formulae

More information

Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures

Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures 12.1 Topic 12. The Split-plot Design and its Relatives (continued) Repeated Measures 12.9 Repeated measures analysis Sometimes researchers make multiple measurements on the same experimental unit. We have

More information

Chapter 16. Simple Linear Regression and Correlation

Chapter 16. Simple Linear Regression and Correlation Chapter 16 Simple Linear Regression and Correlation 16.1 Regression Analysis Our problem objective is to analyze the relationship between interval variables; regression analysis is the first tool we will

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7

MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 MA 575 Linear Models: Cedric E. Ginestet, Boston University Midterm Review Week 7 1 Random Vectors Let a 0 and y be n 1 vectors, and let A be an n n matrix. Here, a 0 and A are non-random, whereas y is

More information

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE

FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE FRANKLIN UNIVERSITY PROFICIENCY EXAM (FUPE) STUDY GUIDE Course Title: Probability and Statistics (MATH 80) Recommended Textbook(s): Number & Type of Questions: Probability and Statistics for Engineers

More information

Final Exam - Solutions

Final Exam - Solutions Ecn 102 - Analysis of Economic Data University of California - Davis March 19, 2010 Instructor: John Parman Final Exam - Solutions You have until 5:30pm to complete this exam. Please remember to put your

More information

Confidence Intervals, Testing and ANOVA Summary

Confidence Intervals, Testing and ANOVA Summary Confidence Intervals, Testing and ANOVA Summary 1 One Sample Tests 1.1 One Sample z test: Mean (σ known) Let X 1,, X n a r.s. from N(µ, σ) or n > 30. Let The test statistic is H 0 : µ = µ 0. z = x µ 0

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x).

Linear Regression. Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). Linear Regression Simple linear regression model determines the relationship between one dependent variable (y) and one independent variable (x). A dependent variable is a random variable whose variation

More information

A Re-Introduction to General Linear Models (GLM)

A Re-Introduction to General Linear Models (GLM) A Re-Introduction to General Linear Models (GLM) Today s Class: You do know the GLM Estimation (where the numbers in the output come from): From least squares to restricted maximum likelihood (REML) Reviewing

More information

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE

THE ROYAL STATISTICAL SOCIETY HIGHER CERTIFICATE THE ROYAL STATISTICAL SOCIETY 004 EXAMINATIONS SOLUTIONS HIGHER CERTIFICATE PAPER II STATISTICAL METHODS The Society provides these solutions to assist candidates preparing for the examinations in future

More information