Data Transformation. Why Data Transformation? Homoscedasticity and Normality

Objectives

Understand why we often need to transform our data.
Learn the three commonly used data transformation techniques.
Distinguish additive effects from multiplicative effects.
Apply data transformation in ANOVA and regression.

Why Data Transformation?

Most parametric methods assume homoscedasticity, normality, additivity, and linearity. Data transformation is used to make your data conform to the assumptions of the statistical methods.

Illustrative example: Homoscedasticity and Normality

The data deviate from both homoscedasticity and normality. Wouldn't it be nice if we could make the data look this way? (The figures that accompany this slide are not preserved in the transcript.)

Types of Data Transformation

The logarithmic transformation
The square-root transformation
The arcsine transformation

Data transformation can be done conveniently in Excel. Alternatives: ranks and nonparametric methods.

Homoscedasticity

[Table not preserved: ID, Group 1, Group 2, with rows for Var, t, P, equal-variance test P, kurtosis, skewness, P(Zg1), and P(Zg).]

The two groups of data seem to differ greatly in means, but a t-test shows that the means do not differ significantly from each other, which is a surprising result. The two groups also differ greatly in variance, and both deviate significantly from normality. These results invalidate the t-test. We calculate two ratios for each group: the variance/mean ratio and the Std/mean ratio (i.e., the coefficient of variation, C.V.).

Log-transformation

Log-transformed data: NewX = ln(x + 1)

[Table of log-transformed data not preserved; the surviving values are t = -3.4 and equal-variance test P = 0.67.]

The transformation is successful because:
The variances are now similar.
The deviation from normality is now nonsignificant.
The t-test reveals a highly significant difference in means between the two groups.

Transform back: X = e^(NewX) - 1. Compare this back-transformed mean with the original mean. Which one is preferable? Calculate the standard error, the degrees of freedom, and the 95% CL (the critical value t0.05,16 appears in the transcript only as ".47").
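The two ratios and the ln(x + 1) round trip described above can be sketched in a few lines of Python. This is only an illustration: the two samples are hypothetical right-skewed counts, not the lecture's data, and the function names are invented for this sketch.

```python
import math
import statistics


def dispersion_ratios(sample):
    """Return (variance/mean ratio, coefficient of variation) for one group."""
    mean = statistics.mean(sample)
    var = statistics.variance(sample)  # sample variance (n - 1 denominator)
    return var / mean, math.sqrt(var) / mean


def log_transform(sample):
    """NewX = ln(x + 1), the transformation used on the slide."""
    return [math.log(x + 1) for x in sample]


def log_back(value):
    """Inverse of ln(x + 1): X = e^NewX - 1."""
    return math.exp(value) - 1


# Hypothetical right-skewed counts (assumed for illustration only).
group1 = [2, 3, 5, 4, 20, 3, 6, 2, 9]
group2 = [30, 45, 25, 160, 60, 38, 52, 90, 41]

for name, group in (("Group 1", group1), ("Group 2", group2)):
    var_mean, cv = dispersion_ratios(group)
    logged = log_transform(group)
    print(name)
    print("  var/mean ratio        :", round(var_mean, 2))
    print("  C.V.                  :", round(cv, 2))
    print("  variance after log    :", round(statistics.variance(logged), 3))
    print("  arithmetic mean       :", round(statistics.mean(group), 2))
    print("  back-transformed mean :", round(log_back(statistics.mean(logged)), 2))
```

In this made-up example the C.V. is far more similar between the groups than the variance/mean ratio, which is the pattern that points to a log-transformation. The back-transformed mean is always at or below the arithmetic mean, which is why the slide asks which of the two is preferable for skewed data.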

Normal but Heteroscedastic

Any transformation that you use is likely to change normality. Fortunately, the t-test and ANOVA are quite robust for this kind of data. Of course, you can also use nonparametric tests.

[Table not preserved: ID, Group 1, Group 2, with rows for Var, t, P, equal-variance test P, kurtosis, and skewness.]

The two variances are significantly different. The t-test, however, detects a significant difference in means. You can use nonparametric methods to analyse the data for comparison, and you are likely to find the t-test to be more powerful.

Additivity

[Table of group data by factor level not preserved.]

What experimental design is this? Compare the group means. Is there an interaction effect? Additivity means that the difference between levels of one factor is consistent across the levels of another factor.

Multiplicative Effects

[Table of group data by factor level not preserved.]

Compare the group means. Is there an interaction effect? Does this data set meet the assumption of additivity? When the assumption of additivity is not met, we have difficulty interpreting main effects. Now calculate the ratio of group means. What do you find?
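The two comparisons the slides ask for, differences between group means and ratios of group means, can be sketched as below. The 2 x 2 tables of cell means are hypothetical (the lecture's tables are not preserved), and the function names are made up for this example.

```python
def level_differences(cell_means):
    """Difference (Factor A level 2 minus level 1) at each level of Factor B.

    cell_means is [[a1b1, a1b2], [a2b1, a2b2]].
    """
    row1, row2 = cell_means
    return [high - low for low, high in zip(row1, row2)]


def level_ratios(cell_means):
    """Ratio (Factor A level 2 over level 1) at each level of Factor B."""
    row1, row2 = cell_means
    return [high / low for low, high in zip(row1, row2)]


# Hypothetical cell means, assumed for illustration only.
additive = [[10.0, 15.0], [14.0, 19.0]]        # A adds 4 at every level of B
multiplicative = [[10.0, 15.0], [20.0, 30.0]]  # A doubles the mean at every level of B

print("additive table, differences      :", level_differences(additive))
print("multiplicative table, differences:", level_differences(multiplicative))
print("multiplicative table, ratios     :", level_ratios(multiplicative))
```

Equal differences across the levels of Factor B indicate additivity (no interaction). Unequal differences combined with equal ratios indicate multiplicative effects.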

Multiplicative Effects

[Table of group data by factor level not preserved.]

For Factor A, Level 2 has a mean about k times as large as that for Level 1. For Factor B, Level 2 has a mean about m times as large as that for Level 1. (The numerical ratios are not preserved in this transcript.) If you know the value for Level 1 of Factor A, you can obtain the value for Level 2 of Factor A by multiplying the known value by k. You can do the same for Factor B. We say that the effects of Factors A and B are multiplicative, not additive.

Log-transformation

Now log-transform the data. Compare the means. Is the assumption of additivity met now?

[Table not preserved: original data and transformed data, with variances.]

Why can a log-transformation change multiplicative effects into additive effects? Because if Z = XY, then ln(Z) = ln(X) + ln(Y).

Square-Root Transformation

[Table not preserved: ID, Group 1, Group 2, with rows for Var, Var/mean, and Std/mean.]

The two groups of data differ greatly in variance. Calculate two ratios: the variance/mean ratio and the Std/mean ratio (i.e., the coefficient of variation). Does your calculation suggest a log-transformation? When is a log-transformation appropriate? Use the square-root transformation when different groups have similar variance/mean ratios. Notice the means, which do not coincide with the most frequent observations.
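The identity ln(XY) = ln(X) + ln(Y) from the log-transformation slide can be checked numerically. The sketch below builds a noise-free 2 x 2 table from an assumed baseline and assumed multipliers (all hypothetical) and compares the interaction contrast before and after taking logs.

```python
import math


def interaction_contrast(table):
    """(a2b1 - a1b1) - (a2b2 - a1b2); zero when the two factors are additive."""
    (a1b1, a1b2), (a2b1, a2b2) = table
    return (a2b1 - a1b1) - (a2b2 - a1b2)


# Assumed values for illustration: each cell = baseline * A effect * B effect.
baseline = 10.0
a_effects = [1.0, 2.0]  # Factor A level 2 doubles the value
b_effects = [1.0, 1.5]  # Factor B level 2 multiplies it by 1.5

cells = [[baseline * a * b for b in b_effects] for a in a_effects]
logged = [[math.log(value) for value in row] for row in cells]

print("contrast on the original scale:", interaction_contrast(cells))
print("contrast on the log scale     :", interaction_contrast(logged))
```

On the original scale the contrast is nonzero, so the factors interact. On the log scale it vanishes up to floating-point rounding, because each logged cell is ln(baseline) + ln(A effect) + ln(B effect). With real, replicated data the same check would be applied to the means of the logged observations, where the cancellation is only approximate.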

Square-Root Transformation

[Table not preserved: ID, Group 1, Group 2, with a row for Var.]

Square-root transformation: X' = sqrt(X + c), where the constant c survives in the transcript only as "3/" (the denominator is lost). The variance is now almost identical between the two groups. Transform the means back to the original scale and compare these means with the original means: X = (X')^2 - c.

Quiz on Data Transformation

[Table not preserved: Group, n, Var, SE, T, LowerL, UpperL.]

The data set is right-skewed for each group. Calculate the variance/mean ratio and the C.V. for each group, and decide which transformation you should use. Do the transformation and convert the means back to the original scale.

With Multiple Groups

When you have multiple groups, a Variance vs. Mean or a Std vs. Mean plot can help you decide which data transformation to use. [Figure not preserved: Variance plotted against Mean.] The graph shows that the Var/Mean ratio is almost constant. What transformation should you use?

Confidence Limits

[Figures not preserved: mean with lower and upper confidence limits, before transformation and after transformation.] Given the skewness in our data, do the confidence limits after transformation make more sense? Why?
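The square-root round trip and the back-transformed confidence limits can be sketched as follows. Two things here are assumptions rather than facts from the slides: the added constant is taken to be 3/8 (Anscombe's commonly cited choice, since the transcript shows only "3/"), and the counts are hypothetical. The critical value 2.262 is the standard two-tailed 5% value of t for 9 degrees of freedom.

```python
import math
import statistics

ASSUMED_C = 3 / 8  # assumption: the transcript preserves only "3/"


def sqrt_transform(sample, c=ASSUMED_C):
    """X' = sqrt(X + c)."""
    return [math.sqrt(x + c) for x in sample]


def sqrt_back(value, c=ASSUMED_C):
    """Inverse transformation: X = (X')^2 - c."""
    return value ** 2 - c


def back_transformed_limits(sample, t_crit, c=ASSUMED_C):
    """Return (lower, centre, upper) on the original scale.

    The limits are computed on the transformed scale and then squared back,
    so they are not symmetric around the back-transformed mean.
    """
    transformed = sqrt_transform(sample, c)
    mean = statistics.mean(transformed)
    se = statistics.stdev(transformed) / math.sqrt(len(transformed))
    lower, upper = mean - t_crit * se, mean + t_crit * se
    return sqrt_back(lower, c), sqrt_back(mean, c), sqrt_back(upper, c)


counts = [0, 1, 1, 2, 2, 3, 3, 4, 6, 9]  # hypothetical right-skewed counts
T_CRIT_DF9 = 2.262                       # two-tailed 5% critical value, 9 df

lower, centre, upper = back_transformed_limits(counts, T_CRIT_DF9)
print("arithmetic mean       :", statistics.mean(counts))
print("back-transformed mean :", round(centre, 3))
print("95% limits            :", round(lower, 3), "to", round(upper, 3))
```

The upper limit sits farther from the back-transformed mean than the lower limit does. That asymmetry mirrors the right skew of the data, which is one argument for preferring limits computed after transformation.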

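Arcsine Transformation

X' = arcsin(sqrt(X)). Used for proportions.

[Table not preserved: Group 1 and Group 2, with rows for p, Var, SE, LowerL, UpperL, and the back-transformed LowerL and UpperL.]

Compare the variances before and after transformation. Do you know how to transform the means, SE, and C.L. back to the original scale? X = (sin X')^2.

Data Transformation Using SAS

    data Mydata;
      input x;
      /* keep only the one assignment that matches the chosen transformation */
      newx = log(x);           /* natural-logarithm transformation */
      newx = sqrt(x + 3/ );    /* square-root transformation; denominator lost in the transcript */
      newx = arsin(sqrt(x));   /* arcsine transformation */
    cards;

(The data lines that follow the cards statement are not preserved in the transcript.)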
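For readers without SAS, the arcsine step and its back-transformation can be sketched in Python. The proportions are hypothetical and the function names are invented for this example. Both SAS's arsin and Python's math.asin return radians.

```python
import math
import statistics


def arcsine_transform(p):
    """X' = arcsin(sqrt(p)) for a proportion p in [0, 1], in radians."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("the arcsine transformation expects a proportion in [0, 1]")
    return math.asin(math.sqrt(p))


def arcsine_back(x):
    """Inverse transformation: p = (sin X')^2."""
    return math.sin(x) ** 2


# Hypothetical proportions, assumed for illustration only.
proportions = [0.02, 0.05, 0.10, 0.04, 0.08]

transformed = [arcsine_transform(p) for p in proportions]
back_mean = arcsine_back(statistics.mean(transformed))

print("mean of raw proportions :", round(statistics.mean(proportions), 4))
print("back-transformed mean   :", round(back_mean, 4))
```

Confidence limits are handled the same way as the mean: compute them on the transformed scale, then apply arcsine_back to each limit. A standard error is not back-transformed directly, which is why the slide singles out that question.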

Topic 23: Diagnostics and Remedies

Topic 23: Diagnostics and Remedies Topic 23: Diagnostics and Remedies Outline Diagnostics residual checks ANOVA remedial measures Diagnostics Overview We will take the diagnostics and remedial measures that we learned for regression and

More information

APPENDIX A. Watershed Delineation and Stream Network Defined from WMS

APPENDIX A. Watershed Delineation and Stream Network Defined from WMS APPENDIX A Watershed Delineation and Stream Network Defined from WMS Figure A.1. Subbasins Delineation and Stream Network for Goodwin Creek Watershed APPENDIX B Summary Statistics of Monthly Peak Discharge

More information

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007)

CHAPTER 17 CHI-SQUARE AND OTHER NONPARAMETRIC TESTS FROM: PAGANO, R. R. (2007) FROM: PAGANO, R. R. (007) I. INTRODUCTION: DISTINCTION BETWEEN PARAMETRIC AND NON-PARAMETRIC TESTS Statistical inference tests are often classified as to whether they are parametric or nonparametric Parameter

More information

Introduction to Linear regression analysis. Part 2. Model comparisons

Introduction to Linear regression analysis. Part 2. Model comparisons Introduction to Linear regression analysis Part Model comparisons 1 ANOVA for regression Total variation in Y SS Total = Variation explained by regression with X SS Regression + Residual variation SS Residual

More information

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook

THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH. Robert R. SOKAL and F. James ROHLF. State University of New York at Stony Brook BIOMETRY THE PRINCIPLES AND PRACTICE OF STATISTICS IN BIOLOGICAL RESEARCH THIRD E D I T I O N Robert R. SOKAL and F. James ROHLF State University of New York at Stony Brook W. H. FREEMAN AND COMPANY New

More information

Topic 8. Data Transformations [ST&D section 9.16]

Topic 8. Data Transformations [ST&D section 9.16] Topic 8. Data Transformations [ST&D section 9.16] 8.1 The assumptions of ANOVA For ANOVA, the linear model for the RCBD is: Y ij = µ + τ i + β j + ε ij There are four key assumptions implicit in this model.

More information

Assessing Model Adequacy

Assessing Model Adequacy Assessing Model Adequacy A number of assumptions were made about the model, and these need to be verified in order to use the model for inferences. In cases where some assumptions are violated, there are

More information

Practice problems. 1. Given a = 3i 2j and b = 2i + j. Write c = i + j in terms of a and b.

Practice problems. 1. Given a = 3i 2j and b = 2i + j. Write c = i + j in terms of a and b. Practice problems 1. Given a = 3i 2j and b = 2i + j. Write c = i + j in terms of a and b. 1, 1 = c 1 3, 2 + c 2 2, 1. Solve c 1, c 2. 2. Suppose a is a vector in the plane. If the component of the a in

More information

Introduction to Statistical Analysis

Introduction to Statistical Analysis Introduction to Statistical Analysis Changyu Shen Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology Beth Israel Deaconess Medical Center Harvard Medical School Objectives Descriptive

More information

Transition Passage to Descriptive Statistics 28

Transition Passage to Descriptive Statistics 28 viii Preface xiv chapter 1 Introduction 1 Disciplines That Use Quantitative Data 5 What Do You Mean, Statistics? 6 Statistics: A Dynamic Discipline 8 Some Terminology 9 Problems and Answers 12 Scales of

More information

Hypothesis Testing hypothesis testing approach

Hypothesis Testing hypothesis testing approach Hypothesis Testing In this case, we d be trying to form an inference about that neighborhood: Do people there shop more often those people who are members of the larger population To ascertain this, we

More information

Analysis of variance (ANOVA) Comparing the means of more than two groups

Analysis of variance (ANOVA) Comparing the means of more than two groups Analysis of variance (ANOVA) Comparing the means of more than two groups Example: Cost of mating in male fruit flies Drosophila Treatments: place males with and without unmated (virgin) females Five treatments

More information

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv).

Regression Analysis. Table Relationship between muscle contractile force (mj) and stimulus intensity (mv). Regression Analysis Two variables may be related in such a way that the magnitude of one, the dependent variable, is assumed to be a function of the magnitude of the second, the independent variable; however,

More information

Test one Review Cal 2

Test one Review Cal 2 Name: Class: Date: ID: A Test one Review Cal 2 Short Answer. Write the following expression as a logarithm of a single quantity. lnx 2ln x 2 ˆ 6 2. Write the following expression as a logarithm of a single

More information

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression

Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression BSTT523: Kutner et al., Chapter 1 1 Chapter 1: Linear Regression with One Predictor Variable also known as: Simple Linear Regression Bivariate Linear Regression Introduction: Functional relation between

More information

Inferences for Regression

Inferences for Regression Inferences for Regression An Example: Body Fat and Waist Size Looking at the relationship between % body fat and waist size (in inches). Here is a scatterplot of our data set: Remembering Regression In

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound

Z score indicates how far a raw score deviates from the sample mean in SD units. score Mean % Lower Bound 1 EDUR 8131 Chat 3 Notes 2 Normal Distribution and Standard Scores Questions Standard Scores: Z score Z = (X M) / SD Z = deviation score divided by standard deviation Z score indicates how far a raw score

More information

Model Fitting. Jean Yves Le Boudec

Model Fitting. Jean Yves Le Boudec Model Fitting Jean Yves Le Boudec 0 Contents 1. What is model fitting? 2. Linear Regression 3. Linear regression with norm minimization 4. Choosing a distribution 5. Heavy Tail 1 Virus Infection Data We

More information

Logarithmic Functions

Logarithmic Functions Metropolitan Community College The Natural Logarithmic Function The natural logarithmic function is defined on (0, ) as ln x = x 1 1 t dt. Example 1. Evaluate ln 1. Example 1. Evaluate ln 1. Solution.

More information

Section 3.5: Implicit Differentiation

Section 3.5: Implicit Differentiation Section 3.5: Implicit Differentiation In the previous sections, we considered the problem of finding the slopes of the tangent line to a given function y = f(x). The idea of a tangent line however is not

More information

One-sided and two-sided t-test

One-sided and two-sided t-test One-sided and two-sided t-test Given a mean cancer rate in Montreal, 1. What is the probability of finding a deviation of > 1 stdev from the mean? 2. What is the probability of finding 1 stdev more cases?

More information

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS

Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS Stat 135, Fall 2006 A. Adhikari HOMEWORK 10 SOLUTIONS 1a) The model is cw i = β 0 + β 1 el i + ɛ i, where cw i is the weight of the ith chick, el i the length of the egg from which it hatched, and ɛ i

More information

Lecture 3: Inference in SLR

Lecture 3: Inference in SLR Lecture 3: Inference in SLR STAT 51 Spring 011 Background Reading KNNL:.1.6 3-1 Topic Overview This topic will cover: Review of hypothesis testing Inference about 1 Inference about 0 Confidence Intervals

More information

13: Additional ANOVA Topics. Post hoc Comparisons

13: Additional ANOVA Topics. Post hoc Comparisons 13: Additional ANOVA Topics Post hoc Comparisons ANOVA Assumptions Assessing Group Variances When Distributional Assumptions are Severely Violated Post hoc Comparisons In the prior chapter we used ANOVA

More information

Applied Econometrics (QEM)

Applied Econometrics (QEM) Applied Econometrics (QEM) based on Prinicples of Econometrics Jakub Mućk Department of Quantitative Economics Jakub Mućk Applied Econometrics (QEM) Meeting #3 1 / 42 Outline 1 2 3 t-test P-value Linear

More information

CHAPTER 1 Systems of Linear Equations

CHAPTER 1 Systems of Linear Equations CHAPTER Systems of Linear Equations Section. Introduction to Systems of Linear Equations. Because the equation is in the form a x a y b, it is linear in the variables x and y. 0. Because the equation cannot

More information

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA)

Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) BSTT523 Pagano & Gauvreau Chapter 13 1 Nonparametric Statistics Data are sometimes not compatible with the assumptions of parametric statistical tests (i.e. t-test, regression, ANOVA) In particular, data

More information

SPSS LAB FILE 1

SPSS LAB FILE  1 SPSS LAB FILE www.mcdtu.wordpress.com 1 www.mcdtu.wordpress.com 2 www.mcdtu.wordpress.com 3 OBJECTIVE 1: Transporation of Data Set to SPSS Editor INPUTS: Files: group1.xlsx, group1.txt PROCEDURE FOLLOWED:

More information

Probability Distributions.

Probability Distributions. Probability Distributions http://www.pelagicos.net/classes_biometry_fa18.htm Probability Measuring Discrete Outcomes Plotting probabilities for discrete outcomes: 0.6 0.5 0.4 0.3 0.2 0.1 NOTE: Area within

More information

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. Preface p. xi Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. 6 The Scientific Method and the Design of

More information

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test.

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test. Lecture 11 Topic 8: Data Transformations Assumptions of the Analysis of Variance 1. Independence of errors The ε ij (i.e. the errors or residuals) are statistically independent from one another. Failure

More information

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression

BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression BIOL 458 BIOMETRY Lab 9 - Correlation and Bivariate Regression Introduction to Correlation and Regression The procedures discussed in the previous ANOVA labs are most useful in cases where we are interested

More information

Correlation and Simple Linear Regression

Correlation and Simple Linear Regression Correlation and Simple Linear Regression Sasivimol Rattanasiri, Ph.D Section for Clinical Epidemiology and Biostatistics Ramathibodi Hospital, Mahidol University E-mail: sasivimol.rat@mahidol.ac.th 1 Outline

More information

Table 1: Fish Biomass data set on 26 streams

Table 1: Fish Biomass data set on 26 streams Math 221: Multiple Regression S. K. Hyde Chapter 27 (Moore, 5th Ed.) The following data set contains observations on the fish biomass of 26 streams. The potential regressors from which we wish to explain

More information

FUNCTIONS AND MODELS

FUNCTIONS AND MODELS 1 FUNCTIONS AND MODELS FUNCTIONS AND MODELS 1.6 Inverse Functions and Logarithms In this section, we will learn about: Inverse functions and logarithms. INVERSE FUNCTIONS The table gives data from an experiment

More information

SET 1. (1) Solve for x: (a) e 2x = 5 3x

SET 1. (1) Solve for x: (a) e 2x = 5 3x () Solve for x: (a) e x = 5 3x SET We take natural log on both sides: ln(e x ) = ln(5 3x ) x = 3 x ln(5) Now we take log base on both sides: log ( x ) = log (3 x ln 5) x = log (3 x ) + log (ln(5)) x x

More information

Six Sigma Black Belt Study Guides

Six Sigma Black Belt Study Guides Six Sigma Black Belt Study Guides 1 www.pmtutor.org Powered by POeT Solvers Limited. Analyze Correlation and Regression Analysis 2 www.pmtutor.org Powered by POeT Solvers Limited. Variables and relationships

More information

Next is material on matrix rank. Please see the handout

Next is material on matrix rank. Please see the handout B90.330 / C.005 NOTES for Wednesday 0.APR.7 Suppose that the model is β + ε, but ε does not have the desired variance matrix. Say that ε is normal, but Var(ε) σ W. The form of W is W w 0 0 0 0 0 0 w 0

More information

Final Exam. Name: Solution:

Final Exam. Name: Solution: Final Exam. Name: Instructions. Answer all questions on the exam. Open books, open notes, but no electronic devices. The first 13 problems are worth 5 points each. The rest are worth 1 point each. HW1.

More information

Heteroscedasticity 1

Heteroscedasticity 1 Heteroscedasticity 1 Pierre Nguimkeu BUEC 333 Summer 2011 1 Based on P. Lavergne, Lectures notes Outline Pure Versus Impure Heteroscedasticity Consequences and Detection Remedies Pure Heteroscedasticity

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

7.4* General logarithmic and exponential functions

7.4* General logarithmic and exponential functions 7.4* General logarithmic and exponential functions Mark Woodard Furman U Fall 2010 Mark Woodard (Furman U) 7.4* General logarithmic and exponential functions Fall 2010 1 / 9 Outline 1 General exponential

More information

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1

Summary statistics. G.S. Questa, L. Trapani. MSc Induction - Summary statistics 1 Summary statistics 1. Visualize data 2. Mean, median, mode and percentiles, variance, standard deviation 3. Frequency distribution. Skewness 4. Covariance and correlation 5. Autocorrelation MSc Induction

More information

Institute of Actuaries of India

Institute of Actuaries of India Institute of Actuaries of India Subject CT3 Probability & Mathematical Statistics May 2011 Examinations INDICATIVE SOLUTION Introduction The indicative solution has been written by the Examiners with the

More information

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc IES 612/STA 4-573/STA 4-576 Winter 2008 Week 1--IES 612-STA 4-573-STA 4-576.doc Review Notes: [OL] = Ott & Longnecker Statistical Methods and Data Analysis, 5 th edition. [Handouts based on notes prepared

More information

Application of Variance Homogeneity Tests Under Violation of Normality Assumption

Application of Variance Homogeneity Tests Under Violation of Normality Assumption Application of Variance Homogeneity Tests Under Violation of Normality Assumption Alisa A. Gorbunova, Boris Yu. Lemeshko Novosibirsk State Technical University Novosibirsk, Russia e-mail: gorbunova.alisa@gmail.com

More information

Assumptions of classical multiple regression model

Assumptions of classical multiple regression model ESD: Recitation #7 Assumptions of classical multiple regression model Linearity Full rank Exogeneity of independent variables Homoscedasticity and non autocorrellation Exogenously generated data Normal

More information

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables.

Regression Analysis. BUS 735: Business Decision Making and Research. Learn how to detect relationships between ordinal and categorical variables. Regression Analysis BUS 735: Business Decision Making and Research 1 Goals of this section Specific goals Learn how to detect relationships between ordinal and categorical variables. Learn how to estimate

More information

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics

DETAILED CONTENTS PART I INTRODUCTION AND DESCRIPTIVE STATISTICS. 1. Introduction to Statistics DETAILED CONTENTS About the Author Preface to the Instructor To the Student How to Use SPSS With This Book PART I INTRODUCTION AND DESCRIPTIVE STATISTICS 1. Introduction to Statistics 1.1 Descriptive and

More information

A Non-parametric bootstrap for multilevel models

A Non-parametric bootstrap for multilevel models A Non-parametric bootstrap for multilevel models By James Carpenter London School of Hygiene and ropical Medicine Harvey Goldstein and Jon asbash Institute of Education 1. Introduction Bootstrapping is

More information

3. Nonparametric methods

3. Nonparametric methods 3. Nonparametric methods If the probability distributions of the statistical variables are unknown or are not as required (e.g. normality assumption violated), then we may still apply nonparametric tests

More information

Chapter 13 Correlation

Chapter 13 Correlation Chapter Correlation Page. Pearson correlation coefficient -. Inferential tests on correlation coefficients -9. Correlational assumptions -. on-parametric measures of correlation -5 5. correlational example

More information

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

EXST Regression Techniques Page 1. We can also test the hypothesis H : œ 0 versus H : EXST704 - Regression Techniques Page 1 Using F tests instead of t-tests We can also test the hypothesis H :" œ 0 versus H :" Á 0 with an F test.! " " " F œ MSRegression MSError This test is mathematically

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression 1 Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable Y (criterion) is predicted by variable X (predictor)

More information

Chapter 2. First-Order Differential Equations

Chapter 2. First-Order Differential Equations Chapter 2 First-Order Differential Equations i Let M(x, y) + N(x, y) = 0 Some equations can be written in the form A(x) + B(y) = 0 DEFINITION 2.2. (Separable Equation) A first-order differential equation

More information

Chapter 8 (More on Assumptions for the Simple Linear Regression)

Chapter 8 (More on Assumptions for the Simple Linear Regression) EXST3201 Chapter 8b Geaghan Fall 2005: Page 1 Chapter 8 (More on Assumptions for the Simple Linear Regression) Your textbook considers the following assumptions: Linearity This is not something I usually

More information

Hypothesis T e T sting w ith with O ne O One-Way - ANOV ANO A V Statistics Arlo Clark Foos -

Hypothesis T e T sting w ith with O ne O One-Way - ANOV ANO A V Statistics Arlo Clark Foos - Hypothesis Testing with One-Way ANOVA Statistics Arlo Clark-Foos Conceptual Refresher 1. Standardized z distribution of scores and of means can be represented as percentile rankings. 2. t distribution

More information

Booklet of Code and Output for STAC32 Final Exam

Booklet of Code and Output for STAC32 Final Exam Booklet of Code and Output for STAC32 Final Exam December 8, 2014 List of Figures in this document by page: List of Figures 1 Popcorn data............................. 2 2 MDs by city, with normal quantile

More information

Statistical comparison of univariate tests of homogeneity of variances

Statistical comparison of univariate tests of homogeneity of variances Submitted to the Journal of Statistical Computation and Simulation Statistical comparison of univariate tests of homogeneity of variances Pierre Legendre* and Daniel Borcard Département de sciences biologiques,

More information

Linear Regression Model. Badr Missaoui

Linear Regression Model. Badr Missaoui Linear Regression Model Badr Missaoui Introduction What is this course about? It is a course on applied statistics. It comprises 2 hours lectures each week and 1 hour lab sessions/tutorials. We will focus

More information

Sliced Inverse Regression

Sliced Inverse Regression Sliced Inverse Regression Ge Zhao gzz13@psu.edu Department of Statistics The Pennsylvania State University Outline Background of Sliced Inverse Regression (SIR) Dimension Reduction Definition of SIR Inversed

More information

Empirical Power of Four Statistical Tests in One Way Layout

Empirical Power of Four Statistical Tests in One Way Layout International Mathematical Forum, Vol. 9, 2014, no. 28, 1347-1356 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/imf.2014.47128 Empirical Power of Four Statistical Tests in One Way Layout Lorenzo

More information

Multiple Linear Regression estimation, testing and checking assumptions

Multiple Linear Regression estimation, testing and checking assumptions Multiple Linear Regression estimation, testing and checking assumptions Lecture No. 07 Example 1 The president of a large chain of fast-food restaurants has randomly selected 10 franchises and recorded

More information

Biological Applications of ANOVA - Examples and Readings

Biological Applications of ANOVA - Examples and Readings BIO 575 Biological Applications of ANOVA - Winter Quarter 2010 Page 1 ANOVA Pac Biological Applications of ANOVA - Examples and Readings One-factor Model I (Fixed Effects) This is the same example for

More information

Fractional Polynomial Regression

Fractional Polynomial Regression Chapter 382 Fractional Polynomial Regression Introduction This program fits fractional polynomial models in situations in which there is one dependent (Y) variable and one independent (X) variable. It

More information

Cheat Sheet: Linear Regression

Cheat Sheet: Linear Regression Cheat Sheet: Linear Regression Measurement and Evaluation of HCC Systems Scenario Use regression if you want to test the simultaneous linear effect of several variables varx1, varx2, on a continuous outcome

More information

Further Pure Mathematics 3 GCE Further Mathematics GCE Pure Mathematics and Further Mathematics (Additional) A2 optional unit

Further Pure Mathematics 3 GCE Further Mathematics GCE Pure Mathematics and Further Mathematics (Additional) A2 optional unit Unit FP3 Further Pure Mathematics 3 GCE Further Mathematics GCE Pure Mathematics and Further Mathematics (Additional) A optional unit FP3.1 Unit description Further matrix algebra; vectors, hyperbolic

More information

4.1. Introduction: Comparing Means

4.1. Introduction: Comparing Means 4. Analysis of Variance (ANOVA) 4.1. Introduction: Comparing Means Consider the problem of testing H 0 : µ 1 = µ 2 against H 1 : µ 1 µ 2 in two independent samples of two different populations of possibly

More information
