The First Thing You Ever Do When You Receive a Set of Data Is


Understand the goal of the study
- What are the objectives of the study?
- What would the person like to see from the data?

Understand the methodology
- How are the samples being collected?
- Is there any subjectivity in sample collection?
- Pay attention to nested designs and pseudo-replication

After Understanding the Objectives and Methodology

Calculate some summary statistics to help you understand the nature of the data. Summary statistics can be calculated for most types of data.

- mean(): mean of a vector
- mean(x, trim): trimmed mean
- median(): median of a vector
- quantile(): sample quantiles at given probabilities
- range(): the minimum and maximum values
- var(): variance of a vector, or covariance matrix of a matrix or data frame
- cov(): covariance of two vectors, or of the columns of a data frame
- cor(): correlation coefficients of two vectors, or of the columns of a data frame
- mad(): median absolute deviation
- stem(): stem-and-leaf plot
- summary(): summary statistics of a data frame
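As a minimal sketch of the summary functions listed above, the following uses the built-in mtcars data frame. The choice of data set and columns is an assumption made for illustration; the slides do not name one.

```r
# Illustrative data: built-in mtcars (an assumption, not from the slides)
x <- mtcars$mpg

mean(x)                      # arithmetic mean
mean(x, trim = 0.1)          # mean after dropping 10% of values from each tail
median(x)
range(x)                     # c(minimum, maximum)
var(x)                       # variance of a vector
mad(x)                       # median absolute deviation (scaled by 1.4826 by default)
cov(mtcars$mpg, mtcars$wt)   # covariance of two vectors
cor(mtcars$mpg, mtcars$wt)   # correlation of two vectors

cols <- mtcars[, c("mpg", "wt", "hp")]
S <- var(cols)               # covariance matrix of a data frame
S
summary(cols)                # per-column summary statistics
```

Calling var() on a data frame returns the full covariance matrix, which is why var() and cov() overlap in the list.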

Examples of Summary Statistics

quantile()
The quantile() function takes two vectors as input. The first contains the observations, and the second contains the probabilities corresponding to the desired quantiles. The function returns the empirical quantiles of the first vector.

stem()
A stem-and-leaf plot shows the distribution of a vector.
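A short sketch of both functions, again on the built-in mtcars data (an illustrative choice, not the data from the slides):

```r
x <- mtcars$mpg               # observations (illustrative data)
probs <- c(0.1, 0.5, 0.9)     # probabilities of the desired quantiles

q <- quantile(x, probs)       # empirical quantiles of x
q

stem(x)                       # prints a text stem-and-leaf display
```

With the default settings of quantile(), the 0.5 quantile coincides with median(x).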

Examples of Summary Statistics

summary()
It is helpful for calculating basic statistics for the columns of a data frame.

Distributional Test

Test whether a vector, or multiple vectors, conform to a certain distribution.

- chisq.test(): Chi-squared goodness-of-fit test
- ks.test(): Kolmogorov-Smirnov goodness-of-fit test
- t.test(): one- or two-sample Student's t test
- var.test(): test of equality of the variances of x and y
- shapiro.test(): Shapiro-Wilk test of normality
- wilcox.test(): one- and two-sample Wilcoxon rank sum and signed rank tests
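The calls below sketch how these tests are typically invoked. The data are simulated here purely for illustration (the sample sizes, means, and the die-roll example are assumptions), so the printed p-values carry no meaning beyond showing the output format.

```r
set.seed(1)                          # simulated data, for illustration only
x <- rnorm(50, mean = 10, sd = 2)
y <- rnorm(50, mean = 11, sd = 2)

summary(data.frame(x, y))            # basic statistics for each column

shapiro.test(x)                      # Shapiro-Wilk test of normality
tt <- t.test(x, y)                   # two-sample t test (Welch version by default)
tt
var.test(x, y)                       # F test of equal variances
wilcox.test(x, y)                    # Wilcoxon rank sum test

# Chi-squared goodness of fit: are simulated die rolls uniform over 1..6?
rolls <- table(factor(sample(1:6, 120, replace = TRUE), levels = 1:6))
chisq.test(rolls, p = rep(1/6, 6))
```

Each function returns an object of class "htest", so components such as tt$p.value and tt$statistic can be extracted in the same way for all of them.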

Distributional Test

ks.test()
This is a versatile test that allows you to test:
- Whether a data vector is drawn from a certain distribution
- Whether two data vectors are drawn from the same distribution

Intention

Regression can be a full course by itself (or even many courses), so it is not the intention of this class to teach you regression theory. I will just introduce some functions that allow you to perform regression.
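The two uses of ks.test() can be sketched as follows, with simulated samples standing in for real data (the distributions and sample sizes are assumptions for illustration):

```r
set.seed(2)                 # simulated data, for illustration only
x <- rnorm(100)             # standard normal sample
y <- runif(100)             # uniform(0, 1) sample

# One-sample: is x drawn from a N(0, 1) distribution?
one <- ks.test(x, "pnorm", mean = 0, sd = 1)
one

# Two-sample: are x and y drawn from the same distribution?
two <- ks.test(x, y)
two
```

In the one-sample form, the second argument names a cumulative distribution function and any further arguments are passed on as its parameters.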

Basic

The general structure of regression functions in R consists of a formula object and additional arguments. Formula objects play a very important role in statistical modeling in R: they are used to specify the model to be fitted. The exact formulation of a formula object depends on the modeling function. However, the general form is given by

response ~ expression

Sometimes the response can be omitted, and sometimes the expression is a collection of variables. The specification is quite flexible.
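A formula is an ordinary R object that can be created and inspected before any model is fitted. The variable names below (mpg, wt, hp) are borrowed from the built-in mtcars data only as an example.

```r
f <- mpg ~ wt + hp      # two-sided formula: response ~ expression
class(f)                # "formula"
all.vars(f)             # variables named in the formula
length(f)               # 3: the ~ operator, the response, the expression

g <- ~ wt + hp          # one-sided formula: response omitted
length(g)               # 2: the ~ operator and the expression
```

Creating a formula does not evaluate the variables; they are looked up only when a modeling function uses the formula together with its data.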

Linear Regression

Linear regression, as we usually know it, has the following form:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$$

where $\beta_0, \ldots, \beta_p$ are the intercept and the p regression coefficients, and $x_1, \ldots, x_p$ are the p regression variables. The error term $\varepsilon$ has mean zero and is often modeled as normally distributed with some variance.

The multiple regression function in R is lm(formula, data, weights, subset, na.action). E.g., lm(y ~ x1 + x2 + x3 + xp, ...).

The operators in formula objects have different meanings:
- The : operator is used to model interaction terms in linear models
- The * operator is shorthand for main effects plus interactions; with p variables it includes all possible interactions up to order p
- The ^ operator is used to generate interaction terms up to a certain order
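A minimal sketch of lm() and of how the formula operators expand. The fit uses the built-in mtcars data as a stand-in, and the names y, a, b, c in the operator examples are hypothetical placeholders.

```r
# Multiple regression on illustrative data
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)

# terms() shows which model terms a formula expands to
t_colon <- attr(terms(y ~ a:b), "term.labels")
t_star  <- attr(terms(y ~ a * b), "term.labels")
t_power <- attr(terms(y ~ (a + b + c)^2), "term.labels")

t_colon   # "a:b"
t_star    # "a" "b" "a:b"
t_power   # "a" "b" "c" "a:b" "a:c" "b:c"
```

Inspecting term.labels in this way is a quick check that a formula specifies the intended model before fitting it.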

Linear Regression

The operators in formula objects have different meanings:
- The - operator is used to leave out terms in a formula. E.g., -1 removes the intercept from a regression formula
- The function I() is used to suppress the special meaning of operators in a linear regression formula. For example, if you want to include a transformed x2 variable in your model, say multiplied by 2, writing the multiplication directly in the formula will not work, because * is read as a formula operator; the term has to be wrapped as I(2*x2)

After you have fitted a linear model, you will want to extract valuable information about the results of the model fit. R provides functions that allow you to extract information on model fits and diagnostics.
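The sketch below illustrates -1 and I(), followed by some commonly used extractor functions. The original slide's own list of extractors is not preserved, so the selection here is an assumption, as is the use of the built-in mtcars data.

```r
fit  <- lm(mpg ~ wt + hp, data = mtcars)
fit0 <- lm(mpg ~ wt + hp - 1, data = mtcars)       # -1 drops the intercept
fitI <- lm(mpg ~ wt + I(2 * hp), data = mtcars)    # I() protects the arithmetic

coef(fit0)             # no "(Intercept)" entry
coef(fitI)             # slope on 2*hp is half the slope on hp in fit

# Common extractor functions for a fitted model
summary(fit)           # coefficient table, R-squared, F statistic
confint(fit)           # confidence intervals for the coefficients
head(residuals(fit))   # residuals
head(fitted(fit))      # fitted values
anova(fit)             # sequential analysis-of-variance table
predict(fit, newdata = data.frame(wt = 3, hp = 120))
# plot(fit) would draw the standard diagnostic plots (left out of this
# sketch to keep it non-graphical)
```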

Linear Regression

Say that after the diagnostics you are not satisfied with your model and would like to make changes. You could write out the whole model formulation again, but it is much easier to use the update() function in R to specify the changes you need to make to your original model. The ~.+Disp construction adds the Disp. variable to whatever model was used to generate the cars.lm object.

Generalized Linear Model (GLM)

Generalized linear models are used to fit a suite of distributions other than the Normal distribution; the common logistic regression is one example. The R function for fitting a GLM is glm(), which can fit several families of distributions.
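A sketch of both ideas. The data behind the slides' cars.lm object is not shown, so the built-in mtcars data and its disp column stand in for it here; that substitution is an assumption. For glm(), the family objects documented in base R are binomial, gaussian, Gamma, inverse.gaussian, poisson, quasi, quasibinomial and quasipoisson.

```r
# Stand-in for the slides' cars.lm (mtcars is an assumption)
cars.lm  <- lm(mpg ~ wt, data = mtcars)
cars.lm2 <- update(cars.lm, ~ . + disp)   # keep everything, add disp
formula(cars.lm2)                         # mpg ~ wt + disp

# Logistic regression: binomial family with its default logit link
logit.fit <- glm(am ~ wt, family = binomial, data = mtcars)
coef(logit.fit)
family(logit.fit)
```

In the update formula, the dot means "whatever was there before", so only the change has to be written out.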

Non-linear Regression

The non-linear regression model specifies non-linear combinations of predictors in the model formulation. It generally has the form $y = f(x, \theta) + \varepsilon$, where $f$ is non-linear in the parameters $\theta$. The R function to compute non-linear regression is nls().

Mixed-effects Modeling

For linear mixed-effects modeling (LMM), there are currently two common packages:
- lme4 package (Bates, Maechler and Bolker): lmer()
- nlme package (Pinheiro and Bates): lme()

For generalized linear mixed-effects modeling (GLMM):
- lme4 package (Bates, Maechler and Bolker): glmer()

For non-linear mixed-effects modeling (NLMM):
- nlme package (Pinheiro and Bates): nlme()
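As a hedged illustration, the sketch below fits an exponential decay curve with nls() and a random-intercept model with nlme::lme(). The decay model, its starting values, and the simulated data are assumptions, since the example from the original slide is not preserved; the Orthodont data ships with the nlme package.

```r
# Non-linear regression on simulated data (illustration only)
set.seed(3)
d <- data.frame(x = seq(0, 10, length.out = 50))
d$y <- 5 * exp(-0.4 * d$x) + rnorm(50, sd = 0.1)

# nls() needs starting values for the parameters a and b
fit.nls <- nls(y ~ a * exp(-b * x), data = d, start = list(a = 4, b = 0.5))
coef(fit.nls)

# Linear mixed-effects model: random intercept for each subject
if (requireNamespace("nlme", quietly = TRUE)) {
  fit.lme <- nlme::lme(distance ~ age, random = ~ 1 | Subject,
                       data = nlme::Orthodont)
  print(nlme::fixef(fit.lme))
}
# The lme4 counterpart of the same model would be
# lme4::lmer(distance ~ age + (1 | Subject), data = nlme::Orthodont)
```

The two packages differ mainly in how random effects are written: a separate random argument in nlme, and a (1 | Subject) term inside the formula in lme4.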

Design of Experiment

An experiment allows us to make inferences about causal effects between the response and the predictors because of the way the study is set up:
- By controlling the levels of the predictors
- By minimizing effects from unwanted external factors

It follows a rigid design so that we can make inferences. The common way of analyzing experimental data is some variation of Analysis of Variance (ANOVA). The setup of ANOVA differs among experimental designs. However, it is important to note that ANOVA is essentially a linear regression.

One-way ANOVA

One-way ANOVA is used when the experiment consists of one factor, which could have multiple levels.

The null hypothesis, H0: there is no difference in mean response among the levels of the factor.

One-way ANOVA

The example dataset is a set of 24 blood coagulation times. 24 animals were randomly assigned to four different diets, and the samples were taken in a random order (install the faraway package in R to obtain the data).

We will find the response of blood coagulation times to the different diets.
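The slides use the coagulation data from the faraway package. So that the sketch runs without extra packages, the built-in PlantGrowth data (one response, one three-level factor) stands in for it here; with faraway installed, the same calls would apply to its coagulation data frame.

```r
# One-way ANOVA as a linear model (PlantGrowth is a stand-in data set)
fit1 <- lm(weight ~ group, data = PlantGrowth)
tab1 <- anova(fit1)    # F test of H0: all group means are equal
tab1
coef(fit1)             # intercept = mean of the first level (ctrl);
                       # the other coefficients are differences from it

# The same table through aov(), showing that ANOVA is a linear model
summary(aov(weight ~ group, data = PlantGrowth))
```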

One-way ANOVA

What if we fit the linear model without the intercept?

Two-way ANOVA

An experimental design that involves two-way ANOVA has two factors. There could be three hypotheses:
- Null hypothesis 1, H0: there is no interaction
- Null hypothesis 2, H0: there is no effect from factor 1
- Null hypothesis 3, H0: there is no effect from factor 2
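Both points can be sketched with built-in data: PlantGrowth for the one-way fit without an intercept, and warpbreaks (two factors, wool and tension) for the three two-way hypotheses. Both data sets are stand-ins chosen for illustration.

```r
# One-way model without intercept: coefficients become the group means
fit0  <- lm(weight ~ group - 1, data = PlantGrowth)
means <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
coef(fit0)
means

# Two-way model with interaction: one row of the table per hypothesis
fit2 <- lm(breaks ~ wool * tension, data = warpbreaks)
tab2 <- anova(fit2)
tab2    # rows: wool, tension, wool:tension, Residuals
```

Without an intercept, R computes R-squared and the overall F statistic in summary() relative to a zero-mean model, so those two quantities are not comparable with the intercept version. The warpbreaks design is balanced, so the sequential sums of squares do not depend on the order of the terms.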

Two-way ANOVA

Example: 48 rats were allocated to 3 poisons (I, II, III) and 4 treatments (A, B, C, D). The response was survival time in tens of hours.

Fitting the two-way ANOVA and checking the fit.

Two-way ANOVA

We need to transform the response variable because of the undesirable properties of the QQ plot and the residuals.

We then remove the interaction term, since it is not significant.
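The mechanics of these two steps can be sketched with update(). The transformation used on the slides is not preserved, so the log transform below is an assumption, and the built-in warpbreaks data stands in for the rat data. Whether dropping the interaction is justified has to be read from the output for the data at hand.

```r
fit2 <- lm(breaks ~ wool * tension, data = warpbreaks)

# Diagnostics one would inspect (left as comments to keep the sketch
# non-graphical):
# qqnorm(residuals(fit2)); qqline(residuals(fit2))
# plot(fitted(fit2), residuals(fit2))

# Refit with a transformed response (log is an assumed choice)
fit2.log <- update(fit2, log(breaks) ~ .)

# Drop the interaction term and compare the two nested models
fit2.add <- update(fit2.log, . ~ . - wool:tension)
anova(fit2.add, fit2.log)    # F test for the interaction
```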

Randomized Complete Block Design (RCBD)

Blocking is an effective method for removing unwanted and unknown variation (which could not be controlled). We arrange the experimental units into blocks in such a way that the intrablock variation is small but the interblock variation is large. A block design can be more effective than the Completely Randomized Design (CRD, which is what the one-way and two-way ANOVA examples used).

Example: we have 4 treatments and 20 patients available. We could divide the patients into 5 blocks of 4 patients each, such that the patients within a block have some relevant similarity. Then we would randomly assign the treatments within each block.
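The randomization step in the patient example might be sketched as follows. The sketch assumes the 20 patients have already been sorted into 5 blocks of 4; the patient numbering and the treatment labels A to D are hypothetical.

```r
set.seed(5)    # fixed seed so the plan is reproducible

# 20 patients, already grouped into 5 blocks of 4 (assumed)
plan <- data.frame(patient = 1:20, block = rep(1:5, each = 4))

# Independently shuffle the 4 treatments within each block
plan$treatment <- unlist(lapply(1:5, function(b) sample(LETTERS[1:4])))

alloc <- table(plan$block, plan$treatment)
alloc          # every treatment appears exactly once in every block
```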

Randomized Complete Block Design (RCBD)

Example: compare 4 processes (A, B, C, D) for the production of penicillin. The raw material, corn steep liquor, is quite variable and can only be made in blends sufficient for 4 runs. So, we block on the blends. The null hypothesis, H0: there is no difference between the processes.

Is there interaction between blocks and treatments?

Randomized Complete Block Design (RCBD)

Let's just assume that there is no interaction (actually, we are not able to carry out a test for interaction; do you know why?).
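A sketch of the RCBD analysis on simulated data with the same layout as the penicillin example (4 processes, 5 blends, one run per combination). All numbers are made up for illustration and are not the penicillin data. The last lines give one way to look at the interaction question: count the residual degrees of freedom.

```r
set.seed(4)    # simulated data, for illustration only

# One observation per treatment-by-blend combination; treat varies fastest
d <- expand.grid(treat = factor(LETTERS[1:4]),
                 blend = factor(paste0("Blend", 1:5)))
d$yield <- 85 +
  rep(c(0, 1, 3, 2), times = 5) +        # assumed treatment effects
  rep(rnorm(5, sd = 3), each = 4) +      # assumed blend (block) effects
  rnorm(20, sd = 2)                      # run-to-run noise

# Additive RCBD model: block first, then treatment
fit.rcbd <- lm(yield ~ blend + treat, data = d)
anova(fit.rcbd)
df.residual(fit.rcbd)    # 20 - 1 - 4 - 3 = 12

# Adding the interaction uses up every degree of freedom
fit.int <- lm(yield ~ blend * treat, data = d)
df.residual(fit.int)     # 0: no residual variation left to test against
```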

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics

Linear, Generalized Linear, and Mixed-Effects Models in R. Linear and Generalized Linear Models in R Topics Linear, Generalized Linear, and Mixed-Effects Models in R John Fox McMaster University ICPSR 2018 John Fox (McMaster University) Statistical Models in R ICPSR 2018 1 / 19 Linear and Generalized Linear

More information

Introduction to Mixed Models in R

Introduction to Mixed Models in R Introduction to Mixed Models in R Galin Jones School of Statistics University of Minnesota http://www.stat.umn.edu/ galin March 2011 Second in a Series Sponsored by Quantitative Methods Collaborative.

More information

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6

R in Linguistic Analysis. Wassink 2012 University of Washington Week 6 R in Linguistic Analysis Wassink 2012 University of Washington Week 6 Overview R for phoneticians and lab phonologists Johnson 3 Reading Qs Equivalence of means (t-tests) Multiple Regression Principal

More information

3 Joint Distributions 71

3 Joint Distributions 71 2.2.3 The Normal Distribution 54 2.2.4 The Beta Density 58 2.3 Functions of a Random Variable 58 2.4 Concluding Remarks 64 2.5 Problems 64 3 Joint Distributions 71 3.1 Introduction 71 3.2 Discrete Random

More information

Answer Keys to Homework#10

Answer Keys to Homework#10 Answer Keys to Homework#10 Problem 1 Use either restricted or unrestricted mixed models. Problem 2 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean

More information

Assignment 9 Answer Keys

Assignment 9 Answer Keys Assignment 9 Answer Keys Problem 1 (a) First, the respective means for the 8 level combinations are listed in the following table A B C Mean 26.00 + 34.67 + 39.67 + + 49.33 + 42.33 + + 37.67 + + 54.67

More information

WU Weiterbildung. Linear Mixed Models

WU Weiterbildung. Linear Mixed Models Linear Mixed Effects Models WU Weiterbildung SLIDE 1 Outline 1 Estimation: ML vs. REML 2 Special Models On Two Levels Mixed ANOVA Or Random ANOVA Random Intercept Model Random Coefficients Model Intercept-and-Slopes-as-Outcomes

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

Chapter 8 (More on Assumptions for the Simple Linear Regression)

Chapter 8 (More on Assumptions for the Simple Linear Regression) EXST3201 Chapter 8b Geaghan Fall 2005: Page 1 Chapter 8 (More on Assumptions for the Simple Linear Regression) Your textbook considers the following assumptions: Linearity This is not something I usually

More information

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1

Contents. Preface to Second Edition Preface to First Edition Abbreviations PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 Contents Preface to Second Edition Preface to First Edition Abbreviations xv xvii xix PART I PRINCIPLES OF STATISTICAL THINKING AND ANALYSIS 1 1 The Role of Statistical Methods in Modern Industry and Services

More information

Comparison of two samples

Comparison of two samples Comparison of two samples Pierre Legendre, Université de Montréal August 009 - Introduction This lecture will describe how to compare two groups of observations (samples) to determine if they may possibly

More information

Turning a research question into a statistical question.

Turning a research question into a statistical question. Turning a research question into a statistical question. IGINAL QUESTION: Concept Concept Concept ABOUT ONE CONCEPT ABOUT RELATIONSHIPS BETWEEN CONCEPTS TYPE OF QUESTION: DESCRIBE what s going on? DECIDE

More information

My data doesn t look like that..

My data doesn t look like that.. Testing assumptions My data doesn t look like that.. We have made a big deal about testing model assumptions each week. Bill Pine Testing assumptions Testing assumptions We have made a big deal about testing

More information

unadjusted model for baseline cholesterol 22:31 Monday, April 19,

unadjusted model for baseline cholesterol 22:31 Monday, April 19, unadjusted model for baseline cholesterol 22:31 Monday, April 19, 2004 1 Class Level Information Class Levels Values TRETGRP 3 3 4 5 SEX 2 0 1 Number of observations 916 unadjusted model for baseline cholesterol

More information

ST4241 Design and Analysis of Clinical Trials Lecture 7: N. Lecture 7: Non-parametric tests for PDG data

ST4241 Design and Analysis of Clinical Trials Lecture 7: N. Lecture 7: Non-parametric tests for PDG data ST4241 Design and Analysis of Clinical Trials Lecture 7: Non-parametric tests for PDG data Department of Statistics & Applied Probability 8:00-10:00 am, Friday, September 2, 2016 Outline Non-parametric

More information

SCHOOL OF MATHEMATICS AND STATISTICS

SCHOOL OF MATHEMATICS AND STATISTICS RESTRICTED OPEN BOOK EXAMINATION (Not to be removed from the examination hall) Data provided: Statistics Tables by H.R. Neave MAS5052 SCHOOL OF MATHEMATICS AND STATISTICS Basic Statistics Spring Semester

More information

Chapter 4 Multi-factor Treatment Designs with Multiple Error Terms 93

Chapter 4 Multi-factor Treatment Designs with Multiple Error Terms 93 Contents Preface ix Chapter 1 Introduction 1 1.1 Types of Models That Produce Data 1 1.2 Statistical Models 2 1.3 Fixed and Random Effects 4 1.4 Mixed Models 6 1.5 Typical Studies and the Modeling Issues

More information

Lectures 5 & 6: Hypothesis Testing

Lectures 5 & 6: Hypothesis Testing Lectures 5 & 6: Hypothesis Testing in which you learn to apply the concept of statistical significance to OLS estimates, learn the concept of t values, how to use them in regression work and come across

More information

Week 7.1--IES 612-STA STA doc

Week 7.1--IES 612-STA STA doc Week 7.1--IES 612-STA 4-573-STA 4-576.doc IES 612/STA 4-576 Winter 2009 ANOVA MODELS model adequacy aka RESIDUAL ANALYSIS Numeric data samples from t populations obtained Assume Y ij ~ independent N(μ

More information

Textbook Examples of. SPSS Procedure

Textbook Examples of. SPSS Procedure Textbook s of IBM SPSS Procedures Each SPSS procedure listed below has its own section in the textbook. These sections include a purpose statement that describes the statistical test, identification of

More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

MATH 644: Regression Analysis Methods

MATH 644: Regression Analysis Methods MATH 644: Regression Analysis Methods FINAL EXAM Fall, 2012 INSTRUCTIONS TO STUDENTS: 1. This test contains SIX questions. It comprises ELEVEN printed pages. 2. Answer ALL questions for a total of 100

More information

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Business Analytics and Data Mining Modeling Using R Prof. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee Lecture - 04 Basic Statistics Part-1 (Refer Slide Time: 00:33)

More information

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017

Introduction to Regression Analysis. Dr. Devlina Chatterjee 11 th August, 2017 Introduction to Regression Analysis Dr. Devlina Chatterjee 11 th August, 2017 What is regression analysis? Regression analysis is a statistical technique for studying linear relationships. One dependent

More information

36-720: Linear Mixed Models

36-720: Linear Mixed Models 36-720: Linear Mixed Models Brian Junker October 8, 2007 Review: Linear Mixed Models (LMM s) Bayesian Analogues Facilities in R Computational Notes Predictors and Residuals Examples [Related to Christensen

More information

Topic 8. Data Transformations [ST&D section 9.16]

Topic 8. Data Transformations [ST&D section 9.16] Topic 8. Data Transformations [ST&D section 9.16] 8.1 The assumptions of ANOVA For ANOVA, the linear model for the RCBD is: Y ij = µ + τ i + β j + ε ij There are four key assumptions implicit in this model.

More information

1 Multiple Regression

1 Multiple Regression 1 Multiple Regression In this section, we extend the linear model to the case of several quantitative explanatory variables. There are many issues involved in this problem and this section serves only

More information

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018

Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Statistical inference (estimation, hypothesis tests, confidence intervals) Oct 2018 Sampling A trait is measured on each member of a population. f(y) = propn of individuals in the popn with measurement

More information

Logistic Regression in R. by Kerry Machemer 12/04/2015

Logistic Regression in R. by Kerry Machemer 12/04/2015 Logistic Regression in R by Kerry Machemer 12/04/2015 Linear Regression {y i, x i1,, x ip } Linear Regression y i = dependent variable & x i = independent variable(s) y i = α + β 1 x i1 + + β p x ip +

More information

Exam details. Final Review Session. Things to Review

Exam details. Final Review Session. Things to Review Exam details Final Review Session Short answer, similar to book problems Formulae and tables will be given You CAN use a calculator Date and Time: Dec. 7, 006, 1-1:30 pm Location: Osborne Centre, Unit

More information

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model

Linear Regression. In this lecture we will study a particular type of regression model: the linear regression model 1 Linear Regression 2 Linear Regression In this lecture we will study a particular type of regression model: the linear regression model We will first consider the case of the model with one predictor

More information

STAT 501 EXAM I NAME Spring 1999

STAT 501 EXAM I NAME Spring 1999 STAT 501 EXAM I NAME Spring 1999 Instructions: You may use only your calculator and the attached tables and formula sheet. You can detach the tables and formula sheet from the rest of this exam. Show your

More information

Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction,

Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction, Residual Analysis for two-way ANOVA The twoway model with K replicates, including interaction, is Y ijk = µ ij + ɛ ijk = µ + α i + β j + γ ij + ɛ ijk with i = 1,..., I, j = 1,..., J, k = 1,..., K. In carrying

More information

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013

Outline. Topic 20 - Diagnostics and Remedies. Residuals. Overview. Diagnostics Plots Residual checks Formal Tests. STAT Fall 2013 Topic 20 - Diagnostics and Remedies - Fall 2013 Diagnostics Plots Residual checks Formal Tests Remedial Measures Outline Topic 20 2 General assumptions Overview Normally distributed error terms Independent

More information

STK4900/ Lecture 3. Program

STK4900/ Lecture 3. Program STK4900/9900 - Lecture 3 Program 1. Multiple regression: Data structure and basic questions 2. The multiple linear regression model 3. Categorical predictors 4. Planned experiments and observational studies

More information

Mixed models with correlated measurement errors

Mixed models with correlated measurement errors Mixed models with correlated measurement errors Rasmus Waagepetersen October 9, 2018 Example from Department of Health Technology 25 subjects where exposed to electric pulses of 11 different durations

More information

Introduction and Single Predictor Regression. Correlation

Introduction and Single Predictor Regression. Correlation Introduction and Single Predictor Regression Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Correlation A correlation

More information

One-way ANOVA Model Assumptions

One-way ANOVA Model Assumptions One-way ANOVA Model Assumptions STAT:5201 Week 4: Lecture 1 1 / 31 One-way ANOVA: Model Assumptions Consider the single factor model: Y ij = µ + α }{{} i ij iid with ɛ ij N(0, σ 2 ) mean structure random

More information

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES

PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES PubH 7405: REGRESSION ANALYSIS SLR: DIAGNOSTICS & REMEDIES Normal Error RegressionModel : Y = β 0 + β ε N(0,σ 2 1 x ) + ε The Model has several parts: Normal Distribution, Linear Mean, Constant Variance,

More information

A brief introduction to mixed models

A brief introduction to mixed models A brief introduction to mixed models University of Gothenburg Gothenburg April 6, 2017 Outline An introduction to mixed models based on a few examples: Definition of standard mixed models. Parameter estimation.

More information

14 Multiple Linear Regression

14 Multiple Linear Regression B.Sc./Cert./M.Sc. Qualif. - Statistics: Theory and Practice 14 Multiple Linear Regression 14.1 The multiple linear regression model In simple linear regression, the response variable y is expressed in

More information

Master s Written Examination - Solution

Master s Written Examination - Solution Master s Written Examination - Solution Spring 204 Problem Stat 40 Suppose X and X 2 have the joint pdf f X,X 2 (x, x 2 ) = 2e (x +x 2 ), 0 < x < x 2

More information

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics

TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics TABLES AND FORMULAS FOR MOORE Basic Practice of Statistics Exploring Data: Distributions Look for overall pattern (shape, center, spread) and deviations (outliers). Mean (use a calculator): x = x 1 + x

More information

Impact of serial correlation structures on random effect misspecification with the linear mixed model.

Impact of serial correlation structures on random effect misspecification with the linear mixed model. Impact of serial correlation structures on random effect misspecification with the linear mixed model. Brandon LeBeau University of Iowa file:///c:/users/bleb/onedrive%20 %20University%20of%20Iowa%201/JournalArticlesInProgress/Diss/Study2/Pres/pres.html#(2)

More information

df=degrees of freedom = n - 1

df=degrees of freedom = n - 1 One sample t-test test of the mean Assumptions: Independent, random samples Approximately normal distribution (from intro class: σ is unknown, need to calculate and use s (sample standard deviation)) Hypotheses:

More information

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College

1-Way ANOVA MATH 143. Spring Department of Mathematics and Statistics Calvin College 1-Way ANOVA MATH 143 Department of Mathematics and Statistics Calvin College Spring 2010 The basic ANOVA situation Two variables: 1 Categorical, 1 Quantitative Main Question: Do the (means of) the quantitative

More information

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test.

The ε ij (i.e. the errors or residuals) are normally distributed. This assumption has the least influence on the F test. Lecture 11 Topic 8: Data Transformations Assumptions of the Analysis of Variance 1. Independence of errors The ε ij (i.e. the errors or residuals) are statistically independent from one another. Failure

More information

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages:

Glossary. The ISI glossary of statistical terms provides definitions in a number of different languages: Glossary The ISI glossary of statistical terms provides definitions in a number of different languages: http://isi.cbs.nl/glossary/index.htm Adjusted r 2 Adjusted R squared measures the proportion of the

More information

Lecture 9: Linear Regression

Lecture 9: Linear Regression Lecture 9: Linear Regression Goals Develop basic concepts of linear regression from a probabilistic framework Estimating parameters and hypothesis testing with linear models Linear regression in R Regression

More information

Introduction and Background to Multilevel Analysis

Introduction and Background to Multilevel Analysis Introduction and Background to Multilevel Analysis Dr. J. Kyle Roberts Southern Methodist University Simmons School of Education and Human Development Department of Teaching and Learning Background and

More information

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis

PLS205!! Lab 9!! March 6, Topic 13: Covariance Analysis PLS205!! Lab 9!! March 6, 2014 Topic 13: Covariance Analysis Covariable as a tool for increasing precision Carrying out a full ANCOVA Testing ANOVA assumptions Happiness! Covariable as a Tool for Increasing

More information

Rank-Based Methods. Lukas Meier

Rank-Based Methods. Lukas Meier Rank-Based Methods Lukas Meier 20.01.2014 Introduction Up to now we basically always used a parametric family, like the normal distribution N (µ, σ 2 ) for modeling random data. Based on observed data

More information

Weighted Least Squares

Weighted Least Squares Weighted Least Squares The standard linear model assumes that Var(ε i ) = σ 2 for i = 1,..., n. As we have seen, however, there are instances where Var(Y X = x i ) = Var(ε i ) = σ2 w i. Here w 1,..., w

More information

the logic of parametric tests

the logic of parametric tests the logic of parametric tests define the test statistic (e.g. mean) compare the observed test statistic to a distribution calculated for random samples that are drawn from a single (normal) distribution.

More information

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p.

Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. Preface p. xi Introduction and Descriptive Statistics p. 1 Introduction to Statistics p. 3 Statistics, Science, and Observations p. 5 Populations and Samples p. 6 The Scientific Method and the Design of

More information

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij =

K. Model Diagnostics. residuals ˆɛ ij = Y ij ˆµ i N = Y ij Ȳ i semi-studentized residuals ω ij = ˆɛ ij. studentized deleted residuals ɛ ij = K. Model Diagnostics We ve already seen how to check model assumptions prior to fitting a one-way ANOVA. Diagnostics carried out after model fitting by using residuals are more informative for assessing

More information

5.3 Three-Stage Nested Design Example

5.3 Three-Stage Nested Design Example 5.3 Three-Stage Nested Design Example A researcher designs an experiment to study the of a metal alloy. A three-stage nested design was conducted that included Two alloy chemistry compositions. Three ovens

More information

Math 475. Jimin Ding. August 29, Department of Mathematics Washington University in St. Louis jmding/math475/index.

Math 475. Jimin Ding. August 29, Department of Mathematics Washington University in St. Louis   jmding/math475/index. istical A istic istics : istical Department of Mathematics Washington University in St. Louis www.math.wustl.edu/ jmding/math475/index.html August 29, 2013 istical August 29, 2013 1 / 18 istical A istic

More information

Practical Statistics for the Analytical Scientist Table of Contents

Practical Statistics for the Analytical Scientist Table of Contents Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning

More information

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models

Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Mixed models in R using the lme4 package Part 5: Generalized linear mixed models Douglas Bates 2011-03-16 Contents 1 Generalized Linear Mixed Models Generalized Linear Mixed Models When using linear mixed

More information

H-LIKELIHOOD ESTIMATION METHOOD FOR VARYING CLUSTERED BINARY MIXED EFFECTS MODEL
