Lecture 4 Topic 3: General linear models (GLMs), the fundamentals of the analysis of variance (ANOVA), and completely randomized designs (CRDs)

The general linear model

One population: An observation is explained as a mean plus a random deviation $\varepsilon_i$ (error):

$$Y_i = \mu + \varepsilon_i$$

The $\varepsilon_i$'s are assumed to be from a population of uncorrelated $\varepsilon$'s with mean zero. Independence among $\varepsilon$'s is assured by random sampling.

Two populations: Each observation is explained as a grand mean plus an effect of its group (i.e. treatment) plus a random deviation $\varepsilon_{ij}$ (error):

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

An equivalent expression of this model:

$$\mu + \tau_1 = \mu_1, \qquad \mu + \tau_2 = \mu_2, \qquad \tau_1 + \tau_2 = 0$$

$$Y_{ij} = \bar{Y}_{..} + (\bar{Y}_{i.} - \bar{Y}_{..}) + (Y_{ij} - \bar{Y}_{i.})$$

Example: Imagine an experiment with 10 tomato plants (yum!), each in its own pot. 5 of the pots receive fertilizer and the other 5 do not. The total yield of each plant (in kg) is recorded, and the treatment means and effects are:

Treatment | Treatment mean | General mean | Treatment effect
Fertilized | 1.876 | 1.534 | 0.342
Not fertilized | 1.192 | 1.534 | -0.342

Under the assumption of a general linear model, the yield of each tomato plant in the above experiment has the following general form:

$$y_{ij} = \mu + \tau_i + \varepsilon_{ij}$$

For Plant 3 receiving fertilizer, the equation looks like this:

$$y_{3,\text{Fert}} = \mu + \tau_{\text{Fert}} + \varepsilon_{3,\text{Fert}} = 1.534 + 0.342 - 0.116 = 1.76$$
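As a quick arithmetic check of the decomposition, the sketch below rebuilds Plant 3's yield from the summary values of the tomato example. It assumes equal replication (5 plants per treatment), so the general mean is simply the average of the two treatment means; the variable names are illustrative only.

```python
# Summary values from the tomato example (kg per plant).
treatment_means = {"Fertilized": 1.876, "Not fertilized": 1.192}
y_3_fert = 1.76  # observed yield of Plant 3 in the fertilized group

# With equal replication, the general mean is the average of the treatment means.
general_mean = sum(treatment_means.values()) / len(treatment_means)

# Treatment effects: deviation of each treatment mean from the general mean.
effects = {name: m - general_mean for name, m in treatment_means.items()}

# Residual: deviation of the observation from its own treatment mean.
residual = y_3_fert - treatment_means["Fertilized"]

print(f"general mean = {general_mean:.3f}")
for name, tau in effects.items():
    print(f"effect of {name:<15} = {tau:+.3f}")
print(f"y(3, Fert) = {general_mean:.3f} + {effects['Fertilized']:.3f} "
      f"+ ({residual:.3f}) = {general_mean + effects['Fertilized'] + residual:.2f}")
```

Note that the two effects sum to zero, as the model requires.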

More than two populations (the Model I or fixed-model ANOVA)

1. Treatment effects are additive and fixed by the researcher: $\sum \tau_i = 0$, so the hypotheses are $H_0: \tau_1 = \dots = \tau_t = 0$ versus $H_1$: some $\tau_i \neq 0$.
2. Errors are random, independent, and normally distributed with a common variance about a zero mean.
3. In the case of a false $H_0$ (i.e. some $\tau_i \neq 0$), there will be an additional component of variation due to treatment effects equal to $r \sum \tau_i^2 / (t - 1)$.

"Significant relative to what?" Significant relative to error. If the effect due to treatment (i.e. signal) is found to be significantly larger than the fluctuations among observations due to error (i.e. noise), the treatment effect is said to be real and significant.

The F distribution

From a normally distributed population (or from two populations with equal variance $\sigma^2$):

1. Sample $n_1$ items and calculate their variance $s_1^2$.
2. Sample $n_2$ items and calculate their variance $s_2^2$.
3. Construct a ratio of these two sample variances, $s_1^2 / s_2^2$.

This ratio will be close to 1, and its expected distribution is called the F-distribution, characterized by two values for df ($df_1 = n_1 - 1$, $df_2 = n_2 - 1$).

[Figure 1: Three example F-distributions, including F(8,6) and F(6,8), plotted against F.]

Values in an F table (Table A6) represent the area under the curve to the right of the given F-value, with $df_1$ and $df_2$.

- $(n - 1)s^2 / \sigma^2$ is distributed according to $\chi^2_{n-1}$.
- $s_1^2 / s_2^2$ is distributed according to $F_{n_1 - 1,\, n_2 - 1}$.
- $F_{(1,9),\,\alpha = 0.05} = (t_{9,\,\alpha/2})^2$, i.e. $5.12 = 2.262^2$. This is analogous to the relationship between $\chi^2_1$ and $Z^2$.
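The three-step sampling recipe can be mimicked by simulation. This is a minimal sketch under illustrative assumptions not taken from the lecture: σ = 1, $n_1 = n_2 = 10$, 20,000 simulated pairs of samples, and a fixed random seed. It checks that the ratios average close to $df_2/(df_2 - 2) = 9/7$ (the mean of an F(9, 9) variable) and that roughly 2.5% of them fall beyond 4.03 (the upper 2.5% point of F(9, 9)) and 2.5% below its reciprocal, 1/4.03 ≈ 0.248.

```python
import random

def sample_variance(xs):
    """Unbiased sample variance (divisor n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def simulate_variance_ratios(n1, n2, sigma=1.0, reps=20000, seed=1):
    """Draw two independent samples from the same normal population and
    return the ratio s1^2 / s2^2 for each replicate."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(reps):
        s1 = sample_variance([rng.gauss(0.0, sigma) for _ in range(n1)])
        s2 = sample_variance([rng.gauss(0.0, sigma) for _ in range(n2)])
        ratios.append(s1 / s2)
    return ratios

ratios = simulate_variance_ratios(10, 10)
mean_ratio = sum(ratios) / len(ratios)
upper = sum(x > 4.03 for x in ratios) / len(ratios)   # right tail
lower = sum(x < 0.248 for x in ratios) / len(ratios)  # left tail
print(f"mean ratio = {mean_ratio:.3f} (F(9,9) mean = 9/7 = {9 / 7:.3f})")
print(f"P(ratio > 4.03) = {upper:.3f}, P(ratio < 0.248) = {lower:.3f}")
```

The simulated tail fractions carry Monte Carlo noise of roughly ±0.001 at this number of replicates.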

Example: Pull 10 observations at random from each of two populations. Now test $H_0: \sigma_1^2 = \sigma_2^2$ vs. $H_1: \sigma_1^2 \neq \sigma_2^2$ (a two-tailed test):

$$F_{\alpha/2 = 0.025,\,[df_1 = 9,\, df_2 = 9]} = 4.03$$

Interpretation: The ratio $s_1^2 / s_2^2$, taken from samples of 10 individuals from normally distributed populations with equal variance, is expected to be larger than 4.03 ($F_{0.025,[9,9]}$) or lower than 0.248 ($F_{0.975,[9,9]}$) by chance only 5% of the time.

Testing the hypothesis of equality of two means

The ratio between two estimates of $\sigma^2$ can also be used to test differences between means: $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 \neq \mu_2$.

How can we use variances to test the difference between means? By being creative in how we obtain estimates of $\sigma^2$:

$$F = \frac{\text{estimate of } \sigma^2 \text{ from sample means}}{\text{estimate of } \sigma^2 \text{ from individuals}}$$

The denominator is an estimate of $\sigma^2$ provided by the individuals within a sample. If there are multiple samples, it is a weighted average of those sample variances.

The numerator is an estimate of $\sigma^2$ provided by the means among samples. Recall that $s_{\bar{Y}}^2 = s^2 / n$, so $s^2 = n\, s_{\bar{Y}}^2$, and therefore:

$$F = \frac{s^2_{\text{among}}}{s^2_{\text{within}}} = \frac{n\, s_{\bar{Y}}^2}{s^2_{\text{within}}}$$

The fundamental premise underlying ANOVA: When two populations have different means (but the same variance), the estimate of $\sigma^2$ based on sample means will include a contribution attributable to the difference among population means, and F will be higher than expected by chance.
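For two groups of equal size, this among/within F ratio is exactly the square of the pooled two-sample t statistic (the same relationship as $F_{(1,\nu)} = t_\nu^2$). The sketch below checks the identity numerically; the simulated populations (means 10 and 11, σ = 2, n = 10) and the seed are arbitrary assumptions, and the helper names are hypothetical.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    """Unbiased sample variance (divisor n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def pooled_t(y1, y2):
    """Two-sample t statistic using the pooled variance (equal sample sizes)."""
    n = len(y1)
    s2_within = (var(y1) + var(y2)) / 2  # equal n, so equal weights
    return (mean(y1) - mean(y2)) / math.sqrt(s2_within * 2 / n)

def f_among_within(y1, y2):
    """F = n * (variance among the sample means) / (pooled within-sample variance)."""
    n = len(y1)
    s2_among = n * var([mean(y1), mean(y2)])
    s2_within = (var(y1) + var(y2)) / 2
    return s2_among / s2_within

rng = random.Random(42)
y1 = [rng.gauss(10.0, 2.0) for _ in range(10)]
y2 = [rng.gauss(11.0, 2.0) for _ in range(10)]
t_stat = pooled_t(y1, y2)
F = f_among_within(y1, y2)
print(f"t^2 = {t_stat ** 2:.6f}   F = {F:.6f}")
```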

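Example: Yield (100 lb/acre) of wheat varieties 1 and 2 from plots to which the varieties were randomly assigned:

Variety | Rep 1 | Rep 2 | Rep 3 | Rep 4 | Rep 5 | $Y_{i.}$ | $\bar{Y}_{i.}$ | $s_i^2$
1 | 19 | 14 | 15 | 17 | 20 | 85 | 17 | 6.5
2 | 23 | 19 | 19 | 18 | 21 | 100 | 20 | 4.0

Grand total $Y_{..} = 185$; grand mean $\bar{Y}_{..} = 18.5$. Treatments $t = 2$; replications $n = 5$.

Begin by assuming that the two populations have the same (unknown) variance $\sigma^2$.

1. Estimate the average variance within samples (the experimental error):

$$s_1^2 = \frac{\sum_j (Y_{1j} - \bar{Y}_{1.})^2}{n - 1}, \qquad s_2^2 = \frac{\sum_j (Y_{2j} - \bar{Y}_{2.})^2}{n - 1}$$

$$s^2_{\text{pooled}} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{(n_1 - 1) + (n_2 - 1)} = \frac{4(6.5) + 4(4.0)}{4 + 4} = 5.25 = s^2_{\text{within}}$$

2. Estimate the variance between (or among) samples:

$$s^2_{\bar{Y}} = \frac{\sum_{i=1}^{t} (\bar{Y}_{i.} - \bar{Y}_{..})^2}{t - 1} = \frac{(17 - 18.5)^2 + (20 - 18.5)^2}{2 - 1} = 4.5$$

$$n\, s^2_{\bar{Y}} = 5 \times 4.5 = 22.5 = s^2_{\text{between}}$$

To test $H_0$, we form a ratio of these two estimates:

$$F = \frac{s^2_{\text{between}}}{s^2_{\text{within}}} = \frac{22.5}{5.25} = 4.29$$

Under our assumptions (normality, equal variance), this ratio is distributed according to an $F_{(t-1,\, t(n-1))} = F_{(1,8)}$ distribution. From Table A.6, we find $F_{0.05,(1,8)} = 5.32$.

$F_{\text{calc}} = 4.29 < 5.32 = F_{\text{crit}}$, so we fail to reject $H_0$ at $\alpha = 0.05$. An F value of 4.29 or larger happens just by chance about 7% of the time for these degrees of freedom.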
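The same two steps in a few lines of Python, using the wheat yields above. This is a sketch; the helper `variance` is a hypothetical name that computes the same quantity as `statistics.variance` (divisor n - 1).

```python
def variance(xs):
    """Sample variance with divisor n - 1."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

variety_1 = [19, 14, 15, 17, 20]
variety_2 = [23, 19, 19, 18, 21]
n = len(variety_1)  # replications per variety

# Step 1: pooled within-sample variance (the experimental error)
s2_within = ((n - 1) * variance(variety_1) + (n - 1) * variance(variety_2)) / ((n - 1) + (n - 1))

# Step 2: variance among the sample means (divisor t - 1), scaled up by n
s2_means = variance([sum(variety_1) / n, sum(variety_2) / n])
s2_between = n * s2_means

F = s2_between / s2_within
print(f"s2_within = {s2_within}, s2_between = {s2_between}, F = {F:.2f}")
```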

Modern statistics began in the mind of Ronald Fisher, the first to recognize that variation is not just noise drowning signal, at best a nuisance to be ignored. Variance itself is a valid object of study, a fingerprint that provides great insight into the mechanisms of natural phenomena. In his words:

"The populations which are the object of statistical study always display variation in one or more respects. To speak of statistics as the study of variation also serves to emphasize the contrast between the aims of modern statisticians and those of their predecessors. For until comparatively recent times, the vast majority of workers in this field appear to have had no other aim than to ascertain aggregate, or average, values. The variation itself was not an object of study, but was recognized rather as a troublesome circumstance which detracted from the value of the average. From the modern point of view, the study of the causes of variation of any variable phenomenon, from the yield of wheat to the intellect of [people], should be begun by the examination and measurement of the variation which presents itself."

R.A. Fisher, Statistical Methods for Research Workers (1925)

ANOVA: Single factor designs

The Completely Randomized Design (CRD)

- CRD is the basic ANOVA design.
- A single factor is varied to form the different treatments.
- These treatments are applied randomly to experimental units.
- There are a total of $n = rt$ independent experimental units in the experiment.

$H_0: \mu_1 = \mu_2 = \dots = \mu_t$ versus $H_1$: Not all $\mu_i$ are equal.

The results of the analysis are usually summarized in an ANOVA table:

Source | df | SS definition | SS | MS | F
Total | $n - 1$ | $\sum_{i,j} (Y_{ij} - \bar{Y}_{..})^2$ | TSS | |
Treatment | $t - 1$ | $r \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$ | SST | SST/(t - 1) | MST/MSE
Error | $t(r - 1) = n - t$ | $\sum_{i,j} (Y_{ij} - \bar{Y}_{i.})^2$ | SSE = TSS - SST | SSE/(n - t) |

The mean square for error (MSE): The average dispersion of the observations around their respective group means. It is a valid estimate of a common $\sigma^2$, the experimental error, if the assumption of equal variances is true.

The mean square for treatment (MST): An independent estimate of $\sigma^2$ when the null hypothesis is true.

The F test: If there are differences among treatment means, there will be an additional source of variation in the experiment due to treatment effects, equal to $r \sum \tau_i^2 / (t - 1)$. The test statistic is $F = \text{MST}/\text{MSE}$, and the ratio of the expected mean squares is

$$\frac{E(\text{MST})}{E(\text{MSE})} = \frac{\sigma^2 + r \sum \tau_i^2 / (t - 1)}{\sigma^2}$$

The F-test is sensitive to the presence of the added component of variation due to treatment effects. In other words, ANOVA permits us to test whether there are any nonzero treatment effects.
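The SS definitions in the ANOVA table translate directly into code. A minimal sketch follows; `one_way_anova` is a hypothetical helper, not a library function. It weights each treatment mean by its own number of replications, which reduces to $r \sum_i (\bar{Y}_{i.} - \bar{Y}_{..})^2$ under equal replication. Applied to the two wheat varieties (19, 14, 15, 17, 20 and 23, 19, 19, 18, 21), it should reproduce the hand calculation F = 22.5/5.25 ≈ 4.29.

```python
def one_way_anova(groups):
    """One-way (CRD) ANOVA following the SS definitions in the table.

    `groups` holds one list of observations per treatment. Returns a dict with
    df and SS for (treatment, error, total), MS for (treatment, error), and F.
    """
    t = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    tss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    # len(g) * (...) equals r * (...) when replication is equal
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    sse = tss - sst

    mst, mse = sst / (t - 1), sse / (n - t)
    return {
        "df": (t - 1, n - t, n - 1),
        "SS": (sst, sse, tss),
        "MS": (mst, mse),
        "F": mst / mse,
    }

wheat = [[19, 14, 15, 17, 20], [23, 19, 19, 18, 21]]
table = one_way_anova(wheat)
print(table["df"], table["SS"], round(table["F"], 2))
```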

Example: Inoculation of clover with Rhizobium strains [ST&D Table 7.1]

Treatment | 3DOK1 | 3DOK5 | 3DOK4 | 3DOK7 | 3DOK13 | Composite
Rep 1 | 19.4 | 17.7 | 17.0 | 20.7 | 14.3 | 17.3
Rep 2 | 32.6 | 24.8 | 19.4 | 21.0 | 14.4 | 19.4
Rep 3 | 27.0 | 27.9 | 9.1 | 20.5 | 11.8 | 19.1
Rep 4 | 32.1 | 25.2 | 11.9 | 18.8 | 11.6 | 16.9
Rep 5 | 33.0 | 24.3 | 15.8 | 18.6 | 14.2 | 20.8
Mean | 28.8 | 24.0 | 14.6 | 19.9 | 13.3 | 18.7
Variance | 33.64 | 14.27 | 16.94 | 1.28 | 2.04 | 2.56

$t = 6$, $r = 5$, overall mean = 19.89

The ANOVA table for this experiment:

Source | df | SS | MS | F
Treatment | 5 | 847.05 | 169.41 | 14.37**
Error | 24 | 282.93 | 11.79 |
Total | 29 | 1129.98 | |

1. The mean square error (MSE = 11.79) is just the pooled variance, or the average of the variances within each treatment (i.e. $\text{MSE} = \sum_i s_i^2 / t$).
2. The F value (14.37) indicates that the variation among treatments is over 14 times larger than the mean variation within treatments. Since $14.37 > F_{\text{crit}} = F_{(5,24),\,0.05} = 2.62$, we reject $H_0$.
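Point 1 can be checked directly: with equal replication, MSE is the average of the within-treatment variances, and MST is r times the variance of the treatment means. The sketch below uses `statistics.mean` and `statistics.variance` (sample variance, divisor n - 1) on the data transcribed from the table above.

```python
from statistics import mean, variance

clover = {
    "3DOK1":     [19.4, 32.6, 27.0, 32.1, 33.0],
    "3DOK5":     [17.7, 24.8, 27.9, 25.2, 24.3],
    "3DOK4":     [17.0, 19.4, 9.1, 11.9, 15.8],
    "3DOK7":     [20.7, 21.0, 20.5, 18.8, 18.6],
    "3DOK13":    [14.3, 14.4, 11.8, 11.6, 14.2],
    "Composite": [17.3, 19.4, 19.1, 16.9, 20.8],
}
t = len(clover)  # treatments
r = 5            # replications per treatment

# MSE = average within-treatment variance (pooled variance, equal r)
mse = mean(variance(obs) for obs in clover.values())
# MST = r * variance among the treatment means (divisor t - 1)
mst = r * variance([mean(obs) for obs in clover.values()])
F = mst / mse

print(f"MST = {mst:.2f}, MSE = {mse:.2f}, F = {F:.2f}")
print(f"SST = {mst * (t - 1):.2f}, SSE = {mse * t * (r - 1):.2f}")
```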

Expected mean squares and F tests

EMS: Algebraic expressions which specify the underlying model parameters estimated by the calculated mean squares, and which are used to determine the appropriate error terms for F tests.

EMS table for this one-way (CRD) classification experiment, featuring $t$ treatments and $r$ replications:

Source | df | MS | EMS
Trtmt | $t - 1$ | MST | $\sigma_\varepsilon^2 + r \sum \tau_i^2 / (t - 1)$
Error | $t(r - 1)$ | MSE | $\sigma_\varepsilon^2$

The appropriate test statistic (F) is a ratio of mean squares that is chosen such that the expected value of the numerator differs from the expected value of the denominator only by the specific factor being tested.
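The EMS column can be checked by Monte Carlo. This sketch simulates many CRD experiments from $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ under hypothetical settings not taken from the lecture: $t = 4$ treatments with effects (-2, 0, 0.5, 1.5), which sum to zero, $r = 5$, $\sigma = 2$, $\mu = 10$, 4,000 experiments and a fixed seed. The averages of MST and MSE should land near $\sigma^2 + r\sum\tau_i^2/(t-1) \approx 14.83$ and $\sigma^2 = 4$.

```python
import random

def simulate_mean_squares(taus, r, sigma, mu=10.0, reps=4000, seed=7):
    """Average MST and MSE over many simulated CRD experiments with
    Y_ij = mu + tau_i + eps_ij and eps_ij ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    t = len(taus)
    total_mst = total_mse = 0.0
    for _ in range(reps):
        groups = [[mu + tau + rng.gauss(0.0, sigma) for _ in range(r)] for tau in taus]
        means = [sum(g) / r for g in groups]
        grand = sum(means) / t
        sst = r * sum((m - grand) ** 2 for m in means)
        sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
        total_mst += sst / (t - 1)
        total_mse += sse / (t * (r - 1))
    return total_mst / reps, total_mse / reps

taus = [-2.0, 0.0, 0.5, 1.5]
r, sigma = 5, 2.0
avg_mst, avg_mse = simulate_mean_squares(taus, r, sigma)
ems_mst = sigma ** 2 + r * sum(tau ** 2 for tau in taus) / (len(taus) - 1)
ems_mse = sigma ** 2
print(f"average MST = {avg_mst:.2f} (EMS {ems_mst:.2f}); "
      f"average MSE = {avg_mse:.2f} (EMS {ems_mse:.2f})")
```

Because MST carries the extra $r\sum\tau_i^2/(t-1)$ term and MSE does not, MST/MSE is the ratio that isolates the treatment effects.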

Testing the assumptions associated with ANOVA

1. Independence of errors: Guaranteed by the random allocation of experimental units.
2. Normal distribution of errors: Shapiro-Wilk test.
3. Homogeneity of variances: Several methods are available to test the assumption that the variance is the same within each of the groups defined by the independent factor.

Levene's test: An ANOVA of the absolute values of the residuals. With $Y_{ij} = \mu + \tau_i + \varepsilon_{ij}$, the residuals ($\varepsilon_{ij}$) are the deviations from the treatment means.

Original data:

Treatment | A | B | C
Rep 1 | 8 | 7 | 6
Rep 2 | 9 | 5 | 3
Rep 3 | 5 | 3 | 5
Rep 4 | 6 | 5 | 2
Average | 7 | 5 | 4

Residuals:

Treatment | A | B | C
Rep 1 | 1 | 2 | 2
Rep 2 | 2 | 0 | -1
Rep 3 | -2 | -2 | 1
Rep 4 | -1 | 0 | -2
Average | 0 | 0 | 0

Absolute values of residuals:

Treatment | A | B | C
Rep 1 | 1 | 2 | 2
Rep 2 | 2 | 0 | 1
Rep 3 | 2 | 2 | 1
Rep 4 | 1 | 0 | 2
Average | 1.5 | 1 | 1.5
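Levene's test on this example, as a sketch: compute the absolute residuals, then run an ordinary one-way ANOVA on them. The helper `one_way_f` is a hypothetical name; the resulting F would be compared with $F_{(t-1,\, n-t)} = F_{(2,9)}$. For a library cross-check, SciPy's `scipy.stats.levene` with `center='mean'` implements this mean-centred version (its default, `center='median'`, is the Brown-Forsythe variant).

```python
def one_way_f(groups):
    """F = MST / MSE for a one-way ANOVA on a list of groups."""
    t = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    sst = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    return (sst / (t - 1)) / (sse / (n - t))

data = {"A": [8, 9, 5, 6], "B": [7, 5, 3, 5], "C": [6, 3, 5, 2]}

# Absolute residuals |Y_ij - treatment mean|
abs_resid = {k: [abs(y - sum(v) / len(v)) for y in v] for k, v in data.items()}
for k, v in abs_resid.items():
    print(k, v, "average =", sum(v) / len(v))

f_levene = one_way_f(list(abs_resid.values()))
print(f"Levene F = {f_levene:.2f} with df = (2, 9)")
```

For these data the mean square among the absolute-residual averages is only half the error mean square, so F is well below 1 and gives no reason to doubt equal variances.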

Advantages of the CRD

1. Simple design.
2. Can easily accommodate unequal replication per treatment.
3. Loss of information due to missing data is small.
4. Maximum d.f. for estimating the experimental error.
5. Can accommodate unequal variances, using Welch's variance-weighted ANOVA.

The disadvantage

The experimental error includes all the variation in the system except for the component due exclusively to the treatments.

Power

The power of a test is the probability of detecting a nonzero treatment effect. To calculate the power of the F test in an ANOVA, use Pearson and Hartley's power function charts (1951, Biometrika 38:112-130). To begin, calculate $\phi$:

$$\phi = \sqrt{\frac{r \sum \tau_i^2}{t \cdot \text{MSE}}}$$

Things to notice:

1. More replication leads to higher $\phi$ (and higher power).
2. Less error in the model (MSE) leads to higher $\phi$.
3. Larger treatment effects ($\tau_i$, our "detection distance") lead to higher $\phi$.
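The formula for $\phi$ and the three points above, as a short sketch. The treatment effects (-1, 0, 1), r = 4 and MSE = 1 are hypothetical values chosen only to show the direction of each change.

```python
import math

def phi(taus, r, mse):
    """Pearson-Hartley phi = sqrt(r * sum(tau_i^2) / (t * MSE))."""
    t = len(taus)
    return math.sqrt(r * sum(tau ** 2 for tau in taus) / (t * mse))

effects = [-1.0, 0.0, 1.0]                            # hypothetical effects, sum to zero
base = phi(effects, r=4, mse=1.0)
more_reps = phi(effects, r=8, mse=1.0)                # 1. more replication
less_error = phi(effects, r=4, mse=0.5)               # 2. less error
bigger = phi([2 * e for e in effects], r=4, mse=1.0)  # 3. larger effects
print(f"base {base:.3f} | r=8 {more_reps:.3f} | MSE=0.5 {less_error:.3f} | 2x effects {bigger:.3f}")
```

Doubling the effects doubles $\phi$, while doubling r or halving MSE multiplies it only by $\sqrt{2}$.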

Example: Suppose an experiment has $t = 6$ treatments with $r = 2$ replications each. Given the MSE and the required $\alpha = 5\%$, you calculate $\phi = 1.75$:

$$\phi = \sqrt{\frac{r \sum \tau_i^2}{t \cdot \text{MSE}}}$$

To find the power associated with this $\phi$, use the chart for $\nu_1 = t - 1 = 5$ and the set of curves corresponding to $\alpha = 5\%$. Select the curve $\nu_2 = t(r - 1) = 6$. The height of this curve at the abscissa $\phi = 1.75$ is the power of the test. In this case, the power is slightly greater than 0.55.
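Reading a chart is approximate, so a simulation can serve as a rough cross-check. This sketch is not the chart method itself; it assumes MSE = σ² = 1, places the effects in two treatments ($\tau = \pm a$, with $a$ chosen so that $\phi = 1.75$; power depends on the effects only through $\phi$), and uses the tabulated critical value $F_{0.05,(5,6)} = 4.39$. The simulated power should come out in the neighbourhood of the chart reading, with Monte Carlo noise of about ±0.01.

```python
import math
import random

def simulated_power(taus, r, sigma, f_crit, reps=5000, seed=3):
    """Fraction of simulated CRD experiments whose F = MST/MSE exceeds f_crit."""
    rng = random.Random(seed)
    t = len(taus)
    rejections = 0
    for _ in range(reps):
        groups = [[tau + rng.gauss(0.0, sigma) for _ in range(r)] for tau in taus]
        means = [sum(g) / r for g in groups]
        grand = sum(means) / t
        mst = r * sum((m - grand) ** 2 for m in means) / (t - 1)
        mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (t * (r - 1))
        if mst / mse > f_crit:
            rejections += 1
    return rejections / reps

t, r, mse = 6, 2, 1.0
target_phi = 1.75
# tau = (a, -a, 0, 0, 0, 0) gives r * 2a^2 / (t * MSE) = phi^2
a = math.sqrt(target_phi ** 2 * t * mse / (2 * r))
taus = [a, -a, 0.0, 0.0, 0.0, 0.0]
power = simulated_power(taus, r, sigma=math.sqrt(mse), f_crit=4.39)
print(f"simulated power at phi = {target_phi}: {power:.2f}")
```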

Sample size

To calculate the number of replications for a given $\alpha$ and desired power:

1. Specify the constants.
2. Start with an arbitrary $r$ to compute $\phi$.
3. Use the appropriate chart to find the power.
4. Iterate the process until a minimum $r$ value is found which satisfies the required power for a given $\alpha$ level.

We can simplify the general power formula if we assume all $\tau_i$ are zero except the two extreme treatment effects (let's call them $\tau_K$ and $\tau_L$), so that $d = \mu_K - \mu_L$. Then $\tau_K = d/2$ and $\tau_L = -d/2$, so $\sum \tau_i^2 = d^2/2$ and:

$$\phi = \sqrt{\frac{d^2\, r}{2\, t \cdot \text{MSE}}}$$

Example: Suppose that 6 treatments will be involved in a study and the anticipated difference between the extreme means is 15 units. What is the required sample size so that this difference will be detected at $\alpha = 1\%$ and power = 90%, knowing that $\sigma^2 = 12$? (Note: $t = 6$, $\alpha = 0.01$, $\beta = 0.10$, $d = 15$, and MSE = 12.)

r | df = t(r - 1) | $\phi$ | $(1 - \beta)$ for $\alpha = 1\%$
2 | 6(2 - 1) = 6 | 1.77 | < 0.90
3 | 6(3 - 1) = 12 | 2.17 | < 0.90
4 | 6(4 - 1) = 18 | 2.50 | 0.93

Thus 4 replications are required for each treatment to satisfy the required conditions.
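The $\phi$ and df columns of the iteration can be reproduced with a short loop. This sketch covers only that arithmetic, using d = 15, t = 6 and MSE = 12 from the example; reading the power from the Pearson-Hartley chart remains a manual step. The function name `phi_extremes` is hypothetical.

```python
import math

def phi_extremes(d, r, t, mse):
    """Simplified phi when only the two extreme treatments differ, by d."""
    return math.sqrt(d ** 2 * r / (2 * t * mse))

t, d, mse = 6, 15.0, 12.0
rows = []
for r in range(2, 5):
    df_error = t * (r - 1)
    rows.append((r, df_error, round(phi_extremes(d, r, t, mse), 2)))
    print(*rows[-1])  # r, error df, phi
```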