EE290H F05. Spanos. Lecture 5: Comparison of Treatments and ANOVA

Similar documents
Table 1: Fish Biomass data set on 26 streams

Assignment 9 Answer Keys

Answer Keys to Homework#10

Regression Analysis. Simple Regression Multivariate Regression Stepwise Regression Replication and Prediction Error EE290H F05

Analysis of Variance. Source DF Squares Square F Value Pr > F. Model <.0001 Error Corrected Total

SMA 6304 / MIT / MIT Manufacturing Systems. Lecture 10: Data and Regression Analysis. Lecturer: Prof. Duane S. Boning

1 Introduction to One-way ANOVA

The goodness-of-fit test Having discussed how to make comparisons between two proportions, we now consider comparisons of multiple proportions.

UNIVERSITY OF CALIFORNIA College of Engineering Department of Electrical Engineering and Computer Sciences. PROBLEM SET No. 5 Official Solutions

INTRODUCTION TO DESIGN AND ANALYSIS OF EXPERIMENTS

Variance Decomposition and Goodness of Fit

Lecture 3: Inference in SLR

Analysis of Variance: Part 1

Inference for Regression Inference about the Regression Model and Using the Regression Line, with Details. Section 10.1, 2, 3

Chap The McGraw-Hill Companies, Inc. All rights reserved.

df=degrees of freedom = n - 1

Contents. Acknowledgments. xix

Correlation and the Analysis of Variance Approach to Simple Linear Regression

Design & Analysis of Experiments 7E 2009 Montgomery

Recall that a measure of fit is the sum of squared residuals: where. The F-test statistic may be written as:

Data analysis and Geostatistics - lecture VII

Simple linear regression

Steps to take to do the descriptive part of regression analysis:

DESAIN EKSPERIMEN Analysis of Variances (ANOVA) Semester Genap 2017/2018 Jurusan Teknik Industri Universitas Brawijaya

Analysis Of Variance Compiled by T.O. Antwi-Asare, U.G

Business Statistics. Lecture 10: Course Review

Chapter 8 Handout: Interval Estimates and Hypothesis Testing

20.0 Experimental Design

Paper: ST-161. Techniques for Evidence-Based Decision Making Using SAS Ian Stockwell, The Hilltop UMBC, Baltimore, MD

Regression: Main Ideas Setting: Quantitative outcome with a quantitative explanatory variable. Example, cont.

Comparing Several Means: ANOVA

General Linear Model (Chapter 4)

Independent Samples ANOVA

36-309/749 Experimental Design for Behavioral and Social Sciences. Sep. 22, 2015 Lecture 4: Linear Regression

Remedial Measures, Brown-Forsythe test, F test

COSC 341 Human Computer Interaction. Dr. Bowen Hui University of British Columbia Okanagan

Design of Experiments. Factorial experiments require a lot of resources

2.830 Homework #6. April 2, 2009

Chapter 7: Statistical Inference (Two Samples)

Practice Problems Section Problems

Variance Decomposition in Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Oct Simple linear regression. Minimum mean square error prediction. Univariate. regression. Calculating intercept and slope

Inferences for Regression

Workshop Research Methods and Statistical Analysis

EXST Regression Techniques Page 1. We can also test the hypothesis H :" œ 0 versus H :"

Analysis of Covariance

MEMORIAL UNIVERSITY OF NEWFOUNDLAND DEPARTMENT OF MATHEMATICS AND STATISTICS FINAL EXAM - STATISTICS FALL 1999

STATISTICS 479 Exam II (100 points)

Introduction to Business Statistics QM 220 Chapter 12

Lecture 14. Analysis of Variance * Correlation and Regression. The McGraw-Hill Companies, Inc., 2000

Lecture 14. Outline. Outline. Analysis of Variance * Correlation and Regression Analysis of Variance (ANOVA)

Econometrics. 4) Statistical inference

Statistics 512: Solution to Homework#11. Problems 1-3 refer to the soybean sausage dataset of Problem 20.8 (ch21pr08.dat).

Visual interpretation with normal approximation

Annoucements. MT2 - Review. one variable. two variables

Design of Engineering Experiments Part 2 Basic Statistical Concepts Simple comparative experiments

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

Plasma Etch Tool Gap Distance DOE Final Report

INTERVAL ESTIMATION AND HYPOTHESES TESTING

1 Independent Practice: Hypothesis tests for one parameter:

STA 101 Final Review

Battery Life. Factory

Hypothesis Testing. ECE 3530 Spring Antonio Paiva

Exercise I.1 I.2 I.3 I.4 II.1 II.2 III.1 III.2 III.3 IV.1 Question (1) (2) (3) (4) (5) (6) (7) (8) (9) (10) Answer

22s:152 Applied Linear Regression. Chapter 8: 1-Way Analysis of Variance (ANOVA) 2-Way Analysis of Variance (ANOVA)

One-Way Analysis of Variance. With regression, we related two quantitative, typically continuous variables.

AMS7: WEEK 7. CLASS 1. More on Hypothesis Testing Monday May 11th, 2015

10/4/2013. Hypothesis Testing & z-test. Hypothesis Testing. Hypothesis Testing

Statistical Inference. Part IV. Statistical Inference

STAT 350: Summer Semester Midterm 1: Solutions

3. Linear Regression With a Single Regressor

SAS Procedures Inference about the Line ffl model statement in proc reg has many options ffl To construct confidence intervals use alpha=, clm, cli, c

Math 423/533: The Main Theoretical Topics

Vladimir Spokoiny Design of Experiments in Science and Industry

Lecture 1 Linear Regression with One Predictor Variable.p2

-However, this definition can be expanded to include: biology (biometrics), environmental science (environmetrics), economics (econometrics).

22s:152 Applied Linear Regression. 1-way ANOVA visual:

Mathematical Notation Math Introduction to Applied Statistics

Orthogonal contrasts for a 2x2 factorial design Example p130

Sleep data, two drugs Ch13.xls

What If There Are More Than. Two Factor Levels?

Multiple Regression: Example

Keller: Stats for Mgmt & Econ, 7th Ed July 17, 2006

IES 612/STA 4-573/STA Winter 2008 Week 1--IES 612-STA STA doc

2.830J / 6.780J / ESD.63J Control of Manufacturing Processes (SMA 6303) Spring 2008

ST505/S697R: Fall Homework 2 Solution.

ANOVA CIVL 7012/8012

Week 7.1--IES 612-STA STA doc

Topic 23: Diagnostics and Remedies

Background to Statistics

Introduction to Regression

Institutionen för matematik och matematisk statistik Umeå universitet November 7, Inlämningsuppgift 3. Mariam Shirdel

Diagnostics and Remedial Measures

Chapter 16. Simple Linear Regression and Correlation

Lecture 14: ANOVA and the F-test

Lectures 5 & 6: Hypothesis Testing

Basic Statistics. Resources. Statistical Tables Murdoch & Barnes. Scientific Calculator. Minitab 17.

Using SPSS for One Way Analysis of Variance

Linear Combinations of Group Means

Analysis of Variance

Transcription:

1

Design of Experiments in Semiconductor Manufacturing Comparison of Treatments which recipe works the best? Simple Factorial Experiments to explore impact of few variables Fractional Factorial Experiments to explore impact of many variables Regression Analysis to create analytical expressions that model process behavior Response Surface Methods to visualize process performance over a range of input parameter values

Design of Experiments Objectives: Compare Processing Recipes Find the Parameters that Matter Create Models to Predict Process Results Problems: Experimental / Measurement Error Confusion of Correlation with Causation Complexity of the Effects we study 3

Problems Solved Compare Recipes Choose the recipe that gives the best results Organize experiments to facilitate the analysis Use experimental results to build process models Use models to optimize the process 4

Comparison of Treatments Internal and External References The Importance of Independence Blocking and Randomization Analysis of Variance 5

Using an External Reference to make a Decision An external reference can be used to decide whether a new observation is different than a group of old observations. Example: Create a comparison procedure for lot yield monitoring. Do it without "statistics". Use external reference data (historical data from the same process, but not from the same experiment): 6

Example: Using an External Reference To compare the difference between the average of successive groups of ten lots, I build the histogram from the reference data: Points generated by the good process that fail the test (type I error) Each new point can then be judged on the basis of the reference data. The only assumption here is that the reference data is relevant to my test! 7

Using an Internal Reference... We could generate an "internal" reference distribution from the very data we are comparing. Sampling must be random, so that the data is independently distributed. Independence would allow us to use statistics such as the arithmetic average or the sum of squares. Internal references are based on Randomization. 8

Randomization Example Is recipe A different than recipe B? 660 A A B Recipe Type Etch Rate 650 640 X A X B B 60 630 640 650 Etch Rate 660 630 0 4 6 8 Sample 10 1 9

Randomization Example - cont. There are many ways to decide this... 1. External reference distribution (based on old data.). Assumed, approximate external reference distr. (such as student-t, normal, etc). 3. Internal reference distribution. 4. "Distribution free" tests. Options, 3 and 4 depend on the assumption that the samples are independently distributed. 10

Randomization Example - cont. If there was no difference between A and B, then let me assume that I just have one out of the 10!/5!5! (5) possible arrangements of labels A and B. I use the data to calculate the differences in means for all the combinations: 11

The Origin of the student-t Distribution The student-t distribution was, in fact, defined to approximate such randomized distributions, when the parent distribution is normal! t 0 = (y B - y A ) - (μ A - μ B ) s 1 na + 1 n B For the etch example, t 0 = 0.44 and Pr (t > t 0 ) = 0.34 Randomized Distribution = 0.33 1

Example in Blocking Compare recipes A and B on five machines. If there are inherent differences from one machine to the other, what scheme would you use? Random A A A B A B A B B B Blocked A B B A B A A B B A 13

Example in Blocking - cont. With the blocked scheme, we could calculate the A-B difference for each machine. The machine-to-machine average of these differences could be randomized. d = ±d 1±d ±d 3 ±d 4 ±d 5 5 d - δ s d / n ~ t n-1 In In general, randomize what you don't know and block what you do do know. 14

Analysis of Variance Recipe Type D C B X A X A X A A X A 610 60 630 640 Etch Rate 650 660 Your Question: Are the four treatments the same or or not? The Statistician's Question: Are the discrepancies between the groups greater than the variation within each group? 15

Calculations for our Example (y t - y) i=1 i= i=3 i=4 i=5 Avg s t ν t 1: 650 648 63 645 641 643.0 0.80 4 5.00 : 645 650 638 643 640 643.0 86.80 4 5.00 3: 63 68 630 60 618 63.80 104.80 4 07.36 4: 645 640 648 64 638 64.60 63.0 4 19.36 s R = s T = s T s R = 16

Variation Within Treatment Groups First, lets assume that all groups have the same spread. Lets also assume that each group is normally distributed. The following is used to estimate their common σ: n S t = Σ t (y tj - y t ) j=1 s R = ν 1s 1 +ν s +...+ ν k s k ν 1 + ν +...+ ν k s t = S t n t -1 = S R N - k = S R ν R This is an estimate of the unknown, within group σ It is called the within treatment mean square 17

Variation Between Treatment Groups Let us now form Ho by assuming that all the groups have the same mean. Assuming that there are no real differences between groups, a second estimate of s T would be: s T = k Σ t=1 n t (y t - y) k - 1 = S T νt This is the between treatment mean square If If all all the treatments are the same, then the within and between treatment mean squares are estimating the same number! 18

What if the Treatments are different? If the treatments are different then: s T estimates σ + Σ n t τ t k t = 1 where τt μt - μ / (k - 1) In In other words, the between treatment mean square is is inflated by by a factor proportional to to the spread among the treatments! 19

Final Test for Treatment Significance Therefore, the hypothesis of equivalence is rejected if: s T s R is significantly greater than 1.0 This can be formalized since: s T s R ~ F k-1, N-k 0

More Sums of Squares A measure of the overall variation: k n t S D = Σ Σ (y tj -y) t=1 j=1 s D = S D N - 1 = S D ν D Obviously (actually, this is not so obvious, but it can be proven): S D = S T + S R and ν D = ν T + ν R 1

ANOVA Table Source Sum of DFs of Var squares Mean square between ST vt (k-1) st within SR vr (N-k) sr total SD vd (N-1) sd

ANOVA Table (full) Source Sum DFs of Var of sq Mean sq average SA va ( 1 ) sa between ST vt (k-1) st within SR vr (N-k) sr total S v ( N ) 3

Anova for our example... Data File: CompEtch Source Sum of Squares Deg. of Freedom Mean Squares F-Ratio Prob>F Between Recipe 1.38e+3 3 4.61e+ 1.61e+1 4.9e-5 Error 4.58e+ 16.86e+1 Total 1.84e+3 19 4

Decomposition of Observations Y = A + T + R In Vector Form: yti y yt - y yti -yt... =. +. +....... N 1 k-1 N-k The term degrees of freedom refers to the dimensionality of the space each vector is free to move into. 5

Geometric Interpretation of ANOVA Y = A + D Easy to prove that A D. D = R +T Easy to prove that R T and A R. Y Y D R A T 6

ANOVA Model and Diagnostics y ti = μ t + e ti e ti ~ N (0, σ ) So, the "sufficient statistics" are: as estimators of: s R, y 1, y,..., y k σ, μ 1, μ,..., μ k For our example: y t This model describes the data sufficiently. Its values are the sufficient statistics of the dataset. A: 643.0 B: 643.0 C: 63.80 D: 64.60 s R 8.6 According to to this model, the residuals are IIND. How does one verify that? 7

ANOVA Example: Poly Deposition Thickness in nm 00 190 180 170 160 150 140 130 10 110 100 90 80 70 60 50 Analysis of Variance E B C D A F Recipe Are these recipes significantly different? Source Model Error C Total DF 5 7 3 Sum of Squares 6970 58419 85388 Mean Square 5394 57 F Ratio 0.96 Prob > F 0.0000 8

Residual thick. Residual Plots: -40-30 -0-10 0 10 0 30 40 Residual thick. -40-30 -0-10 0 10 0 30 40 Residual thick. -40-0 -10 0 10 0 30 40 Residual thick. -40-30 -0-10 0 10 0 30 40 9

Residual Plots (cont): R e s i d u a l t h i c ḳ 90 80 70 60 50 40 30 0 10 0-10 -0-30 -40 E B C D E F Deposition Recipe V1 Wafer Vendor V 30

ANOVA Summary Plot Originals Construct ANOVA table Are the treatment effects significant? Plot residuals versus: treatment group mean time sequence other? ANOVA is the basic tool behind most empirical modeling techniques. 31