Experimental design. Matti Hotokka Department of Physical Chemistry Åbo Akademi University


Contents
Elementary concepts
Regression
Validation
Design of Experiments
    Definitions
    Random sampling
    Factorial designs
    Response surface designs
    Robust parameter design

[1] Otto, Chemometrics, Wiley, 1999.
[2] Wu, Hamada, Experiments, Wiley, 2000.
[3] Snedecor & Cochran, Statistical Methods, Iowa State Univ. Press.
[4] Cochran, Experimental designs, Wiley, 1966.

Definitions: Analytical function. The signal y is the quantity to be modeled or optimized, e.g., yield, measuring time, figure of merit, or deviation from a model. [Figure: signal y as a function of factor x.]

Definitions: Factors. Factor, or feature: pH, concentration, temperature, ... A huge number of factors govern every measurement. The chemist must know which of them are important and must be tested. The others are kept as constant as possible.

Definitions: Replications. Every measurement is repeated from the start a number of times so that a mean, a standard error and a confidence limit can be determined. Observation: the mean of a set of parallel measurements. Blank: a reference observation with the default values of all the important factors, $y_B$.

Definitions: Replications vs. repetitions. Repetition: a repeated reading of the meter. Replication: a new measurement from the start. Repetitions test your ability to read a digital meter. Replications test the experimental errors in the measuring procedure.

Definitions: Calibration parameters. Sensitivity; detection limit; precision and trueness; specificity and selectivity.

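Definitions: Calibration curve. The signal y is plotted against the concentration x and modeled as $y = b_0 + b_1 x$. Sensitivity = slope, $b_1 = \Delta y / \Delta x$. The intercept $b_0$ can be ignored if the sample is measured against a blank (reference). [Figure: calibration line, signal y versus concentration x, with slope $b_1$ and intercept $b_0$.]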
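The calibration line can be estimated by ordinary least squares. The following is a minimal sketch; the concentration and signal values are made-up illustration data, and `fit_line` is a hypothetical helper name, not something taken from the lecture.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1*x; returns (b0, b1)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    b1 = sxy / sxx             # sensitivity = slope
    b0 = y_mean - b1 * x_mean  # intercept
    return b0, b1


# Hypothetical calibration standards: concentration and measured signal.
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
signal = [0.1, 2.1, 4.1, 6.1, 8.1]  # constructed to lie on y = 0.1 + 2.0*x

b0, b1 = fit_line(conc, signal)
print(f"intercept b0 = {b0:.3f}, sensitivity b1 = {b1:.3f}")
```

With real data the points scatter around the line, and the residuals should be inspected as well.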

Definitions: Analytical range. Dynamic range: the valid range of x where the signal y depends functionally on x. Analytical range: the interval of x where the signal y can be determined accurately.

Definitions. [Figure: signal y versus concentration x, with the detection limit (DL), the limit of determination (LoD), the dynamic range and the analytical range marked.]

Definitions: Detection limit. Detection limit: the lowest value of x where the signal can still be separated from noise. Noise is measured as the standard deviation $s_B$ of the blank signal $y_B$: $y_{DL} = y_B + 3 s_B$ and $x_{DL} = (y_{DL} - b_0) / b_1$.
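As a rough illustration of the detection-limit formulas, the sketch below computes $y_{DL}$ and $x_{DL}$ from replicate blank readings. The blank readings and the calibration parameters `b0` and `b1` are hypothetical values chosen for the example, and `detection_limit` is an assumed helper name.

```python
import statistics


def detection_limit(blank_signals, b0, b1):
    """Return (y_DL, x_DL) with y_DL = y_B + 3*s_B and x_DL = (y_DL - b0) / b1."""
    y_b = statistics.mean(blank_signals)   # mean blank signal y_B
    s_b = statistics.stdev(blank_signals)  # sample standard deviation s_B
    y_dl = y_b + 3 * s_b
    x_dl = (y_dl - b0) / b1
    return y_dl, x_dl


# Hypothetical replicate blank readings and calibration line y = 0.1 + 2.0*x.
blank = [0.08, 0.10, 0.12, 0.10]
y_dl, x_dl = detection_limit(blank, b0=0.1, b1=2.0)
print(f"y_DL = {y_dl:.4f}, x_DL = {x_dl:.4f}")
```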

Definitions: Limit of determination. Limit of determination: the lowest value of x (concentration) where y can be determined with a useful accuracy.

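Definitions: Bias. The error e in a variable x (say, concentration) is $e = x - x_{true}$. It splits into a random and a systematic part: $e = x - x_{true} = (x - \bar{x}) + (\bar{x} - x_{true})$, where $x - \bar{x}$ is the random error and $\bar{x} - x_{true}$ is the bias (systematic error). [Figure: x axis with $x_{true}$, the mean $\bar{x}$ and a single result x marked.]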

Definitions: Precision and trueness. Precision = repeatability, $s = \sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)}$. Trueness = deviation from the true value, expressed as the recovery rate $RR(\%) = 100\, \bar{x} / x_{true}$.
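The quantities defined above can be computed from a set of replicate results. This is a minimal sketch with invented measurements and an invented true value; it assumes the recovery rate is taken from the mean of the replicates, and `error_summary` is a hypothetical helper name.

```python
import statistics


def error_summary(measurements, x_true):
    """Split the errors into random and systematic parts and report precision."""
    x_mean = statistics.mean(measurements)
    return {
        "s": statistics.stdev(measurements),             # precision (repeatability)
        "bias": x_mean - x_true,                         # systematic error
        "random": [x - x_mean for x in measurements],    # random error of each result
        "recovery": 100 * x_mean / x_true,               # recovery rate RR(%)
    }


# Hypothetical replicate determinations of a sample with known true value 10.0.
results = [9.8, 10.0, 10.2, 10.4]
summary = error_summary(results, x_true=10.0)
print(summary)
```

For every single result, the total error is the sum of its random error and the common bias.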

Definitions: Selectivity. Selectivity: the possibility to measure in the presence of interfering components. Specificity: sensitivity for a given analyte. Analytical resolution: $N = x / \Delta x$.

Random sampling: Why randomization. All experiments must be made in random order. [Figure: response versus concentration in two panels. Systematic measuring order (5, 4, 3, 2, 1), with the true slope and the drift marked. Random measuring order (2, 4, 5, 3, 1), with the true slope marked.]

Random sampling: Random number lists. The random sequences are traditionally obtained from tables of random numbers. Nowadays the random number generator of a pocket calculator may be used. Normally you get the same sequence every time, which is acceptable. If you want a different sequence every time, use a random seed.

Random sampling: Always randomize. Assume that four different concentrations are to be tested. Name them A, B, C, D. Make four parallel measurements for each: A1, A2, A3, A4, etc.

Random sampling: Always randomize. First run: measure A1, B1, C1, D1. Second run: start from scratch and measure A2, B2, etc.

    Run 1: A1, B1, C1, D1
    Run 2: A2, B2, C2, D2
    Run 3: A3, B3, C3, D3
    Run 4: A4, B4, C4, D4

Wrong! Systematic errors will not be found.

Random sampling: Always randomize. Randomize the order of the concentrations in the runs.

    Run 1: A1, B1, C1, D1
    Run 2: C2, D2, A2, B2
    Run 3: D3, A3, B3, C3
    Run 4: B4, C4, D4, A4
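A randomized measuring plan of this kind can be generated with a seeded random number generator. The sketch below is only an illustration: `randomized_runs` is a hypothetical helper, and it shuffles every run independently, so the orders it produces will differ from the ones listed on the slide.

```python
import random


def randomized_runs(levels, n_runs, seed=None):
    """Return one independently shuffled measuring order per run."""
    rng = random.Random(seed)  # a fixed seed reproduces the same plan
    plan = []
    for run in range(1, n_runs + 1):
        order = list(levels)
        rng.shuffle(order)
        plan.append([f"{level}{run}" for level in order])
    return plan


# Four concentrations A-D, four replicate runs; the seed value is arbitrary.
plan = randomized_runs("ABCD", 4, seed=1)
for row in plan:
    print(", ".join(row))
```

Passing `seed=None` lets Python seed the generator from the operating system, which gives a different plan on every call.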

Random sampling: Always randomize. Use linear (or non-linear) regression to analyse the results. Specifically, plot the residuals to see whether some effects were not captured.

Random sampling: Analysis. [Figure: signal y versus concentration (A, B, C, D); the individual points are labelled 1 to 4.]

Random sampling: Residuals. [Figure: residuals versus concentration (A, B, C, D), points labelled 1 to 4, for the randomized plan A1, B1, C1, D1 / C2, D2, A2, B2 / D3, A3, B3, C3 / B4, C4, D4, A4. The residuals reveal the drift.]

Random sampling: Types of factors.
Controlled factors: varied systematically or kept constant.
Known factors that cannot be controlled, e.g., drift of the instrument.
Unknown factors that can be anticipated, e.g., impurities of the chemicals.
Truly unknown effects.

Random sampling: Blocking. Some factors that should be constant cannot be kept fixed but vary from batch to batch, day to day, ... Make a series of measurements varying one factor and keeping the other conditions as constant as possible: A1, B1, C1, D1. This is a block. Then measure A2, B2, C2, D2, again keeping the conditions constant within the block, though not necessarily the same as in block 1 if that is not possible.

Random sampling: Latin square designs. Randomize the blocking experiment.

           Sample
    Run    1   2   3   4
    1      A   B   D   C
    2      D   C   A   B
    3      B   D   C   A
    4      C   A   B   D

Observe the good balance.
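The balance of a Latin square means that every concentration occurs exactly once in each run and once in each position. A small check of that property might look as follows; `is_latin_square` is a hypothetical helper written for this illustration.

```python
def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    if not all(len(row) == n and set(row) == symbols for row in square):
        return False
    return all({row[j] for row in square} == symbols for j in range(n))


# Rows are runs 1-4, columns are positions 1-4 (the design from the slide).
design = ["ABDC", "DCAB", "BDCA", "CABD"]
systematic = ["ABCD", "ABCD", "ABCD", "ABCD"]

print(is_latin_square(design))      # balanced design
print(is_latin_square(systematic))  # same order in every run
```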

Factorial designs: What? Typically two-level experiments: a low level and a high level for each factor. Typically used for screening: study which of the presumed factors really show a significant effect.

Factorial designs: Two levels. Each factor is tested at a low and a high level. Designate the levels symbolically -1 and +1. Example: the rate of p-phenylenediamine (PPD) oxidation at a constant enzyme level of 13.6 mg/L is studied using spectrophotometry.

    Factor        Level -1    Level +1
    T, °C         35          40
    pH            4.8         6.4
    [PPD], mM     0.5         27.3

Factorial designs: Experiment plan.

    Run   T   PPD   pH    y1      y2      y3      y4      mean y   s
    4     +   -     +     8.16    7.93    8.27    8.12    8.12     0.14
    7     -   +     +     14.12   13.88   14.26   14.08   14.09    0.16
    1     +   -     -     6.60    6.74    6.81    6.52    6.67     0.13
    3     +   +     +     14.71   14.56   14.95   14.88   14.78    0.18
    6     -   +     -     11.24   11.14   11.01   11.04   11.11    0.10
    5     -   -     -     6.31    6.45    6.42    6.22    6.35     0.10
    8     -   -     +     7.80    7.40    7.62    7.71    7.63     0.17
    2     +   +     -     11.56   11.86   11.80   11.66   11.72    0.14

The runs are not carried out in the order 1 to 8 but in a randomized order, as listed here.

Factorial designs: $2^k$ design. Coded factor levels:

           Main effects       Interaction effects
    Run    T    PPD   pH      TxPPD   TxpH   PPDxpH    y
    1      +1   -1    -1      -1      -1     +1        6.67
    2      +1   +1    -1      +1      -1     -1        11.72
    3      +1   +1    +1      +1      +1     +1        14.78
    4      +1   -1    +1      -1      +1     -1        8.12
    5      -1   -1    -1      +1      +1     +1        6.35
    6      -1   +1    -1      -1      +1     -1        11.11
    7      -1   +1    +1      -1      -1     +1        14.09
    8      -1   -1    +1      +1      -1     -1        7.63

Experimental accuracy? Four parallel determinations => s = 0.18, d.f. = 3. Compute the differences high level - low level, e.g. $D_T = (y_1 + y_2 + y_3 + y_4)/4 - (y_5 + y_6 + y_7 + y_8)/4$.

Factorial designs: $2^k$ design. The effects are $D_T = 0.53$, $D_{PPD} = 5.73$, $D_{pH} = 2.19$, $D_{T \times PPD} = 0.123$, $D_{T \times pH} = 0.062$ and $D_{PPD \times pH} = 0.828$. Statistically significant effects at 95 % confidence: $|D| > t \cdot s$. With s = 0.18 and 3 degrees of freedom, Student's t = 3.18, so the criterion is $|D| > 3.18 \times 0.18 = 0.57$. $D_{PPD}$, $D_{pH}$ and $D_{PPD \times pH}$ are significant.
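The effects can be recomputed from the coded design and the mean responses. The sketch below uses the eight mean responses from the slides and the same simple criterion $|D| > t \cdot s$; the function and variable names are hypothetical.

```python
# Coded levels (T, PPD, pH) and mean response y for runs 1-8, from the slides.
runs = [
    (+1, -1, -1, 6.67),
    (+1, +1, -1, 11.72),
    (+1, +1, +1, 14.78),
    (+1, -1, +1, 8.12),
    (-1, -1, -1, 6.35),
    (-1, +1, -1, 11.11),
    (-1, +1, +1, 14.09),
    (-1, -1, +1, 7.63),
]


def effect(signs, responses):
    """Mean response at the high level minus mean response at the low level."""
    high = [y for s, y in zip(signs, responses) if s > 0]
    low = [y for s, y in zip(signs, responses) if s < 0]
    return sum(high) / len(high) - sum(low) / len(low)


T = [r[0] for r in runs]
PPD = [r[1] for r in runs]
pH = [r[2] for r in runs]
y = [r[3] for r in runs]

columns = {
    "T": T,
    "PPD": PPD,
    "pH": pH,
    "TxPPD": [a * b for a, b in zip(T, PPD)],  # interaction = product of signs
    "TxpH": [a * b for a, b in zip(T, pH)],
    "PPDxpH": [a * b for a, b in zip(PPD, pH)],
}
effects = {name: effect(col, y) for name, col in columns.items()}

threshold = 3.18 * 0.18  # Student's t (3 d.f., 95 %) times s, as on the slide
significant = [name for name, d in effects.items() if abs(d) > threshold]

for name, d in effects.items():
    print(f"D_{name} = {d:.3f}")
print("significant:", significant)
```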

Factorial designs: Another analysis method. Consider the normal distribution. Most statistical quantities are normally distributed. [Figure: the normal distribution P(x), its integrated form y(x) rising from 0 to 1, and the inverse relation x(y).]

Factorial designs: Half-normal quantiles. Given the integrated normal distribution $y = \Phi(x)$ in a y vs. x plot, find the x that gives your y value: $x_i = \Phi^{-1}\left(0.5 + 0.5\,\frac{i - 0.5}{N}\right)$. In the example there are six effects, N = 6. What is x if y = 0.5 + 0.5*(1 - 0.5)/6 = 0.542? Answer: 0.106. This is obtained from tables of the normal distribution.

Factorial designs: Half-normal quantiles. Six effects: N = 6, numbered i = 1, 2, 3, ..., 6 in ascending order. Therefore y = 0.5 + 0.5*(i - 0.5)/6:

    i    y        x
    1    0.542    0.106
    2    0.625    0.320
    3    0.708    0.550
    4    0.792    0.813
    5    0.875    1.150
    6    0.958    1.730
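Instead of reading the quantiles from tables, they can be computed with the inverse normal distribution function. A minimal sketch using Python's standard library follows; the effect magnitudes are the ones from the example, and the helper name is hypothetical. The computed values differ from the tabulated ones in the third decimal because the slide values come from rounded table look-ups.

```python
from statistics import NormalDist


def half_normal_quantiles(n_effects):
    """x_i = inverse normal CDF of 0.5 + 0.5*(i - 0.5)/N for i = 1..N."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        std_normal.inv_cdf(0.5 + 0.5 * (i - 0.5) / n_effects)
        for i in range(1, n_effects + 1)
    ]


# Absolute effects from the 2^k example.
abs_effects = {
    "T": 0.53, "PPD": 5.73, "pH": 2.19,
    "TxPPD": 0.123, "TxpH": 0.062, "PPDxpH": 0.828,
}

ordered = sorted(abs_effects.items(), key=lambda item: item[1])  # ascending |D|
quantiles = half_normal_quantiles(len(ordered))

for x, (name, d) in zip(quantiles, ordered):
    print(f"{x:6.3f}  {d:6.3f}  {name}")
```

Plotting |D| against x then gives the half-normal plot: effects that fall far above the line through the small effects are the candidates for significance.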

Factorial designs: Half-normal quantiles. Sort the D values in ascending order and pair them with the quantiles x:

    x        D        Effect
    0.106    0.062    TxpH
    0.320    0.123    TxPPD
    0.550    0.53     T
    0.813    0.828    PPDxpH
    1.150    2.19     pH
    1.730    5.73     PPD

[Figure: half-normal plot of D against x with the points labelled TxpH, TxPPD, T, PPDxpH, pH and PPD.]

Factorial designs: Orthogonal arrays. The $2^k$ design gives 64 combinations for k = 6 (and 256 for k = 8). Too many degrees of freedom! Choose half of the combinations, $2^{k-1}$. However, you cannot choose just any set of combinations. The arrays must be orthogonal.

Factorial designs: Orthogonal arrays. There are many ways of choosing orthogonal arrays. Plackett and Burman, Hall, and Taguchi have published large selections based on Hadamard matrices.

Taguchi table L4 ($2^3$). Full set of experiments:

    1 1 1
    1 1 2
    1 2 1
    2 1 1
    1 2 2
    2 1 2
    2 2 1
    2 2 2

Eight experiments.

Taguchi table L4 ($2^3$). Taguchi design:

    1 1 1
    1 2 2
    2 1 2
    2 2 1

Four experiments.
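Orthogonality here means that, for every pair of columns, each combination of levels occurs equally often. The sketch below checks this for the L4 array and, for contrast, for an arbitrary half of the full set; `is_orthogonal` is a hypothetical helper written for this illustration.

```python
from collections import Counter
from itertools import combinations, product


def is_orthogonal(array, levels=(1, 2)):
    """True if every pair of columns contains each level pair equally often."""
    n_cols = len(array[0])
    expected = len(array) / len(levels) ** 2
    for a, b in combinations(range(n_cols), 2):
        counts = Counter((row[a], row[b]) for row in array)
        if any(counts[pair] != expected for pair in product(levels, repeat=2)):
            return False
    return True


L4 = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]
# First four rows of the full set: a half that is not orthogonal.
arbitrary_half = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (2, 1, 1)]

print(is_orthogonal(L4))
print(is_orthogonal(arbitrary_half))
```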

Factorial designs: Orthogonal arrays. What you lose when using orthogonal arrays is (some of) the interaction effects.

Factorial designs: More reduction. Designs of size $2^{k-p}$, p > 1, have also been proposed.

Factorial designs: Three-level designs. [Figure: design points in the ($x_1$, $x_2$) plane at the levels -1, 0 and +1 of each factor.]

Factorial designs: Three-level designs. [Figure: response plotted against one factor at the levels -1, 0 and +1.]

Factorial designs: Central composite design.

    Run           x1    x2    x3    Response
    1             -1    -1    -1    y1
    2             +1    -1    -1    y2
    3             +1    +1    -1    y3
    4             -1    +1    -1    y4
    5             -1    -1    +1    y5
    6             +1    -1    +1    y6
    7             +1    +1    +1    y7
    8             -1    +1    +1    y8
    9             -a    0     0     y9
    10            +a    0     0     y10
    11            0     -a    0     y11
    12            0     +a    0     y12
    13            0     0     -a    y13
    14            0     0     +a    y14
    15, 16, 17    0     0     0     y15, y16, y17
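A central composite design consists of the two-level factorial points, the axial points at distance a, and replicated center points. The sketch below generates such a design; `central_composite` is a hypothetical helper, the row order differs from the table above, and a = 1.682 is an assumed value (the rotatable choice $(2^3)^{1/4}$ for three factors), since the slide does not specify a.

```python
from itertools import product


def central_composite(k, a, n_center):
    """Factorial points, then axial points at distance a, then center points."""
    factorial = [list(point) for point in product((-1, +1), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1, +1):
            point = [0.0] * k
            point[i] = sign * a  # only factor i is moved away from the center
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center


# Three factors, three center replicates; a = 1.682 is an assumption.
design = central_composite(k=3, a=1.682, n_center=3)
for number, point in enumerate(design, start=1):
    print(number, point)
```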

Factorial designs: Box-Behnken design.

Factorial designs: Lattice design.

Factorial designs: Analysis. Use multivariate regression.

Response surfaces: Optimization tasks.
Biggest is best: find a set of factor values that gives the maximal response (e.g., yield).
Smallest is best: find the minimum.
Nominal is best: minimize the difference (measured - nominal).

Response surfaces: The response. [Figure: response surface over the factors pH and PPD.]

Response surfaces: Optimization techniques. Any optimization strategy can be used. Varying a single factor at a time (the engineering method) may miss the optimum. The fixed-size simplex algorithm may work better.

Response surfaces: Engineering method. Measure at the indicated points. [Figure: measurement points in the (PPD, pH) plane and the maximum (Max) of the response.]

Response surfaces: Simplex method. Code the factor values to the range (0, 1). Generate the initial simplex. Measure at the indicated points. If there are N factors (here N = 2), the simplex has N + 1 points. Here the points are (0, 0), (1, 0) and (0.5, 0.87). [Figure: initial simplex in the coded (PPD, pH) plane, both axes from 0 to 1, on the unknown response surface.]

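Response surfaces: Simplex method. Remove the worst point w. Calculate the centroid p of the remaining points $v_j$: $p = \frac{1}{N} \sum_{j=1,\, j \neq w}^{N+1} v_j$. [Figure: simplex in the coded (PPD, pH) plane with the worst point w and the centroid p marked.]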

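Response surfaces: Simplex method. Generate a new point r by reflecting the worst point through the centroid: $r = p + (p - w)$. Measure at the indicated point. [Figure: simplex in the coded (PPD, pH) plane with the worst point w, the centroid p and the reflected point r marked.]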
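One step of the fixed-size simplex method can be sketched as follows, assuming the response is to be maximized so that the worst point is the one with the lowest response. The response values are invented for the example and `reflect_worst` is a hypothetical helper name.

```python
def reflect_worst(vertices, responses):
    """Return (index of worst vertex, centroid p, reflected point r = p + (p - w))."""
    w_index = min(range(len(vertices)), key=lambda j: responses[j])
    w = vertices[w_index]
    rest = [v for j, v in enumerate(vertices) if j != w_index]
    n = len(rest)  # N remaining points for N factors
    p = [sum(v[d] for v in rest) / n for d in range(len(w))]
    r = [2 * p_d - w_d for p_d, w_d in zip(p, w)]
    return w_index, p, r


# Initial simplex from the slide, in coded (PPD, pH) coordinates.
vertices = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.87)]
responses = [1.0, 2.0, 3.0]  # hypothetical measured responses

w_index, p, r = reflect_worst(vertices, responses)
print("worst:", vertices[w_index], "centroid:", p, "new point:", r)
```

The new simplex is formed by replacing the worst vertex with r, after which the response at r is measured and the step is repeated.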

Response surfaces: Simplex method. Measure at the indicated points. [Figure: simplex in the coded (PPD, pH) plane, both axes from 0 to 1, with the worst point w marked.]

Robust parameters: Factor categories.
Control factors: can be kept fixed once chosen.
Noise factors: cannot be controlled; they create the variations in the quality of the product.

Robust parameters: Typical procedure. Usually, the quality of the product is improved by reducing the noise. Unfortunately, the noise factors are difficult (= expensive) to reduce.

Robust parameters: $2^k$ design, a reminder. Coded factor levels:

           Main effects       Interaction effects
    Run    T    PPD   pH      TxPPD   TxpH   PPDxpH    y
    1      +1   -1    -1      -1      -1     +1        6.67
    2      +1   +1    -1      +1      -1     -1        11.72
    3      +1   +1    +1      +1      +1     +1        14.78
    4      +1   -1    +1      -1      +1     -1        8.12
    5      -1   -1    -1      +1      +1     +1        6.35
    6      -1   +1    -1      -1      +1     -1        11.11
    7      -1   +1    +1      -1      -1     +1        14.09
    8      -1   -1    +1      +1      -1     -1        7.63

Robust parameters: Control of noise. Main effects: all control and noise factors. Interaction effects between control factors and noise factors may be quite large. If this is the case, the variation in product quality can be reduced by adjusting the control factors so that the effect of the noise is reduced.

Robust parameters: An example. Consider the dependence of y (signal level) on x (voltage over the detector = control factor). [Figure: y versus x, with the width of the noise indicated at two settings, A and B, of x.]