Robust Default Correlation for Cost Risk Analysis

Similar documents
Chapter If n is odd, the median is the exact middle number If n is even, the median is the average of the two middle numbers

Stat 421-SP2012 Interval Estimation Section

Output Analysis (2, Chapters 10 &11 Law)

MATH 320: Probability and Statistics 9. Estimation and Testing of Parameters. Readings: Pruim, Chapter 4

The standard deviation of the mean

Discrete Mathematics for CS Spring 2008 David Wagner Note 22

Random Variables, Sampling and Estimation

1 Inferential Methods for Correlation and Regression Analysis

11 Correlation and Regression

Statistics 511 Additional Materials

Sample Size Determination (Two or More Samples)

FACULTY OF MATHEMATICAL STUDIES MATHEMATICS FOR PART I ENGINEERING. Lectures

Topic 9: Sampling Distributions of Estimators

Regression, Part I. A) Correlation describes the relationship between two variables, where neither is independent or a predictor.

Econ 325 Notes on Point Estimator and Confidence Interval 1 By Hiro Kasahara

CS434a/541a: Pattern Recognition Prof. Olga Veksler. Lecture 5

Simulation. Two Rule For Inverting A Distribution Function

Statistical Analysis on Uncertainty for Autocorrelated Measurements and its Applications to Key Comparisons

7-1. Chapter 4. Part I. Sampling Distributions and Confidence Intervals

Topic 9: Sampling Distributions of Estimators

ECE 901 Lecture 12: Complexity Regularization and the Squared Loss

Topic 9: Sampling Distributions of Estimators

An Introduction to Randomized Algorithms

Statisticians use the word population to refer the total number of (potential) observations under consideration

Machine Learning Brett Bernstein

ENGI 4421 Confidence Intervals (Two Samples) Page 12-01


Linear Regression Models

ANALYSIS OF EXPERIMENTAL ERRORS

Tests of Hypotheses Based on a Single Sample (Devore Chapter Eight)

Topic 10: Introduction to Estimation

Understanding Samples

Common Large/Small Sample Tests 1/55

There is no straightforward approach for choosing the warmup period l.

Hypothesis Testing. Evaluation of Performance of Learned h. Issues. Trade-off Between Bias and Variance

A proposed discrete distribution for the statistical modeling of

A Question. Output Analysis. Example. What Are We Doing Wrong? Result from throwing a die. Let X be the random variable

Statistical inference: example 1. Inferential Statistics

Computing Confidence Intervals for Sample Data

Resampling Methods. X (1/2), i.e., Pr (X i m) = 1/2. We order the data: X (1) X (2) X (n). Define the sample median: ( n.

2 Definition of Variance and the obvious guess

Expectation and Variance of a random variable

Data Analysis and Statistical Methods Statistics 651

Sampling Distributions, Z-Tests, Power

ACCESS TO SCIENCE, ENGINEERING AND AGRICULTURE: MATHEMATICS 1 MATH00030 SEMESTER / Statistics

Economics Spring 2015

April 18, 2017 CONFIDENCE INTERVALS AND HYPOTHESIS TESTING, UNDERGRADUATE MATH 526 STYLE

Learning Theory: Lecture Notes

STATISTICAL PROPERTIES OF LEAST SQUARES ESTIMATORS. Comments:

AP Statistics Review Ch. 8

Analysis of Experimental Data

A statistical method to determine sample size to estimate characteristic value of soil parameters

Basis for simulation techniques

Chapter 6. Sampling and Estimation

Goodness-of-Fit Tests and Categorical Data Analysis (Devore Chapter Fourteen)

Lecture 5. Materials Covered: Chapter 6 Suggested Exercises: 6.7, 6.9, 6.17, 6.20, 6.21, 6.41, 6.49, 6.52, 6.53, 6.62, 6.63.

Stat 139 Homework 7 Solutions, Fall 2015

Overview. p 2. Chapter 9. Pooled Estimate of. q = 1 p. Notation for Two Proportions. Inferences about Two Proportions. Assumptions

Appendix D Some Portfolio Theory Math for Water Supply

(X i X)(Y i Y ) = 1 n

INF Introduction to classifiction Anne Solberg Based on Chapter 2 ( ) in Duda and Hart: Pattern Classification

Unbiased Estimation. February 7-12, 2008

t distribution [34] : used to test a mean against an hypothesized value (H 0 : µ = µ 0 ) or the difference

Statistics 20: Final Exam Solutions Summer Session 2007

The variance of a sum of independent variables is the sum of their variances, since covariances are zero. Therefore. V (xi )= n n 2 σ2 = σ2.

WHAT IS THE PROBABILITY FUNCTION FOR LARGE TSUNAMI WAVES? ABSTRACT

Discrete Mathematics and Probability Theory Spring 2014 Anant Sahai Lecture 16

ECE 8527: Introduction to Machine Learning and Pattern Recognition Midterm # 1. Vaishali Amin Fall, 2015

Statistical Inference (Chapter 10) Statistical inference = learn about a population based on the information provided by a sample.

(all terms are scalars).the minimization is clearer in sum notation:

This is an introductory course in Analysis of Variance and Design of Experiments.

1.010 Uncertainty in Engineering Fall 2008

Department of Civil Engineering-I.I.T. Delhi CEL 899: Environmental Risk Assessment HW5 Solution

Anna Janicka Mathematical Statistics 2018/2019 Lecture 1, Parts 1 & 2

Design and Analysis of Algorithms

Chapter 11 Output Analysis for a Single Model. Banks, Carson, Nelson & Nicol Discrete-Event System Simulation

Estimation of the Mean and the ACVF

CEE 522 Autumn Uncertainty Concepts for Geotechnical Engineering

Lecture 12: Hypothesis Testing

Estimation of a population proportion March 23,

Example: Find the SD of the set {x j } = {2, 4, 5, 8, 5, 11, 7}.

Correlation Regression

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

October 25, 2018 BIM 105 Probability and Statistics for Biomedical Engineers 1

Discrete Mathematics and Probability Theory Fall 2016 Walrand Probability: An Overview

4. Partial Sums and the Central Limit Theorem

DS 100: Principles and Techniques of Data Science Date: April 13, Discussion #10

Open book and notes. 120 minutes. Cover page and six pages of exam. No calculators.

For example suppose we divide the interval [0,2] into 5 equal subintervals of length

Properties and Hypothesis Testing

Double Stage Shrinkage Estimator of Two Parameters. Generalized Exponential Distribution

Homework 5 Solutions

GUIDELINES ON REPRESENTATIVE SAMPLING

Mathacle. PSet Stats, Concepts In Statistics Level Number Name: Date: Confidence Interval Guesswork with Confidence

1 Review and Overview

Sampling Error. Chapter 6 Student Lecture Notes 6-1. Business Statistics: A Decision-Making Approach, 6e. Chapter Goals

Chapter 22. Comparing Two Proportions. Copyright 2010, 2007, 2004 Pearson Education, Inc.

Statistical Properties of OLS estimators

Chapter 5. Inequalities. 5.1 The Markov and Chebyshev inequalities

Lecture 6 Chi Square Distribution (χ 2 ) and Least Squares Fitting

Transcription:

Robust Default Correlatio for Cost Risk Aalysis Christia Smart, Ph.D., CCEA Director, Cost Estimatig ad Aalysis Missile Defese Agecy Preseted at the 03 ICEAA Professioal Developmet ad Traiig Workshop Jue, 03

Itroductio Correlatio is a importat cosideratio i cost risk aalysis Whe correlatio is igored, you are makig the de facto assumptio that all risks are idepedet Eve whe you choose ot to decide, you still have made a choice (Rush, Free Will) Assumig o correlatio results i a vast uderstatemet of risk I 996, Do Mackezie wrote that Oe of the more difficult chores i cost risk aalysis is establishig appropriate levels of correlatio (Mackezie 996) Sevetee years later, this is still true This presetatio is a attempt at makig forward progress o this issue

Defiitios Cosider two radom variables, X ad Y. The mea of X, E(X), is deoted by μ x, ad similarly, the mea of Y, E(Y), is deoted by μ y The variace of X, Var(X), is deoted by σ X, ad similarly, the variace of Y, Var(Y), is deoted by σ Y The variace of X ad Y are equal to: Var( X ) Cov( X, X ) E( X ) [ E( X )] [ ] Var ( Y ) Cov( Y, Y ) E( Y ) E( Y ) Correlatio, deoted by the Greek letter r ( rho ), is defied by XY Corr( X,Y ) cov(x,y ) Var( X )Var(Y ) E( XY ) E( X )E(Y ) Var( X )Var(Y ) E( XY ) μ σ X σ Y X μ Y 3

Total System Mea ad Variace For WBS elemets, the mea ad the variace of the total cost are defied by: 4 ( ) i i i i i i X E X E μ i j j i j i ij i i X i Var σ σ σ

Total Variace with Level Correlatio Suppose (for simplicity) There are WBS Elemets Each Each Total Cost Var( C i ) σ Corr C ( i C j) C, < C i k C, C, Κ, C ( ) ( i ) ( i) ( j) Var C Var C Var C Var C k i ( ) ( ) σ σ ( ) σ j i Correlatio 0 Var( C ) σ σ ( ) ( ) σ 5

Impact of Assumig Idepedece For a 00 elemet WBS assumig idepedece amog all WBS elemets whe the true uderlyig correlatio is equal to 0% results i a uderestimate of total system stadard deviatio equal to 80%! Percet Uderestimated 00 80 60 40 0 000 00 30 0 0 0 0. 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Actual Correlatio Source: Why Correlatio Matters i Cost Estimatig, Advaced Traiig Sessio, 3d Aual DOD Cost Aalysis Symposium, Williamsburg, VA, 999. 6

Example of Impact As a example, cosider a system with 0 subsystems, each with mea equal to $0 millio ad stadard deviatio equal to $3 millio For 00 elemets ad,000 elemets, assumig correlatio is zero whe it is actually 0% results i uderestimatig the 80 th percetile by 8-0%, ad if the correlatio is 60%, the 80 th percetile is uderestimated by 5-7% Number of WBS Elemets 80% Cofidece Level (TY$, Millios) Idepedece 0% Correlatio 60% Correlatio 0 $08 $3 $9 00 $,05 $, $,8,000 $0,080 $,09 $,85 7

Default Correlatio Notice i the graph o the previous chart there is a apparet kee i the curve aroud 0% Above 0% correlatio the cosequece of assumig less correlatio begis to dwidle This graph is the basis for assumig 0-30% for default correlatio for elemets betwee which there is o fuctioal correlatio Book (Book 999) recommeds 0% as a default correlatio value because of this However, the graph does ot tell us how much the total stadard deviatio is uderestimated because correlatio is assumed to be 0%, but is actually 60%, for example 8

Uderestimatig Correlatio with the Default 0% For a 00-elemet WBS, if the correlatio is assumed to be 0% but is actually 60%, the total stadard deviatio is uderestimated by 40% 00% Percet Over/Uderestimated 90% 80% 70% 60% 50% 40% 30% 0% 00 30 0 000 0% 0% 0 0. 0. 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Overestimated Uderestimated Actual Correlatio Source: Why Correlatio Matters i Cost Estimatig, Advaced Traiig Sessio, 3d Aual DOD Cost Aalysis Symposium, Williamsburg, VA, 999. 9

Robust Approach A more robust approach to assigig correlatios would be to use the value that results i the least amout of error i the variace It is robust i the sese that without solid evidece to assig a correlatio value, it miimizes the amout by which the total stadard is misestimated due to the correlatio assumptio This robust default measure of correlatio would be a value for correlatio that would miimize the error whe the assumed correlatio differs from the actual uderlyig correlatio 0

Absolute Error We are iterested i the absolute value of the error, sice if we cosider egative ad positive values, they may offset each other Let ε deote the error, the we are iterested i ε, where ε is defied by ε ε if ε > 0 ε if ε < 0

Expected Value of Absolute Error If we assume that the prior distributio of correlatio o the iterval (0,) is uiform, the the expected value of the absolute error ε of the variace as a fuctio of the assumed correlatio is defied by ε f ( )d 0 ε d 0 sice f ( ) Thus the approach is to fid the value of that miimizes the expected (absolute) error This equatio provides the expected error as a fuctio of, ad the we miimize this fuctio with respect to usig techiques from elemetary Calculus

What is the Error? Now that we have determied how to determie the miimum error, we eed to figure out what to miimize We preset several differet choices, calculate the results, ad provide pros ad cos for each 3

Case : Percetage Error (of Actual) Deote the assumed correlatio by ad the actual correlatio by I this first case, we cosider the metric that Book (Book, 999) looked at whe measurig over- ad uder-estimatio of correlatio, which is to cosider the percetage error i variace as a percetage of the actual correlatio 4 ( ) ( ) ( ) ( ) ( ) ( ) σ σ σ ε

Case : Calculatig Expected Absolute Error ( of ) The expected absolute error is calculated* as This is a fuctio of the umber of WBS elemets () ad the assumed correlatio ( ) Miimizig this with respect to we fid that *See the paper for detailed calculatios 5 ( ) ( ) ( ) ( ) ( ) ( ) 0 d d ( ) ( ) 4 ( ) ( ) 4 4

Case : Calculatig Expected Absolute Error ( of ) The limit of this miimum as is 5% This is close to the 0% default value advocated by Book (Book, 999) However, the total error is miimized by this value because of the large pealty assiged whe overestimatig actual correlatios ear zero For example, let 00 ad assume the correlatio is 40%. The absolute percetage error whe the actual correlatio is equal to zero is 537%, while the absolute percetage error whe the actual correlatio is equal to 80% is oly 9% The pealty should ot differ greatly whether you are overestimatig or uderestimatig A easy way to overcome this issue is to examie the percet error as a fuctio of the assumed correlatio, which is cosidered i Case 6

Case : Percetage Error (of Assumed) ( of ) This case is similar to Case, oly the deomiator is differet I this case, 7 ( ) ( ) ( ) ( ) ( ) ( ) σ σ σ ε ( ) ( ) ( ) ( ) ( ) ( ) 3 3 4 ) ( E 3 ε

Case : Percetage Error (of Assumed) ( of ) The value of that miimizes the expected (absolute) error is 3 3 The limit of as is 3 63% 8

Impact of Case The sigle recommeded value from this approach is 63% This is much larger tha the 5% value usig the other approach, or the 0% rule of thumb widely used i practice The impact o stadard deviatio i icreasig default correlatio from 0% to 63% will result i a sigificat icrease i stadard deviatio % Icrease i σ 0 54.3% 30 68.3% 00 74.5%,000 77.% 0,000 77.5% 9

Case 3: Total Absolute Differece The absolute differece could also be cosidered as a metric σ ( ) σ ( ) I this case, the absolute expected value of the error occurs whe 50% 0

Case 4: Case with Trucated Limits ( of ) If we cosider the first case, much of the reaso why the miimum is so low compared to the other cases is the error whe the actual correlatio is close to 0% We kow that i most case the correlatio is ot 0%, ad we kow that it is ot 00% Absolute percetage error for variace as a percet of the actual correlatio for 00 WBS elemets: Expected Absolute Percetage Error 90% 80% 70% 60% 50% 40% 30% 0% 0% 0% 0% 0% 0% 30% 40% 50% 60% 70% 80% 90% 00% Assumed Correlatio

Case 4: Case with Trucated Limits ( of ) If we trucate the actual correlatio to be uiform i the iterval (0.,0.9) the the expected value of the absolute percet error is miimized whe ( 0. 0.9) ( 0.9.) 4 ( ) 4 The limit of this as is 40%

Summary of the Four Cases All four cases miimize the expected value of the absolute error i the variace, but use differet metrics for measurig error Case : Error is measured as a percetage of the variace that results from the actual correlatio, result i the limit is 5% Case : Error is measured as a percetage of the variace that results from the assumed correlatio, result i the limit is 63% Case 3: Error is measured as total differece i variaces, result is 50% Case 4: Error is measured as a percetage of the variace that results from the actual correlatio, with the correlatio rage limited to 0-90%; result is 40% 3

Recommedatio I recommed a percetage differece approach Kowig that the differece betwee the estimated total stadard deviatio ad the actual total stadard deviatio is $00 millio does t tell you much, sice it could be large if the stadard deviatio is $00 millio, or relatively small if the total stadard deviatio is $ billio Calculatig the error based as a percetage of the assumed correlatio is logical The issue with lookig at the error relative to the actual correlatio is that we do t kow the actual correlatio - we oly kow the assumed correlatio. The same is true for CER residuals For the Miimum Ubiased Percet Error (MUPE) ad the Zero bias Miimum Percet Error (ZMPE) CER methods look at the percetage error from the estimate, ot from the actual We should use the same metric i lookig at correlatio Bottom lie: I recommed usig a default value for correlatio that is equal to 63% 4

Empirical Evidece for Correlatio There is some limited empirical evidece o correlatio for spacecraft This rages from 6-40% at the subsystem level Smart calculated a average correlatio i the rage 6-0% for NASA/Air Force Cost Model hardware subsystems (Smart, 004) Covert ad Aderso calculated a average correlatio equal to 6.8% for Umaed Spacecraft Cost Model subsystems (Covert ad Aderso, 005) Mackezie ad Addiso reported correlatios i the rage 0-40% for average uit cost of subsystems NRO data (Mackezie ad Addiso, 000) However, this evidece is oly for oe commodity 5

Summary ( of ) 0% is ofte the default value whe there is o iformatio to provide iformed iput This level is too low Usig a more robust approach, we have show that default values i the rage 40-63% are more reasoable I recommed 63% as a default value Oly dowside is potetial for overestimatio However as a professio we do ot have a reputatio for overestimatio Icreasig default correlatio value may help couter this 6

Summary ( of ) Example of uderestimatio of risk For a risk aalysis coducted for the Tethered Satellite System, the actual cost was more tha double the 95 th percetile of the origial cost risk aalysis 00% 90% Cofidece Level 80% 70% 60% 50% 40% 30% 0% 0% Iitial Budget 5/8 Aalysis 3/8 Aalysis Actual Fial Cost 0%.5.5 3 3.5 4 4.5 5 5.5 6 Normalized Cost 7

Refereces Book, S.A., Why Correlatio Matters i Cost Estimatig, Advaced Traiig Sessio, 3d Aual DOD Cost Aalysis Symposium, Williamsburg, VA, 999. Covert, R. ad T. Aderso, Correlatio Tutorial, Rev. H, preseted at the Cost Drivers Learig Evet, November 005. Gupto, G.M., C.C. Figer, ad M. Bahtia, CreditMetrics Techical Documet, 997, J.P. Morga, New York, available at: http://www.defaultrisk.com/_pdf6j4/creditmetrics_techdoc.pdf Mackezie, D., Cost Variace i Idealized Systems, preseted at the Iteratioal Society of Parametric Aalysts Aual Coferece, Caes, Frace, Jue, 996. Mackezie, D., ad B. Addiso, Space System Cost Variace ad Estimatig Ucertaity, preseted at the Space Systems Cost Aalysis Group meetig, Seattle, WA, October, 000. Smart, C., Average Correlatio Values for NAFCOM 004, upublished white paper, SAIC, 004. Smart, C., Risk Aalysis i the NASA/Air Force Cost Model, preseted at the Joit Aual ISPA/SCEA Coferece, Dever, CO, Jue, 005. Smart, C., Mathematical Techiques for Joit Cost ad Schedule Aalysis, preseted at the 009 NASA Cost Symposium, Cocoa Beach, FL, May, 009. Smart, C., Covered With Oil: Icorporatig Realism i Cost Risk Aalysis, preseted at the 0 Joit Aual Iteratioal Society of Parametric Aalysts ad Society of Cost Estimate ad Aalysis Coferece, Albuquerque, NM, Jue 0. 8