Preliminary Causal Analysis Results with Software Cost Estimation Data. Anandi Hira, Bob Stoddard, Mike Konrad, Barry Boehm


Parametric Cost Models
COCOMO II
$$\text{Effort} = 2.94 \times \text{Size}^{E} \times \prod_{i=1}^{17} EM_i$$
- Input: size, product and personnel attributes
- Effort in Person-Months (PM)
- Size in KSLOC (1000 SLOC)
- Domain experts
- Data calibration
- No causal analysis
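The COCOMO II effort equation on this slide can be evaluated directly. The following is a minimal sketch, not the calibrated tool: the function name and parameter names are illustrative, and because the slide does not show how the exponent E is derived, the sketch simply takes it as an input.

```python
import math


def cocomo_ii_effort(size_ksloc, exponent, effort_multipliers, a=2.94):
    """Return effort in person-months: a * Size^E * product(EM_i).

    size_ksloc: size in KSLOC (thousands of SLOC), as on the slide.
    exponent: the exponent E (assumed to be supplied by the caller).
    effort_multipliers: the effort multipliers EM_i (17 on the slide).
    a: the multiplicative constant, 2.94 on the slide.
    """
    return a * size_ksloc ** exponent * math.prod(effort_multipliers)


# Hypothetical inputs: 10 KSLOC, E = 1.0, all multipliers nominal (1.0).
nominal_effort = cocomo_ii_effort(10.0, 1.0, [1.0] * 17)
```

With every multiplier at 1.0 and E = 1.0, the result is about 29.4 PM, which is just 2.94 times the size. Raising any single multiplier above 1.0 scales the estimate by that same factor.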

Causal Inference
Causal Learning/Inference
- Causal Discovery: algorithms and domain knowledge on data
- Causal Estimation: algorithms to quantify causal influence

Past Causal-Type Analyses
Dr. Boehm, COCOMO 81
- In-depth behavioral analyses for effort factors
Evidence-Based SE
- Experiments
- Cause precedes effect
- Cause covaries with effect
- Alternative explanations are implausible
Couto et al.
- Granger's causality test for software defect predictability
- Doesn't get to the heart of causality
Hu et al.
- Bayesian networks with causality constraints for software risk factors

PC Search
- Named after Peter Spirtes and Clark Glymour
- First scalable discovery algorithm
Edge types:
- X1 -> X2: change in X1 causes change in X2
- X1 --- X2: insufficient data to select orientation
- X1 <-> X2: may be a common confounder of both variables, missing from the dataset
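The slide names PC search without showing how it works. As a rough illustration only, the sketch below implements the first (skeleton) phase of a PC-style search: start from a fully connected graph and delete an edge whenever the two variables look independent given some subset of their neighbors. It assumes roughly linear, Gaussian data and uses a Fisher z test on partial correlations. All function names are hypothetical, the orientation phase that produces the edge marks is omitted, and this is not Tetrad's implementation.

```python
import itertools
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    centered = [[v - sum(col) / n for v in col] for col in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    d = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(d)] for i in range(d)]


def partial_corr(corr, i, j, cond):
    """Partial correlation of i and j given cond, by the standard recursion."""
    if not cond:
        return corr[i][j]
    k, rest = cond[0], cond[1:]
    r_ij = partial_corr(corr, i, j, rest)
    r_ik = partial_corr(corr, i, k, rest)
    r_jk = partial_corr(corr, j, k, rest)
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


def looks_independent(corr, n, i, j, cond, alpha):
    """Fisher z test: True when independence is NOT rejected at level alpha."""
    r = max(min(partial_corr(corr, i, j, cond), 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - len(cond) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return p_value > alpha


def pc_skeleton(columns, alpha=0.05):
    """Return the undirected skeleton as a set of frozenset({i, j}) edges."""
    n, d = len(columns[0]), len(columns)
    corr = correlation_matrix(columns)
    adj = {i: set(range(d)) - {i} for i in range(d)}
    size = 0  # size of the conditioning sets tried in this pass
    while any(len(adj[i]) - 1 >= size for i in range(d)):
        for i in range(d):
            for j in sorted(adj[i]):
                others = sorted(adj[i] - {j})
                for cond in itertools.combinations(others, size):
                    if looks_independent(corr, n, i, j, cond, alpha):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        break
        size += 1
    return {frozenset((i, j)) for i in range(d) for j in adj[i]}
```

Calling `pc_skeleton([x, y, z])` on three columns of equal length returns the surviving undirected edges by column index. For data generated as a chain x, then y, then z, the x and z edge is the one expected to drop out once y is conditioned on, although with small samples such as the 45-to-1425-SLOC project set here the test has limited power.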

Tetrad

Dataset: Unified Code Count (UCC)
Project Description
- Maintained at USC
- Code metrics tool (logical SLOC, cyclomatic complexity)
- Implemented in C++
- 45 to 1425 logical SLOC
- 2010 to 2014
- Modularized architecture
- 4-month time-boxed increments
Project Types
- Add Functions
  - New language parsers
  - New features, such as GUI
- Modify Functions
  - Cyclomatic complexity support (modify existing language parsers with mathematical operations and algorithms)

Dataset Attributes
1. Equivalent SLOC
2. IFPUG Function Points
3. IFPUG Software Non-functional Assessment Process
4. COSMIC Function Points
5. Total Effort
6. Applications Experience
7. Platform Experience
8. Use of Software Tools
9. Personnel Continuity
10. Documentation Match to Needs
11. Analyst Capability
12. Programmer Capability
13. Product Complexity

All Data Points

[Figure: ESLOC. Normalized Effort (hours) vs. Equivalent SLOC. Series: UCC Projects, Calibrated Model.]

[Figure: IFPUG FPs. Normalized Effort (hours) vs. Enhancement Function Points. Series: Modified Functions, Add Functions.]

[Figure: IFPUG SNAP. Normalized Effort (hours) vs. Enhancement SNAP Points. Series: Modified Functions, Add Functions.]

[Figure: COSMIC FPs. Normalized Effort (hours) vs. COSMIC Function Points. Series: Modified Functions, Add Functions.]

Add Functions

[Figure: ESLOC. Normalized Effort (hours) vs. Equivalent SLOC.]

[Figure: IFPUG FPs. Normalized Effort (hours) vs. Enhancement Function Points.]

[Figure: IFPUG SNAP. Normalized Effort (hours) vs. Enhancement SNAP Points.]

[Figure: COSMIC FPs. Normalized Effort (hours) vs. COSMIC Function Points.]

Modify Functions

[Figure: ESLOC. Normalized Effort (hours) vs. Equivalent SLOC.]

[Figure: IFPUG FPs. Normalized Effort (hours) vs. Enhancement Function Points.]

[Figure: IFPUG SNAP. Normalized Effort (hours) vs. Enhancement SNAP Points.]

[Figure: COSMIC FPs. Normalized Effort (hours) vs. COSMIC Function Points.]

Conclusion

Conclusions
General Conclusions
- All Data Points
  - SNAP -> Total Effort
  - CFPs -> Total Effort
  - PCAP Total Effort
  - ACAP PCAP
- Add Functions
  - PCAP Total Effort
- Modify Functions
  - ESLOC Total Effort
  - SNAP Total Effort
  - ACAP PCAP
Interesting Results
- All Data Points
  - CFPs -> DOCU
- Modify Functions
  - CFPs -> PCAP
  - ACAP -> PCAP
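One way to work with these results is to store the reported edges as (cause, effect) pairs and query them. The sketch below is illustrative only: it encodes just the edges written with an explicit "->" on this slide, leaves out the pairs whose edge mark did not survive, and uses hypothetical names throughout.

```python
# Directed edges copied from the slide, keyed by data subset.
# Only edges shown with an explicit "->" are included.
REPORTED_EDGES = {
    "All Data Points": [
        ("SNAP", "Total Effort"),
        ("CFPs", "Total Effort"),
        ("CFPs", "DOCU"),
    ],
    "Modify Functions": [
        ("CFPs", "PCAP"),
        ("ACAP", "PCAP"),
    ],
}


def direct_causes(edges, target):
    """Return the sorted variables with a directed edge into target."""
    return sorted(cause for cause, effect in edges if effect == target)


def direct_effects(edges, source):
    """Return the sorted variables that source has a directed edge into."""
    return sorted(effect for cause, effect in edges if cause == source)


effort_causes = direct_causes(REPORTED_EDGES["All Data Points"], "Total Effort")
```

Here `effort_causes` holds CFPs and SNAP, the two size measures with a directed edge into Total Effort across all data points.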