Loss Estimation using Monte Carlo Simulation
1 Loss Estimation using Monte Carlo Simulation Tony Bellotti, Department of Mathematics, Imperial College London Credit Scoring and Credit Control Conference XV Edinburgh, 29 August to 1 September 2017
2 Motivation Accurate estimation of loss based on underlying models of PD, LGD and EAD. Use of Monte Carlo simulation (integration) to avoid a complex analytic solution, giving a distribution of possible losses. Confidence intervals to quantify uncertainty in expected loss estimates. Applications: internal risk management, regulation (Basel 3), accounting rules (IFRS 9, CECL), stress testing, profit estimation.
3 Basic Idea Simple idea: simulate loss. For a portfolio of loans with accounts $i = 1, \dots, n$, compute
$$\text{Loss} = \sum_{i=1}^{n} \text{PD}_i \times \text{LGD}_i \times \text{EAD}_i$$
where $\text{PD}_i$ = probability of default, $\text{LGD}_i$ = loss given default and $\text{EAD}_i$ = exposure at default, across distributions of these risk factors, informed by models. Devil in the detail: the relationship between these three risk factors.
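As a minimal sketch of the sum above, the following computes the loss from point estimates of the three risk factors. The function name and the three-account portfolio are hypothetical illustrations, not data from the study, and the calculation deliberately ignores any dependence between PD, LGD and EAD.

```python
def portfolio_loss(pd, lgd, ead):
    """Sum PD_i * LGD_i * EAD_i over all accounts (point estimates only)."""
    if not (len(pd) == len(lgd) == len(ead)):
        raise ValueError("pd, lgd and ead must have the same length")
    return sum(p * l * e for p, l, e in zip(pd, lgd, ead))


# Hypothetical three-account portfolio.
pd = [0.02, 0.10, 0.05]        # probability of default
lgd = [0.5, 0.8, 0.4]          # loss given default (fraction of exposure)
ead = [1000.0, 2000.0, 500.0]  # exposure at default

loss = portfolio_loss(pd, lgd, ead)
print(loss)
```

Because this multiplies point estimates, it gives a single number rather than a distribution, which is the gap the Monte Carlo approach in the later slides is meant to fill.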
4 Scope of this study For this study, we considered the simplified problem: Assume no population change between training and forecast data (i.e. IID data). Do not consider inclusion of economic conditions just yet. Show results from both a simulation study and real credit card data.
5 The Maths: Defining Loss Consider estimating loss on $n$ accounts in a portfolio. For each account $i \in \{1, \dots, n\}$:
- Let $x_i$ be a vector of characteristics of mixed data types.
- Let $Y_i \in \{0, 1\}$ be the default event for account $i$; 1 = default, 0 = non-default.
- Let $L_i \in \mathbb{R}$ be loss given default (LGD).
- Let $E_i > 0$ be exposure at default (EAD).
Then total loss on the portfolio is $V = \sum_{i=1}^{n} Y_i L_i E_i$, and expected loss is $\mathbb{E}[V] = \sum_{i=1}^{n} \mathbb{E}[Y_i L_i E_i]$.
6 Introducing the risk models Suppose we have models $m_1, m_2, m_3$ for probability of default (PD), LGD and log-EAD respectively. Hence,
$$P(Y_i = 1 \mid x_i) = m_1(x_i)$$
$$L_i = m_2(x_i) + \varepsilon_{2,i}$$
$$\log E_i = m_3(x_i) + \varepsilon_{3,i}$$
where $\varepsilon_{2,i}$ and $\varepsilon_{3,i}$ are residual terms.
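7 The Maths: Expected Loss Then, with a change of variables, expected loss $\mathbb{E}[Y_i L_i E_i]$ can be rewritten as
$$\iint m_1(x_i)\,\bigl(m_2(x_i) + \varepsilon_{2,i}\bigr)\exp\bigl(m_3(x_i) + \varepsilon_{3,i}\bigr)\, f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1, x_i)\, d\varepsilon_{2,i}\, d\varepsilon_{3,i}$$
which can be approximated using Monte Carlo integration by
$$\text{EL} \approx \frac{1}{M} \sum_{m=1}^{M} m_1(x_i)\,\bigl(m_2(x_i) + \varepsilon_{2,i}\bigr)\exp\bigl(m_3(x_i) + \varepsilon_{3,i}\bigr)$$
with a fresh random sample $(\varepsilon_{2,i}, \varepsilon_{3,i}) \sim f$ drawn for each $m$. Assume independence of the residuals from $x_i$, i.e. simulate from the density $f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1)$. Estimate this density using either the empirical distribution or kernel density estimation on the training or validation data set. Note: I will not show the derivation of these formulae, but they are available upon request.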
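The Monte Carlo approximation on this slide can be sketched for a single account as follows. This is an illustrative sketch, not the author's implementation: the function name, the residual pairs and the model outputs are all hypothetical, and the residuals are drawn from the empirical distribution (resampling observed pairs) rather than from a KDE.

```python
import math
import random


def mc_expected_loss(pd_i, lgd_hat, log_ead_hat, residual_pairs, M=5000, rng=None):
    """Monte Carlo estimate of E[Y_i * L_i * E_i] for one account.

    pd_i           : m_1(x_i), modelled probability of default
    lgd_hat        : m_2(x_i), modelled LGD
    log_ead_hat    : m_3(x_i), modelled log-EAD
    residual_pairs : observed (eps_2, eps_3) pairs from past defaults
    """
    rng = rng or random.Random(0)
    total = 0.0
    for _ in range(M):
        # Draw a joint residual pair so that any LGD/EAD correlation is kept.
        eps2, eps3 = rng.choice(residual_pairs)
        total += pd_i * (lgd_hat + eps2) * math.exp(log_ead_hat + eps3)
    return total / M


# Hypothetical residual pairs, standing in for a validation set of defaults.
residuals = [(-0.10, 0.20), (0.05, -0.15), (0.00, 0.00), (0.12, 0.30)]
estimate = mc_expected_loss(0.1, 0.5, math.log(1000.0), residuals, M=5000)
print(estimate)
```

Sampling the pair jointly, rather than each residual separately, is what lets the estimate reflect the relationship between LGD and EAD.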
8 Quantile estimation of Loss It is valuable to consider the distribution of possible losses, and in particular to compute quantiles. This allows confidence intervals (CI) on loss estimates. The $q$th quantile $v_q$ of $V$ satisfies
$$q = \int_C f(v \mid x_1, \dots, x_n)\, dv$$
where $f$ is the density over $V$, conditional on characteristics, and $C = \{v : v \le v_q\}$. Note: here $q$ is known and $v_q$ is unknown. For example, to compute a 95% CI, find $v_q$ for $q = 0.025$ and $q = 0.975$: $v_{0.025}$, $v_{0.975}$.
9 Quantile estimation of Loss using Monte Carlo Using Monte Carlo integration, this integral can be approximated by
$$q \approx \frac{1}{M} \sum_{m=1}^{M} I\!\left(\sum_{i=1}^{n} v_i \le v_q\right)$$
where $v_i = y_i\,\bigl(m_2(x_i) + \varepsilon_{2,i}\bigr)\exp\bigl(m_3(x_i) + \varepsilon_{3,i}\bigr)$ and random samples $(y_i, \varepsilon_{2,i}, \varepsilon_{3,i}) \sim f$. The loss quantile $v_q$ is easily estimated by ranking the $M$ simulated values of $\sum_{i=1}^{n} v_i$ in ascending order and choosing the value at rank $Mq$.
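The ranking step can be sketched as below. The slide only says to take the value at rank Mq; rounding that rank up to the next integer (and clipping it to 1..M) is an assumption made here so the rank is always valid. The function name and the example data are hypothetical.

```python
import math


def loss_quantile(simulated_losses, q):
    """Estimate the q-th loss quantile by ranking simulated portfolio losses."""
    ranked = sorted(simulated_losses)
    M = len(ranked)
    if M == 0:
        raise ValueError("need at least one simulated loss")
    # Rank M*q, rounded up and clipped to 1..M (assumed convention).
    rank = min(max(math.ceil(M * q), 1), M)
    return ranked[rank - 1]


# Hypothetical simulated portfolio losses: the integers 1..100 in reverse order.
losses = list(range(100, 0, -1))
lower = loss_quantile(losses, 0.025)
upper = loss_quantile(losses, 0.975)
print(lower, upper)
```

The pair `(lower, upper)` then serves as the 95% interval for the portfolio loss.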
10 Quantile estimation: Sampling We need to sample $(y_i, \varepsilon_{2,i}, \varepsilon_{3,i}) \sim f$.
1. Notice that $f(y_i, \varepsilon_{2,i}, \varepsilon_{3,i} \mid x_i) = f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid y_i, x_i)\, P(y_i \mid x_i)$.
2. Hence, for each account $i$, simulate $y_i = 0$ or $1$ with $P(y_i = 1 \mid x_i) = m_1(x_i)$.
3. If $y_i = 0$, it does not matter how $\varepsilon_{2,i}, \varepsilon_{3,i}$ are simulated, since $y_i = 0 \Rightarrow v_i = 0$, always.
4. If $y_i = 1$, simulate $\varepsilon_{2,i}, \varepsilon_{3,i}$ from $f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1)$, assuming that $\varepsilon_{2,i}, \varepsilon_{3,i}$ are independent of $x_i$.
5. The density $f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1)$ can be estimated from a validation data set of previous defaults. Either the empirical distribution or a kernel density estimator (KDE) can be used.
Note: it is easy to simulate from a KDE: randomly sample an example from the validation/training data, then add random noise corresponding to the kernel function.
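The sampling steps on this slide can be sketched as follows. This is an illustration under stated assumptions, not the study's code: all names and data are hypothetical, and the KDE is assumed to use an isotropic Gaussian kernel with one shared bandwidth. Setting `bandwidth=0` falls back to the empirical distribution.

```python
import math
import random


def sample_residuals(residual_pairs, bandwidth, rng):
    """Draw (eps_2, eps_3): pick an observed pair, then add kernel noise."""
    eps2, eps3 = rng.choice(residual_pairs)
    if bandwidth > 0:
        # Gaussian kernel noise (assumed kernel; same bandwidth in both dimensions).
        eps2 += rng.gauss(0.0, bandwidth)
        eps3 += rng.gauss(0.0, bandwidth)
    return eps2, eps3


def simulate_portfolio_losses(pd, lgd_hat, log_ead_hat, residual_pairs,
                              M=1000, bandwidth=0.0, seed=0):
    """Return M simulated portfolio losses, each the sum of v_i over accounts."""
    rng = random.Random(seed)
    losses = []
    for _ in range(M):
        total = 0.0
        for p, l, a in zip(pd, lgd_hat, log_ead_hat):
            if rng.random() < p:  # y_i = 1 with probability m_1(x_i)
                eps2, eps3 = sample_residuals(residual_pairs, bandwidth, rng)
                total += (l + eps2) * math.exp(a + eps3)
            # y_i = 0 contributes v_i = 0, so no residuals are drawn.
        losses.append(total)
    return losses


# Hypothetical model outputs for three accounts and a few residual pairs.
pd = [0.05, 0.20, 0.10]
lgd_hat = [0.4, 0.7, 0.5]
log_ead_hat = [math.log(1000.0), math.log(500.0), math.log(2000.0)]
residuals = [(-0.10, 0.20), (0.05, -0.15), (0.00, 0.00)]
sims = simulate_portfolio_losses(pd, lgd_hat, log_ead_hat, residuals,
                                 M=2000, bandwidth=0.05)
print(len(sims), min(sims))
```

Ranking the returned list in ascending order then gives the loss quantiles described on the previous slide.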
11 Why a simulation study? Simulate credit accounts with default, LGD and EAD outcomes and correlations controlled by different predictor variables. This allows us to control the generating distribution for the data. It allows testing and debugging of the models and the loss estimation technique, since we know the true values. An endless supply of artificial data allows repeated experiments and hence samples of results for statistical analysis.
12 Simulation study: Data generation A credit portfolio was simulated with multiple risk factors to simulate default events, LGD and EAD. Risk factors X1 to X5: Default is driven by three of them, LGD by three and EAD by two. All variables are standard normally distributed. All variables are expressed as the sum of an observable and an unobservable component; only the observable component can be used when building the model, hence simulating uncertainty. X1 and X2 are common to more than one outcome, hence inducing a correlation.
13 Simulation study: models and distribution of residuals LGD model $R^2 = 0.29$; log-EAD model $R^2 = 0.25$. [Figure: contour map of the density $f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1)$ using KDE; axis: LGD residual $\varepsilon_{2,i}$.]
14 Simulation study: results
Columns in the original table: Model details, train, test, analytic EL, Monte Carlo EL, 95% CI, % below Q2.5%, % above Q97.5%. Only the following values are legible here:
Model details          95% CI           % below Q2.5%   % above Q97.5%
                       (-9.5,+9.8)
                       (-3.1,+3.1)
                       (-9.3,+10.1)     9               8
Bandwidth=high         (-10.2,+10.6)    15              0
Fix LGD                (-9.5,+9.7)      3               5
Fix LGD, ε2,i = 0      (-8.5,+8.7)      0               34
Poor PD model          (-8.8,+11.6)     46              0
No EAD in LGD model    (-9.44,+9.73)    3               3
M = 5000 and each experiment is repeated 100 times. train and test are the numbers of examples in the training and test data sets (in thousands). EL is the analytic expected loss estimate, as a % compared to actual loss; the Monte Carlo expected loss estimate is reported on the same basis. 95% CI is the % difference from the EL estimate.
15 Simulation study: results (table as on slide 14) Main result: reliable and accurate predictions, but high uncertainty: +/-10%.
16 Simulation study: results (table as on slide 14) Increasing the sample size gives more accuracy, but less reliability.
17 Simulation study: results (table as on slide 14) Poor models (due to a small training set) lead to poor reliability.
18 Simulation study: results (table as on slide 14) Accuracy is sensitive to the bandwidth in KDE: perhaps just use the empirical distribution for sampling.
19 Simulation study: results (table as on slide 14) Using a fixed value for LGD is fine, so long as the residual for LGD is used in MC sampling. A similar result holds when using a fixed value for EAD.
20 Simulation study: results (table as on slide 14) A poor PD model (just one predictor variable) leads to poor reliability.
21 Simulation study: results (table as on slide 14) No need to include EAD as a predictor variable in the LGD model.
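The slides do not define the "% below Q2.5%" and "% above Q97.5%" columns. One plausible reading, which is an assumption here, is the share of repeated experiments in which the actual loss fell outside the estimated 95% interval on each side. Under that reading, a well-calibrated procedure would give values near 2.5 in both columns. The sketch below uses hypothetical names and toy numbers.

```python
def tail_rates(actual_losses, lower_quantiles, upper_quantiles):
    """Percentage of experiments with actual loss below / above the interval.

    Each list holds one value per repeated experiment (assumed reading of the
    '% below Q2.5%' and '% above Q97.5%' columns).
    """
    n = len(actual_losses)
    if n == 0 or n != len(lower_quantiles) or n != len(upper_quantiles):
        raise ValueError("inputs must be non-empty and of equal length")
    below = sum(a < lo for a, lo in zip(actual_losses, lower_quantiles))
    above = sum(a > hi for a, hi in zip(actual_losses, upper_quantiles))
    return 100.0 * below / n, 100.0 * above / n


# Toy example: four experiments with the same estimated interval (2, 8).
pct_below, pct_above = tail_rates([1.0, 5.0, 10.0, 5.0],
                                  [2.0, 2.0, 2.0, 2.0],
                                  [8.0, 8.0, 8.0, 8.0])
print(pct_below, pct_above)
```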
22 UK credit card data study Behavioural data for UK credit cards, observed during ... Default is defined as 3 months of missed payments within a 12-month period. Predictor variables include client and account ages, application data (employment status, tenure status, months at current address) and behavioural data (balance, utilization, past delinquency). Build simple underlying models: logistic regression for PD, and OLS linear regression for LGD and log-EAD. Train / test over two different periods: data set A (observation date July 2008) and data set B (observation date September ...).
23 Credit card data: models and distribution of residuals Data set A: LGD model $R^2 = 0.09$, log-EAD model $R^2 = 0.74$. Data set B: LGD model $R^2 = 0.11$, log-EAD model $R^2 = 0.81$. [Figure: contour maps of the density $f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1)$ using KDE, one per data set; axis: LGD residual $\varepsilon_{2,i}$.]
24 Credit card data study: Results
Columns in the original table: Model details, then analytic EL, Monte Carlo EL and 95% CI for each of data sets A and B. Only the 95% CI values are legible here:
Model details          Data set A 95% CI    Data set B 95% CI
                       (-14.7,+20.0)        (-17.9,+27.8)
Bandwidth=high         (-15.2,+20.7)        (-19.0,+29.3)
Fix LGD                (-14.0,+18.3)        (-17.7,+28.6)
Fix LGD, ε2,i = 0      (-12.9,+16.4)        (-15.1,+21.7)
Poor PD model          (-15.2,+20.5)        (-20.1,+31.9)
No EAD in LGD model    (-14.4,+19.2)        (-18.5,+32.4)
M = 10000, averaged over 50 runs with different train / test splits. EL is the analytic expected loss estimate, as a % compared to actual loss; the Monte Carlo expected loss estimate is reported on the same basis. 95% CI is the % difference from the EL estimate.
25 Credit card data study: Results (table as on slide 24) Monte Carlo simulation gives accurate EL estimates, on average. However, the CI is broad (+/-20%).
26 Credit card data study: Results (table as on slide 24) Accuracy is sensitive to the bandwidth used in KDE.
27 Credit card data study: Results (table as on slide 24) Accuracy is affected by using a fixed value for LGD. Similar result for EAD. Also, a potentially bad result with a poor PD model (i.e. insufficient predictors).
28 Credit card data study: Results (table as on slide 24) No need to include EAD as a predictor in the LGD model.
29 Credit card data: LGD/EAD model residuals When EAD is not explicitly included as a predictor in the LGD model, the correlation between the LGD and log-EAD model residuals is stronger, to compensate. [Figure: contour maps of the density $f(\varepsilon_{2,i}, \varepsilon_{3,i} \mid Y_i = 1)$ using KDE, for data sets A and B; axis: LGD residual $\varepsilon_{2,i}$.]
30 Conclusions and future work Monte Carlo simulation can be used to give reliable estimates of loss, and estimates of uncertainty in expected loss estimation. But there is sensitivity to model risk: care is needed to ensure the underlying models are correctly specified. Future work: Test the procedure on other data (e.g. mortgages). Extend the exercise to include dynamic components: environmental/macroeconomic conditions and forecasting. Use reliable prediction techniques (conformal predictors) to output reliable confidence intervals, even with model misspecification.
31 Loss Estimation using Monte Carlo Simulation Thank you! I hope you have found this presentation useful. Any questions? Dr Tony Bellotti, Senior Lecturer in Statistics, Department of Mathematics, Imperial College London, a.bellotti@imperial.ac.uk. Part of the Statistics in Finance Research Group at Imperial College London. Research, Training, Consultancy.
More informationTime Series and Forecasting Lecture 4 NonLinear Time Series
Time Series and Forecasting Lecture 4 NonLinear Time Series Bruce E. Hansen Summer School in Economics and Econometrics University of Crete July 23-27, 2012 Bruce Hansen (University of Wisconsin) Foundations
More informationAsymptotic distribution of the sample average value-at-risk
Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample
More informationStochastic optimization - how to improve computational efficiency?
Stochastic optimization - how to improve computational efficiency? Christian Bucher Center of Mechanics and Structural Dynamics Vienna University of Technology & DYNARDO GmbH, Vienna Presentation at Czech
More informationRailway suicide clusters: how common are they and what predicts them? Lay San Too Jane Pirkis Allison Milner Lyndal Bugeja Matthew J.
Railway suicide clusters: how common are they and what predicts them? Lay San Too Jane Pirkis Allison Milner Lyndal Bugeja Matthew J. Spittal Overview Background Aims Significance Methods Results Conclusions
More informationApril Forecast Update for Atlantic Hurricane Activity in 2016
April Forecast Update for Atlantic Hurricane Activity in 2016 Issued: 5 th April 2016 by Professor Mark Saunders and Dr Adam Lea Dept. of Space and Climate Physics, UCL (University College London), UK
More informationData Uncertainty, MCML and Sampling Density
Data Uncertainty, MCML and Sampling Density Graham Byrnes International Agency for Research on Cancer 27 October 2015 Outline... Correlated Measurement Error Maximal Marginal Likelihood Monte Carlo Maximum
More informationTreatment Effects. Christopher Taber. September 6, Department of Economics University of Wisconsin-Madison
Treatment Effects Christopher Taber Department of Economics University of Wisconsin-Madison September 6, 2017 Notation First a word on notation I like to use i subscripts on random variables to be clear
More informationUNCERTAINTY OF COMPLEX SYSTEMS BY MONTE CARLO SIMULATION
16TH NORTH SEA FLOW MEASUREMENT WORKSHOP 1998 Gleneagles, 6-9 October 1998 UNCERTAINTY OF COMPLEX SYSTEMS BY MONTE CARLO SIMULATION Mr Martin Basil, FLOW Ltd Mr Andrew W Jamieson, Shell UK Exploration
More informationMachine Learning CSE546 Carlos Guestrin University of Washington. September 30, What about continuous variables?
Linear Regression Machine Learning CSE546 Carlos Guestrin University of Washington September 30, 2014 1 What about continuous variables? n Billionaire says: If I am measuring a continuous variable, what
More informationExpected Shortfall is not elicitable so what?
Expected Shortfall is not elicitable so what? Dirk Tasche Bank of England Prudential Regulation Authority 1 dirk.tasche@gmx.net Modern Risk Management of Insurance Firms Hannover, January 23, 2014 1 The
More informationReserving for multiple excess layers
Reserving for multiple excess layers Ben Zehnwirth and Glen Barnett Abstract Patterns and changing trends among several excess-type layers on the same business tend to be closely related. The changes in
More informationGenerated Covariates in Nonparametric Estimation: A Short Review.
Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated
More informationLM threshold unit root tests
Lee, J., Strazicich, M.C., & Chul Yu, B. (2011). LM Threshold Unit Root Tests. Economics Letters, 110(2): 113-116 (Feb 2011). Published by Elsevier (ISSN: 0165-1765). http://0- dx.doi.org.wncln.wncln.org/10.1016/j.econlet.2010.10.014
More informationIntroduction to Logistic Regression
Introduction to Logistic Regression Problem & Data Overview Primary Research Questions: 1. What are the risk factors associated with CHD? Regression Questions: 1. What is Y? 2. What is X? Did player develop
More information1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects.
1. How can you tell if there is serial correlation? 2. AR to model serial correlation. 3. Ignoring serial correlation. 4. GLS. 5. Projects. 1) Identifying serial correlation. Plot Y t versus Y t 1. See
More informationStatistics 572 Semester Review
Statistics 572 Semester Review Final Exam Information: The final exam is Friday, May 16, 10:05-12:05, in Social Science 6104. The format will be 8 True/False and explains questions (3 pts. each/ 24 pts.
More informationBayesian Semiparametric GARCH Models
Bayesian Semiparametric GARCH Models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics xibin.zhang@monash.edu Quantitative Methods
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationCalculating credit risk capital charges with the one-factor model
Calculating credit risk capital charges with the one-factor model arxiv:cond-mat/0302402v5 [cond-mat.other] 4 Jan 2005 Susanne Emmer Dirk Tasche Dr. Nagler & Company GmbH, Maximilianstraße 47, 80538 München,
More informationProbabilistic Index Models
Probabilistic Index Models Jan De Neve Department of Data Analysis Ghent University M3 Storrs, Conneticut, USA May 23, 2017 Jan.DeNeve@UGent.be 1 / 37 Introduction 2 / 37 Introduction to Probabilistic
More informationBayesian Semiparametric GARCH Models
Bayesian Semiparametric GARCH Models Xibin (Bill) Zhang and Maxwell L. King Department of Econometrics and Business Statistics Faculty of Business and Economics xibin.zhang@monash.edu Quantitative Methods
More informationMeasuring Scorecard Performance
Measuring corecard Performance Zheng Yang, Yue Wang, Yu Bai, and Xin Zhang Colledge of Economics, ichuan University Chengdu, ichuan, 610064, China Yangzheng9@16.com Abstract. In this paper, we look at
More informationLARGE NUMBERS OF EXPLANATORY VARIABLES. H.S. Battey. WHAO-PSI, St Louis, 9 September 2018
LARGE NUMBERS OF EXPLANATORY VARIABLES HS Battey Department of Mathematics, Imperial College London WHAO-PSI, St Louis, 9 September 2018 Regression, broadly defined Response variable Y i, eg, blood pressure,
More informationTime Series Models for Measuring Market Risk
Time Series Models for Measuring Market Risk José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department June 28, 2007 1/ 32 Outline 1 Introduction 2 Competitive and collaborative
More informationLecture 6: Linear Regression (continued)
Lecture 6: Linear Regression (continued) Reading: Sections 3.1-3.3 STATS 202: Data mining and analysis October 6, 2017 1 / 23 Multiple linear regression Y = β 0 + β 1 X 1 + + β p X p + ε Y ε N (0, σ) i.i.d.
More informationmultilevel modeling: concepts, applications and interpretations
multilevel modeling: concepts, applications and interpretations lynne c. messer 27 october 2010 warning social and reproductive / perinatal epidemiologist concepts why context matters multilevel models
More informationQuantifying Stochastic Model Errors via Robust Optimization
Quantifying Stochastic Model Errors via Robust Optimization IPAM Workshop on Uncertainty Quantification for Multiscale Stochastic Systems and Applications Jan 19, 2016 Henry Lam Industrial & Operations
More informationIdentification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case
Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College Original December 2016, revised July 2017 Abstract Lewbel (2012)
More informationIdentification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case
Identification and Estimation Using Heteroscedasticity Without Instruments: The Binary Endogenous Regressor Case Arthur Lewbel Boston College December 2016 Abstract Lewbel (2012) provides an estimator
More informationEconometrics 2, Class 1
Econometrics 2, Class Problem Set #2 September 9, 25 Remember! Send an email to let me know that you are following these classes: paul.sharp@econ.ku.dk That way I can contact you e.g. if I need to cancel
More informationdata lam=36.9 lam=6.69 lam=4.18 lam=2.92 lam=2.21 time max wavelength modulus of max wavelength cycle
AUTOREGRESSIVE LINEAR MODELS AR(1) MODELS The zero-mean AR(1) model x t = x t,1 + t is a linear regression of the current value of the time series on the previous value. For > 0 it generates positively
More informationRobust Backtesting Tests for Value-at-Risk Models
Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society
More informationParameterized Expectations Algorithm
Parameterized Expectations Algorithm Wouter J. Den Haan London School of Economics c by Wouter J. Den Haan Overview Two PEA algorithms Explaining stochastic simulations PEA Advantages and disadvantages
More informationRelated Concepts: Lecture 9 SEM, Statistical Modeling, AI, and Data Mining. I. Terminology of SEM
Lecture 9 SEM, Statistical Modeling, AI, and Data Mining I. Terminology of SEM Related Concepts: Causal Modeling Path Analysis Structural Equation Modeling Latent variables (Factors measurable, but thru
More informationBAYESIAN DECISION THEORY
Last updated: September 17, 2012 BAYESIAN DECISION THEORY Problems 2 The following problems from the textbook are relevant: 2.1 2.9, 2.11, 2.17 For this week, please at least solve Problem 2.3. We will
More informationLECTURE 5. Introduction to Econometrics. Hypothesis testing
LECTURE 5 Introduction to Econometrics Hypothesis testing October 18, 2016 1 / 26 ON TODAY S LECTURE We are going to discuss how hypotheses about coefficients can be tested in regression models We will
More information