Reduction of Uncertainty in Post-Event Seismic Loss Estimates Using Observation Data and Bayesian Updating


Reduction of Uncertainty in Post-Event Seismic Loss Estimates Using Observation Data and Bayesian Updating

Maura Torres

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

Columbia University 2017

© 2017 Maura Torres. All rights reserved.

Abstract

Reduction of Uncertainty in Post-Event Loss Estimates Through Observation Data

Maura Torres

The insurance industry relies on both commercial and in-house software packages to quantify financial risk due to natural hazards. For earthquakes, the initial loss estimates from the industry's catastrophe risk (CAT) models are based on the probabilistic damage a building would sustain under a catalog of simulated earthquake events. From the occurrence rates of the simulated earthquake events, an exceedance probability (EP) curve is calculated, which gives the probability of exceeding a specific loss threshold. Initially, these loss exceedance probabilities help a company decide which insurance policies are most cost efficient. They can also provide loss predictions in the event that an actual natural disaster takes place, so that the company is prepared to pay its insured parties the necessary amount. However, there is always an uncertainty associated with the loss calculations produced by these models. The goal of this research is to reduce this uncertainty by using Bayesian inference with real-time earthquake data to calculate an updated loss. Bayesian updating is an iterative process that modifies the loss distribution with every piece of incoming information. The posterior updates are calculated by multiplying a baseline prior distribution with a likelihood function and a normalization factor. The first prior is the initial loss distribution from the simulated-events database, before any information about a real earthquake is available. The crucial step in the update procedure is defining a likelihood function that establishes a relative weight for each simulated earthquake, expressing how alike or unlike the attributes of a simulated earthquake are to those of a real earthquake event. To define this likelihood function, the general proposed approach is to quantify real-time earthquake attributes such as magnitude, location, and damage, and compare them to an equivalent value for each simulated earthquake from the CAT model database. In order to obtain the simulated model parameters, the catastrophe risk model is analyzed for different

building construction types, such as steel and reinforced concrete. For every model case, the loss, the peak ground acceleration at each building, and the simulated event magnitudes and locations are recorded. Next, in order to calculate the real earthquake attributes, data were collected for two case studies: the magnitude 7.1 1997 Punitaqui earthquake and the magnitude 8.8 2010 Chile earthquake. For each of these real earthquake events, the magnitude, location, peak ground acceleration at every available accelerometer location, and qualitative damage descriptions were recorded. Once the data were collected for both the real and simulated events, they were quantified so they could be compared on equal scales. Using the quantified parameter values, a likelihood function was defined for each update step. In general, as the number of updates increased, the loss estimates tended to converge to a steady value for both the medium and the large event. In addition, the loss for the magnitude 7.1 event converged to a smaller value than that of the magnitude 8.8 event. The proposed methodology was applied only to earthquakes, but it is broad enough to be applied to any type of peril.

Contents

List of Tables
List of Figures

1 Introduction
  1.1 Past Applications of Bayesian Theory
  1.2 Research Objectives
  1.3 Catastrophe Risk Models

2 Methodology at the Conceptual Level
  2.1 Utility Theory
  2.2 Bayesian Framework
  2.3 Likelihood Function Definitions
  2.4 Bayesian Update Example

3 Methodology at the Application Level
  3.1 Potential Forms of Reported Earthquake Data
  3.2 Required Analysis Input
  3.3 Analysis Procedure
    3.3.1 Gather/Generate Model Input Data
    3.3.2 Preprocess Input
      3.3.2.1 Scaled Magnitude and Location Input
      3.3.2.2 Peak Ground Acceleration

      3.3.2.3 Scaled Damage Update Input Data
      3.3.2.4 Building Tagging Data
    3.3.3 Establish a Baseline Probability Density Function
    3.3.4 Define a Likelihood Function
    3.3.5 Calculate Posterior PDF
    3.3.6 Calculate Updated Loss

4 Numerical Results
  4.1 Case Studies
    4.1.1 Maule Chile 2010 Earthquake
    4.1.2 Punitaqui Chile 1997 Earthquake
    4.1.3 Northridge 1994 Earthquake
  4.2 Real Earthquake Parameters
    4.2.1 Magnitude and Location
    4.2.2 Peak Ground Acceleration
    4.2.3 Reported Damage
    4.2.4 Building Tagging
    4.2.5 Summary of Collected Real Parameters
      4.2.5.1 Chronology of Reported Loss
  4.3 Simulated Earthquake Parameters
    4.3.1 CAT Model Analysis
    4.3.2 CAT Model Earthquake Catalog
    4.3.3 Model Damage Output
    4.3.4 General Loss Trends
  4.4 Numerical Results for Case Studies
    4.4.1 Chile 2010 EQ
    4.4.2 Punitaqui 1997 EQ
    4.4.3 1994 Northridge Earthquake

5 Discussion
  5.1 Challenges
  5.2 Sensitivity Analysis: Tagging Factors
  5.3 Effect of Update Order on Final Posterior Distribution

6 Conclusions
  6.1 Conclusion
  6.2 Future Work

7 Bibliography

Appendices
  A Notation
  B List of Equations

List of Tables

2.1 Model Losses
2.2 Bayesian Loss Estimation Example
3.1 Types of Earthquake Data
3.2 Real and Simulated Input Parameters
4.1 Chile 2010 Magnitude and Location [1]
4.2 Chile 1997 Magnitude and Location [2]
4.3 Northridge 1994 Magnitude and Location [3]
4.4 Chile 2010 PGA [4]
4.5 Chile 2010 PGA (continued) [4]
4.6 Chile 1997 PGA [5]
4.7 Northridge 1994 PGA [3]
4.8 Northridge 1994 PGA (continued) [3]
4.9 Chile 2010 Damage Reports
4.10 Chile 1997 Damage Reports [6] [7]
4.11 Northridge 1994 Tagging [8]
4.12 Notional Building Parameters
4.13 Notional Building Parameters
4.14 Ascending Damage Chile
4.15 Ascending Damage California

List of Figures

2.1 Prior Probability Density Function
2.2 Likelihood Probability Density Function
2.3 Posterior Probability Density Function
2.4 Normalized Posterior Probability Density Function
3.1 Analysis Procedure Flowchart
3.2 General Shape of Normal Likelihood Function
3.3 Loss Confidence Intervals
4.1 Chile CAT model building locations
4.2 California CAT model building locations
4.3 Maximum DI per Building Class, Chile
4.4 Maximum DI per Building Class, Northridge
4.5 Chile CAT model EQs
4.6 California CAT model EQs
4.7 Chile 2010 Weighted Average Loss
4.8 Chile 2010 Confidence Intervals
4.9 Chile 1997 Confidence Intervals
4.10 Chile 1997 Weighted Average Loss
4.11 Northridge 1994 Average Loss
4.12 Northridge 1994 Confidence Intervals
5.1 Sensitivity Analysis

5.2 Effect of Update Order

List of Equations

2.1 General Utility Theory
2.2 Utility Theory
2.3 Bayes Theorem
2.4 Total Probability
2.5 Utility Theory Bayesian Adaptation
3.1 Scaled Magnitude
3.2 Haversine Formula
3.3 Scaled Distance
3.4 Recorded PGA
3.6 Average Damage Index
3.7 Damage Index
3.8 Simulated Earthquake
3.9 Global Loss Matrix
3.10 Simulated Damage Index
3.11 Building Tagging Real Earthquake Parameter
3.12 Building Tagging Simulated Earthquake Parameter
3.13 Prior PDF
3.14 Gaussian PDF
3.15 Likelihood Function
3.16 Bivariate Normal 1
3.17 Bivariate Normal 2

3.18 Bivariate Normal: Covariance and Mean
3.19 Covariance
3.20 Input Likelihood Function
3.21 Evaluating Likelihood Function
3.22 Posterior PDF
3.23 Recursive Posterior PDF
3.24 Total Loss per EQ Event
3.25 CDF Calculation
3.26 Loss Confidence Intervals (CI)
3.27 Average Loss Calculation
5.1 First Update
5.2 Second Update
5.4 Update Step M

Acknowledgements

I want to thank my family and boyfriend for their support and encouragement during my entire educational journey. I would also like to acknowledge my advisor for being patient and a great mentor throughout my entire stay at Columbia. He has been, and will be, the best boss I'll ever have. Not only have I had great guidance through my Ph.D. journey, but an excellent project as well. Guy Carpenter gave me the opportunity to solve a challenging and stimulating research problem. Everyone in the company was helpful and served as additional mentors on my project. Madeleine, Guillermo, and Sergio were especially helpful because they provided feedback and guidance during my internship with Guy Carpenter and any other time I needed them. As a first-generation college student, it is a blessing to have received an Ivy League education for free with the help of so many intelligent and genuinely kind people.

Chapter 1

Introduction

1.1 Past Applications of Bayesian Theory

Bayesian inference is a powerful tool used in multiple fields to help build predictive models for different phenomena. The basic premise behind Bayes' theorem is that there exists some prior distribution or probability that something will occur. Then some new information becomes available that is relevant to the previous distribution. Using the new data, the prior distribution can be modified via a likelihood function to calculate an improved posterior distribution. This general framework can be applied to refine initial estimates of any given parameter by using pertinent data as it becomes available. A plethora of Bayesian inference applications exist in technical fields like science and engineering, but there have also been applications in social science and even politics. One political application of Bayes' theorem was to forecast a winner in a presidential election. Linzer defined a methodology that used a combination of existing early forecasting models (the prior) and real-time election polls (the new information) to predict the 2008 presidential election results [9]. He estimated the Bayesian model with a Markov chain Monte Carlo sampling procedure to implement the forecast model. Election polls were used six months prior to the election, and updates were calculated every two weeks, incorporating the most recent poll

results for each state. Using his model, Linzer was able to successfully predict the election of Barack Obama in the 2008 presidential election. Bayesian analysis can also help organizations make informed decisions that mitigate exposure to natural disasters. One application in risk analysis used real-time flood forecasting for the river Rhine [10]. In this case, the modeled prior phenomenon was the probability of exceeding a certain critical water level at a given location and time. The prior distributions were constructed using linear regression models based on water levels at upstream stations. By using information on actually observed water heights, improved posterior estimates of water levels were obtained that could be used for risk assessments. There have also been many applications to predict losses of buildings subjected to earthquake hazards. This past research involving Bayesian loss updates tends to focus on one building at a time: it defines an earthquake model and a structural model, subjects the structure to a simulated event, and then measures the response of different components to assess damage and a monetary loss. One Bayesian model predicted damage and loss for a seven-story reinforced concrete moment-frame building subjected to the 1971 San Fernando Earthquake [11]. Initially, the analysis used a structural analysis model to calculate predicted displacements due to the earthquake. Then it incorporated real-time displacement and drift of an instrumented building via Bayes' theorem to calculate an improved posterior predicted response. Another application used reported insured monetary building losses to improve prior estimates of average losses for buildings in a given zip code subjected to the 1994 Northridge earthquake [12]. These predictive models show the versatility of using a Bayesian framework to improve estimates in multiple fields using any and all available information that can inform the initial prior estimates. The proposed research methodology generalizes the Bayesian approach so that it can take into account multiple types of input data to improve the initial loss estimate for a given portfolio of buildings after a real earthquake.

1.2 Research Objectives

The goal of this research is to estimate the monetary loss a portfolio of insured buildings will sustain after a major earthquake event by using all available earthquake attributes. This information is crucial for insurance companies after an earthquake strikes because they need to ensure they have enough money to pay their insured parties for their damages. Traditionally, insurance companies provide financial protection via contracts or policies that promise partial or full reimbursement for a loss due to different kinds of incidents, like work injuries, car accidents, or natural disasters [13]. The goal of an insurance policy is to share financial risk between an insured party and an insurance company. Usually the risk holder is either a person or a business, but it can also be another insurance company, in which case the insurance company is insured by a (re)insurance company. Insurance companies can choose to purchase (re)insurance to mitigate the risk that comes with insuring a large portfolio of buildings against natural disasters. In the case of a large destructive event, the (re)insurance company would pay the insurance company, according to their policy, so that it can pay all of its insured parties. In order to make an informed decision about which policies are most cost effective, (re)insurance companies use catastrophe risk (CAT) models. CAT models estimate the amount of damage the buildings within a portfolio will sustain when exposed to specific hazards, and the resulting monetary loss under different policies. A model outputs average expected losses as well as the probability that a certain monetary loss will be exceeded. These model results help a company choose a policy that provides the desired coverage for a specific level of projected risk. The insurance company decides how much it is willing to pay for insurance so that the probability of exceeding a target loss is below a desirable threshold. CAT models help an insurance company choose an initial policy coverage, but they can also help when a real natural disaster occurs. For example, in the event of a real earthquake, the insured company needs an accurate loss estimate because it needs to have enough money to

pay all of its insured parties. The model losses due to the set of CAT simulated earthquakes, along with attributes of the real earthquake, can be used to calculate a new loss for the real event. It is expected that as additional information becomes available about the real earthquake, the uncertainty in the loss estimates will be reduced. As discussed in the previous section, Bayesian analysis is a powerful tool that can help solve this very problem of updating an initial estimate based on available data. Although previous applications have calculated risk for flood analysis and losses for individual buildings, applications to calculating the loss for a large set of buildings have not been explored. The proposed approach is to combine the initial CAT model loss estimates for the simulated events with real earthquake parameters to calculate a new posterior loss estimate. The final posterior loss will be a combination of all the catalog losses, where each event contributes an amount proportional to its similarity with the real event. To quantify this contribution, a comparison is established in which attributes of the real event are quantified and compared to equivalent attributes of the simulated events. Once this comparison is defined, the relative loss contribution of each simulated event to the expected loss due to the real event can be calculated. Assuming that all simulated catalog events are mutually exclusive and collectively exhaustive, the loss-contribution weights sum to unity. The calculated weight of each simulated event can be interpreted as the relative probability that an event like the simulated one has occurred, given the real event. Any available attribute of the real event can be used as a basis of comparison, but as a start only magnitude, location, peak ground acceleration, building tagging, and damage information were used. Once the comparison is complete, an updated probability of occurrence can be calculated for each simulated event. The loss due to each simulated event is given by the initial CAT model analysis, and its contribution to the total loss will change with each update step. While the proposed framework is general enough to be applied to any peril, earthquake risk is used as a prototype for the initial development process. As additional attributes describing the real earthquake become available, the updated

loss estimate based on the CAT model loss output can be refined, thereby reducing the level of uncertainty. Since incoming information will not be available immediately after the earthquake strikes, there will be a subsequent loss update as each additional attribute of the real earthquake becomes available. Before a loss update can be calculated, the CAT model analysis is required; a general overview of the model methodology is therefore given in the following section.

1.3 Catastrophe Risk Models

CAT models are important tools that help (re)insurance companies establish the most probable financial risk for different natural disasters. There are both private and open-source CAT models that cover different perils and different geographical regions. The proposed Bayesian update methodology can be applied to any model, even open-source ones like the Global Earthquake Model. Given certain inputs, these models calculate the loss a set of assets will sustain due to a peril in a specific region. In general, seismic CAT models are composed of four main modules: a stochastic event module, a hazard module, a vulnerability module, and a financial analysis module. Before the analysis is conducted, a set of user inputs needs to be defined. The building portfolio must be defined, which includes the physical location or coordinates of each structure as well as building configurations such as construction materials, year built, and occupancy. The second user input includes information about the specific policy the buildings are insured with, such as premiums and deductibles. Once these initial inputs are defined, the CAT model can begin analyzing the buildings' risk due to seismic events. The first step in calculating the loss due to an earthquake is to have a representative set of simulated events (the event catalog) for the location of the insured assets. Different regions have their own distinct sets of stochastic events; for example, California's event catalog would differ from the one for Japan. In the case of earthquake risk, the event module defines

different seismic sources that theoretically produce all possible seismic events that could affect the region of interest. These event catalogs can account for multiple seismic sources, such as subduction zones, line sources, and background seismicity [14]. Given these sources, a stochastic catalog is formed of seismic events with magnitudes, locations, and occurrence rates. Once the stochastic earthquake events are defined, the hazard models are used to calculate the earthquake intensity at every property location due to every simulated earthquake, via an attenuation model. The hazard intensity can be measured as peak ground acceleration (PGA), spectral acceleration (SA), or on the Modified Mercalli Intensity (MMI) scale. An attenuation model calculates the hazard intensity at a building location due to a catalog event by using the principal attributes of the simulated event. These principal event attributes vary between models, but usually include the fault type, the magnitude of the event, and the epicentral distance. Although attenuation models differ, in general the greater the event magnitude and the closer the site is to the event epicenter, the larger the hazard intensity; the farther away, the smaller the hazard intensity. After the attenuation model is applied, additional factors like local site conditions are used to calculate the amplification of the intensity at each given site [14]. Once the hazard demand is defined, the vulnerability module uses the specific attributes of each building to convert the hazard intensity into a mean damage ratio. This is where the specific characteristics of the insured building come into play. The building construction material, age, number of stories, occupancy class, and secondary characteristics help define specific damage curves for each asset. These damage curves define what type of damage a building with specific attributes would sustain when subjected to a given hazard intensity. The specific damage curves vary between models, but they are validated with physical tests of instrumented buildings and building components subjected to different seismic events. Given the demand (seismic intensity) and the capacity of each property, the damage curve yields a mean damage ratio for every simulated catalog event [14].

The last step is the financial analysis module, which uses the mean damage ratios along with information about the assets' values and policy types to calculate a monetary loss due to each stochastic seismic event [14]. The final CAT model output (from the financial module) is a monetary loss value for every building due to every simulated earthquake. From these computed losses and the rates of each simulated earthquake, different loss statistics can be calculated, such as the average annual loss and exceedance probability curves. Ultimately the model calculates a total loss due to each simulated event, which is associated with different return periods. By looking at the overall distribution of losses, an insurance company can make an informed decision about which policy terms result in an acceptable loss margin.
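To make the module flow concrete, here is a minimal, self-contained sketch of the four-module pipeline. Every function name, functional form, and coefficient below is an invented placeholder for illustration; real CAT models use calibrated attenuation relations, damage curves, and policy logic far richer than this. The point is only the data flow: each simulated event is attenuated to a site intensity, converted to a damage ratio, and monetized per policy, then summed over the portfolio.

```python
import math

def hazard_pga(magnitude, epicentral_dist_km, site_factor=1.0):
    """Hazard module: toy attenuation (intensity grows with magnitude,
    decays with distance) plus a local site-amplification factor."""
    ln_pga = -3.5 + 0.8 * magnitude - 1.1 * math.log(epicentral_dist_km + 10.0)
    return site_factor * math.exp(ln_pga)

def mean_damage_ratio(pga):
    """Vulnerability module: toy damage curve mapping PGA to a ratio in [0, 1]."""
    return 1.0 - math.exp(-2.0 * pga)

def building_loss(mdr, asset_value, deductible):
    """Financial module: ground-up loss minus deductible, floored at zero."""
    return max(mdr * asset_value - deductible, 0.0)

# One simulated catalog event applied to a two-building portfolio.
event = {"magnitude": 7.2, "dist_km": [15.0, 60.0]}   # distance to each site
portfolio = [{"value": 2e6, "deductible": 5e4},
             {"value": 1e6, "deductible": 2e4}]

total = sum(
    building_loss(mean_damage_ratio(hazard_pga(event["magnitude"], d)),
                  b["value"], b["deductible"])
    for d, b in zip(event["dist_km"], portfolio)
)
print(round(total, 2))  # portfolio loss for this one simulated event
```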

Chapter 2

Methodology at the Conceptual Level

2.1 Utility Theory

The goal of this research is to calculate a monetary loss for a portfolio of buildings due to a real earthquake event. One of the primary tools used to calculate this loss is the existing loss output of a catastrophe risk (CAT) model. Theoretically, the catastrophe risk model has already been used to calculate a loss for the existing portfolio due to its own catalog of simulated events. Since it is unlikely that the real event will be identical to any of the simulated events, the goal is to use the losses due to all of the simulated events 1 : N_EQ to calculate the new loss due to the real event. If the set of catalog events 1 : N_EQ is collectively exhaustive and mutually exclusive, then the target loss prediction due to the real event can be calculated as a linear combination of the simulated losses via utility theory. Utility theory is used in economics to make decisions by calculating the expected cost of an event given some uncertainty. The expected utility of a primary event B is the sum, over all sub-events n = 1 : N, of the probability that sub-event n occurs, P_n, times its associated cost, C_n [15]. The set of sub-events n = 1 : N is mutually exclusive and collectively exhaustive, so together

they span the entire sample space of primary event B:

    E[B] = \sum_{n=1}^{N} P_n C_n        (2.1)

Let us assume that someone is trying to decide whether or not to accept a bet. They are told that if they pay $4 they can roll a die and will be given the dollar equivalent of the number shown on the die. In order to make this decision, they calculate the expected profit if the proposal to gamble is either accepted or rejected. To apply the theory, the total number of possible outcomes must be identified, and the outcomes must be both mutually exclusive and collectively exhaustive. There are a total of six outcomes: rolling a one, two, three, four, five, or six, since there are only six faces on a die. It is impossible to roll one number and another number at the same time, so the six outcomes are mutually exclusive. In addition, six and only six outcomes are possible, because one is constrained to roll one of the six given faces of the die; thus the six outcomes are collectively exhaustive. Once these two initial criteria are met, the expected utility can be calculated. To apply expected utility, first the probability of rolling a specific die face must be calculated. Since the probability of rolling any given number is equal and there are a total of six faces on a die, the probability of rolling any given number is 1/6. Next, the utility for each of the six outcomes must also be specified, which is given as $1, $2, $3, $4, $5, and $6. Using equation 2.1 with this information, the expected utility of accepting the bet is

    E[Bet] = (1/6)$1 + (1/6)$2 + (1/6)$3 + (1/6)$4 + (1/6)$5 + (1/6)$6 = $3.50

Since the cost to enter the bet ($4) exceeds the expected payout ($3.50), the rational decision would be to reject the initial offer. This general framework of calculating a total cost or loss via the contributions of a set of mutually exclusive and collectively exhaustive events can easily be applied to calculate the expected loss of a portfolio due to a real event. Just as

calculating an expected value for the gambling problem helped the user make a decision, calculating an expected loss due to an earthquake can help a (re)insurer decide whether or not to borrow additional money to pay its insured parties. As before, in order to apply the theory it is required that the catalog earthquakes form a set of mutually exclusive and collectively exhaustive events. Equation 2.2 can be used to calculate the expected cost or loss due to a real earthquake B by using the set of catalog earthquake events A_1 : A_{N_EQ} with given losses Loss(A_n) and probabilities of occurrence P(A_n):

    E[Loss(B)] = \sum_{n=1}^{N_EQ} Loss(A_n) P(A_n)        (2.2)

It is assumed that the CAT model has generated a set of simulated earthquakes, denoted by A_n, where n = 1 : N_EQ, that can occur in the given geographical area of interest (i.e., the location of real event B). The given set of A_n is mutually exclusive and collectively exhaustive. Loss(A_n) represents the loss of the portfolio buildings due to simulated event A_n, which is given by the CAT model for all events. P(A_n) represents the probability that simulated catalog event A_n has occurred. This probability is determined by the relation between the simulated events and the real earthquake, event B. These probabilities can be considered the weight, or contribution, each simulated event has toward the total real loss. Before the event occurs there is no information to quantify a simulated event's contribution to the total real loss, so it is assumed that they all contribute equally: the initial probability of any given event is 1/N_EQ. Once information becomes available for event B, a comparison between attributes of the simulated events and the real earthquake can be used to define the probability of occurrence for each simulated event. The objective of the Bayesian framework is to determine, via updating, better estimates of P(A_n) using observed attributes of the real earthquake.
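As a quick numerical check of equations 2.1 and 2.2, the short sketch below computes the expected utility of the die bet and then reuses the same weighted-sum form for a portfolio expected loss under a uniform prior (the catalog losses are the illustrative values used later in Section 2.4):

```python
# Expected utility of the die bet (Eq. 2.1): six equally likely outcomes.
payouts = [1, 2, 3, 4, 5, 6]
expected_bet = sum(p / 6 for p in payouts)
print(expected_bet)          # 3.5 < the $4 entry cost, so reject the bet

# Expected portfolio loss (Eq. 2.2): same weighted sum, with catalog
# losses Loss(A_n) and occurrence weights P(A_n).
losses = [0, 200, 500, 1000, 4000]          # Loss(A_n) from the CAT model
weights = [1 / len(losses)] * len(losses)   # uniform prior: 1/N_EQ each
expected_loss = sum(l * w for l, w in zip(losses, weights))
print(expected_loss)         # 1140.0, the pre-event baseline estimate
```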

2.2 Bayesian Framework

Bayesian analysis is a useful analytical tool to calculate the loss estimate of a portfolio of buildings due to a real event. The foundation for Bayesian updating is Bayes' theorem, which states that the posterior probability at update step m, P_m(A_n | B_m), is equal to the product of a prior probability, P_m(A_n), and a likelihood function, P_m(B_m | A_n), divided by a normalization factor, P_m(B_m) [16]:

    P_m(A_n | B_m) = P_m(A_n) P_m(B_m | A_n) / P_m(B_m)
                   = P_m(A_n) P_m(B_m | A_n) / \sum_{n=1}^{N_EQ} P_m(A_n) P_m(B_m | A_n);   m = 1, 2, 3, ..., M        (2.3)

The components of Bayes' theorem are defined in terms of conditional probabilities, and the general idea is to use knowledge of observed attributes to improve an initial estimate of some unknown probability distribution. These attributes can include an earthquake's magnitude, location, peak ground acceleration, and corresponding damage to buildings. The update procedure is an iterative process: every time there is new information about an attribute, there is a new update. Given M attributes of a real earthquake, there will be M corresponding updates. In order to calculate the posterior estimate for a given update step, the prior, likelihood, and normalization factor must be properly defined. The posterior probability P_m(A_n | B_m) is the conditional probability of occurrence of simulated earthquake event A_n, given all the observed attributes of the actual event B available at update step m. An important goal of this research is to calculate the posterior probability distribution using all available observed information about the real earthquake. The initial prior P_0(A_n) is the probability that a given simulated earthquake event A_n has occurred, before any information about real earthquake event B is available. Before the updating scheme begins, P_0(A_n) = 1/N_EQ for n = 1 : N_EQ, assuming that all simulated events in the catalog are equally likely to occur. At each subsequent update step m, the prior probability P_m(A_n) is set equal to the posterior probability P_{m-1}(A_n | B_{m-1}) from the previous

step (m − 1). After the last update, the final posterior probability incorporates all the available observed information about the real earthquake event B. The likelihood function P_m(B_m | A_n) is the conditional probability of the real earthquake, event B, given the simulated event A_n. The likelihood function (LF) is a measure of how alike the attributes (magnitude, location, damage, etc.) of the simulated earthquake event A_n are to the corresponding ones of the real earthquake event B. The normalization factor P(B) is the standard total probability of having a real earthquake event B; with N_EQ the total number of simulated events, it is calculated by equation 2.4:

    P(B) = \sum_{n=1}^{N_EQ} P(A_n) P(B | A_n)        (2.4)

The normalization factor ensures that the posterior probability density function integrates to one, making it a true PDF. Because the initial prior probability is set equal to the common constant value 1/N_EQ for all N_EQ events in the catalog, and the posterior probabilities are computed using Bayesian updating, the main challenge of this research is to define the LF. The LF should reflect how the actual earthquake attributes relate to the corresponding attributes of the catalog events. The earthquake attributes can include any quantity describing an earthquake and its consequences, but for simplicity, in this study, they were selected as the magnitude, location, peak ground acceleration, and various forms of damage. If there were a simulated event in the catalog with exactly the same attributes as the real earthquake, the LF would take on its maximum value there. But since the earthquake attributes of the simulated events usually differ from those of the real event, the LF is selected to decay and asymptotically approach zero away from the true values. Many continuous functions satisfy these properties, but a Gaussian (normal) probability distribution was chosen to represent the LF in the proposed methodology.
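Because the catalog is finite, equations 2.3 and 2.4 reduce to an elementwise product followed by renormalization over the N_EQ events. Below is a minimal sketch of the recursive scheme, with assumed function and variable names, in which each posterior becomes the prior of the next update step:

```python
def bayes_update(prior, likelihood):
    """One update step (Eqs. 2.3-2.4): posterior_n is proportional to
    prior_n * likelihood_n, renormalized so the discrete posterior sums to one."""
    unnormalized = [p * lf for p, lf in zip(prior, likelihood)]
    total = sum(unnormalized)           # normalization factor P(B_m), Eq. 2.4
    return [u / total for u in unnormalized]

def run_updates(n_eq, likelihoods_per_step):
    """Recursive scheme: start from the uniform prior 1/N_EQ and apply one
    update per observed attribute (M steps in total)."""
    posterior = [1.0 / n_eq] * n_eq     # non-informative prior P_0(A_n)
    for likelihood in likelihoods_per_step:
        posterior = bayes_update(posterior, likelihood)
    return posterior

# Example: two update steps over a five-event catalog (LF values made up).
step1 = [0.9, 1.2, 1.1, 0.4, 0.1]   # e.g., from a magnitude comparison
step2 = [0.2, 1.0, 1.3, 0.6, 0.3]   # e.g., from a later damage attribute
print(run_updates(5, [step1, step2]))
```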

2.3 Likelihood Function Definitions

A major hurdle in Bayesian analysis is choosing an appropriate form for the likelihood function. There are many possible choices, depending on the data being represented. A common choice for the likelihood function is a Gaussian PDF, which was ultimately chosen for this application. An example of its use is the previously mentioned case in which an instrumented building and an analytical model were used to calculate an updated loss estimate due to the 1971 San Fernando Earthquake [11]. In that application, multiple stochastic model trials were run in order to find the demand on the building's structural components. Initially, each trial was given an equal weight: a constant prior equal to one divided by the number of trials. In order to calculate an appropriate weight for each Monte Carlo simulation, a Gaussian likelihood was constructed. The likelihood function measured how close the simulated demand was to that recorded by the building instrumentation. This Bayesian methodology is very similar to the earthquake loss methodology presented here. Just as the initial model weights were set equal to a constant, the prior probability for each simulated CAT model event is set equal to a constant. Also, the goal of those updates was to reconcile the differences between the Monte Carlo model and the building's real behavior; for the EQ loss estimation, the goal is to reconcile the differences between the attributes of the real event and the simulated CAT model events [11]. Another potential form of the likelihood function is an exponential PDF. An example of this application is a study that used Bayesian analysis to calculate life-cycle loss for timber buildings by performing Monte Carlo simulations using component-level fragility curves and loss distributions [17]. The goal was to model different parts of the building structure, such as shear walls or windows, in order to define the probability of exceeding a specific damage state given a demand displacement or acceleration. Initially, the probability of a component exceeding any given damage state was equally likely and set equal to a constant (a constant prior). Next, in order to modify the model's damage predictions, case studies of real laboratory tests of timber buildings were used. By comparing the simulated Monte Carlo response with the test data, exponential likelihood functions were constructed to modify the initial constant prior distributions. Both cases have some similarities with the proposed methodology. In both cases, constant priors were used because, before any data were available, it was assumed that every possible scenario was equally likely to occur. Even though the timber loss estimation used an exponential likelihood function, the basic goal is the same: establish a weight for an experimental outcome given available data on a real event and real response.
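Both likelihood forms are straightforward to evaluate as functions of the discrepancy between a simulated attribute and the observed one. The sketch below is illustrative only; in particular, the exponential parameterization is one plausible choice, not necessarily the form used in [17]:

```python
import math

def gaussian_lf(x_sim, x_obs, sigma):
    """Normal likelihood centered on the observed attribute: peaks when the
    simulated value matches the observation and decays smoothly away."""
    z = (x_sim - x_obs) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def exponential_lf(x_sim, x_obs, rate):
    """Exponential likelihood in the absolute discrepancy: sharper peak at
    the observation and heavier tails than the Gaussian."""
    return rate * math.exp(-rate * abs(x_sim - x_obs))
```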

2.4 Bayesian Update Example

To illustrate the Bayesian process, a sample catalog of five simulated earthquake events was used in a single update step. Five randomly selected earthquake magnitudes, M_An = [0.35, 3.00, 4.39, 7.00, 9.50], were used. Before any information about a real event is available, every event is equally likely to occur. The discrete prior density function must sum to unity and be constant, so the prior P_1(A_n) for update step m = 1 is equal to 1/N_EQ = 1/5 = 0.2 for all five events. This non-informative prior probability distribution is shown in Figure 2.1. In addition to the information about the event magnitudes, an analysis has also been conducted that indicates the loss a single building C would sustain if subjected to the catalog events: L^C_A1, L^C_A2, L^C_A3, L^C_A4, L^C_A5. Table 2.1 summarizes the preliminary model losses for all five simulated events for building C. If there is no additional information, an expected loss can be calculated with the baseline prior distribution and the CAT model losses: E[L] = 0.2 × (0 + 200 + 500 + 1000 + 4000) = $1140. In this case all events contribute equally to the predicted loss.

Table 2.1: Model Losses

Event     A1    A2    A3     A4     A5
L^C_An     0   200   500   1000   4000

Figure 2.1: Prior Probability Density Function

After some time, a real earthquake, event B, with magnitude M_B = 3.82 occurs. The goal becomes to use the initial loss estimates from the previous analysis, along with information about event B, to calculate a new loss estimate for building C due to event B, L^C_B. Any information about the earthquake can potentially be used as an update parameter. If initially only information about the event magnitudes is available, it can be used to calculate a single updated loss. The basic idea is to calculate the new loss as a linear combination of the losses due to the event catalog using different weights. These weights are defined via a probability density function that is calculated using Bayesian analysis. As mentioned in the previous section, in order to calculate an updated posterior PDF, a likelihood function (LF) must be defined. A normal distribution was chosen as the likelihood P_1(B_1 | A_n), centered at the magnitude value of the real event, µ_1 = M_B, with standard deviation σ_1 = 0.5 × [max(M_An) − min(M_An)] = 0.5 × (9.5 − 0.35) = 4.58. Once the LF is defined, it must be evaluated at the magnitude values of the simulated events, M_An. The catalog event magnitudes closer to the real event are weighted more heavily than those farther away. Figure 2.2 shows the LF and the evaluated points for the five simulated catalog events.

Figure 2.2: Likelihood Probability Density Function

Once the LF is defined and evaluated, the next step is to calculate the unnormalized posterior probability density function by multiplying the prior with the likelihood values for all five simulated events. Initially the prior is a constant, so the posterior PDF has the same shape

as the LF, but as additional updates occur this will change. Figure 2.3 shows the discrete posterior PDF values for all five simulated events.

Figure 2.3: Posterior Probability Density Function

The last step of the Bayesian approach is to normalize the distribution by the sum of all the unnormalized posteriors, so that the final discrete posterior PDF sums to unity. The final normalized posterior PDF is shown in Figure 2.4. This final step ensures the final distribution is a true probability density function. It also reflects the assumption that the set of simulated events is mutually exclusive and collectively exhaustive, since the events make up the entire range of possible outcomes without any overlap. Even though the magnitude was chosen as the event attribute here, any available information about the real earthquake can be

used to perform an update.

Figure 2.4: Normalized Posterior Probability Density Function

After the Bayesian analysis is complete, data post-processing is performed to calculate a loss estimate for building C due to event B, L^C_B. For simplicity, a portfolio with a single building was considered, with a corresponding loss for all five simulated events (Table 2.1). By applying basic utility theory, the expected loss for the real earthquake event B can be calculated using the posterior PDF from the Bayesian approach and the model loss values. The contribution of each earthquake to the total loss of building C is the simulated loss from the initial analysis, L^C_An, times the posterior PDF value, P_1(A_n | B_1), summed over all five events:

    L^C_B = E[Loss(B_1)] = \sum_{n=1}^{N_EQ} L^C_{A_n} P_1(A_n | B_1)        (2.5)

Table 2.2 shows the contribution of each simulated event and the total estimate for the first update step. In general, a Bayesian loss estimate will be larger than or equal to the smallest analysis loss and smaller than or equal to the largest analysis loss. The loss contribution of event A_3 is the largest because its posterior PDF value is the largest. Yet events that have a small posterior value but a large initial loss estimate can still contribute a large loss, as in the case of event A_4. The final loss estimate for building C due to event B is between the initial loss estimates of events three and four, which makes sense because they are closest in magnitude to the real event. This value is much lower than the initial estimate of over

$1000.

Table 2.2: Bayesian Loss Estimation Example

Simulated Event    L^C_An    P_1(A_n | B_1)    Loss Contribution
A1                      0              0.10                    0
A2                    200              0.37                   74
A3                    500              0.38                  190
A4                   1000              0.13                  130
A5                   4000             0.011                   44
E[Loss(B_1)]                                                 364

Although this example used only one building and five catalog events, it could easily be expanded to include multiple buildings and events. The basic Bayesian procedure remains the same for every update step, any given earthquake attribute, and any number of simulated events and buildings. As long as there is an equivalent attribute for the CAT model events, any available information from a real event can be used to perform a loss update.
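This single magnitude update is compact enough to reproduce end to end. The sketch below follows the stated procedure (uniform prior, Gaussian LF with µ_1 = 3.82 and σ_1 = 4.58, normalization, then equation 2.5); note that the printed values in Table 2.2 reflect the author's own rounding and parameter choices, so the numbers produced here need not match the table exactly:

```python
import math

def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

magnitudes = [0.35, 3.00, 4.39, 7.00, 9.50]   # catalog magnitudes M_An
losses = [0, 200, 500, 1000, 4000]            # L^C_An from Table 2.1

# Step 1: non-informative prior, 1/N_EQ per event.
prior = [1.0 / len(magnitudes)] * len(magnitudes)

# Step 2: Gaussian LF centered on the real magnitude M_B = 3.82, with
# sigma = 0.5 * (max - min) over the catalog magnitudes = 4.58.
mu_b = 3.82
sigma = 0.5 * (max(magnitudes) - min(magnitudes))
likelihood = [gaussian_pdf(m, mu_b, sigma) for m in magnitudes]

# Steps 3-4: unnormalized posterior, then normalize to sum to one.
unnormalized = [p * lf for p, lf in zip(prior, likelihood)]
posterior = [u / sum(unnormalized) for u in unnormalized]

# Step 5 (Eq. 2.5): expected loss as the posterior-weighted sum of losses.
expected_loss = sum(l * p for l, p in zip(losses, posterior))
print([round(p, 3) for p in posterior], round(expected_loss, 2))
```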

Chapter 3

Methodology at the Application Level

3.1 Potential Forms of Reported Earthquake Data

The goal of the Bayesian update procedure is to use all available real information to update the initial loss estimate. The first available earthquake information will usually be the latitude, longitude, and magnitude of the event, which is already in quantitative form and can therefore be used directly for the first update. Additional information about the earthquake will continue to arrive, enabling subsequent updates of the loss estimate and further reduction in its uncertainty. The type of information and the time at which it becomes available will vary, but it is helpful to represent all information in quantitative form so it can be incorporated into the Bayesian update process. In general, there may be various preliminary qualitative descriptions of damage for specific buildings. In order to incorporate them into the Bayesian update framework, they have to be mapped to a damage level between 0 and 1, where 0 corresponds to no damage and 1 to collapse. Because the qualitative descriptions will differ, it may be difficult to standardize the way a descriptor is mapped to the damage index. To standardize this mapping, a guideline can be used that defines different damage ranges and states what damage is characteristic of each range.
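As an illustration of such a guideline, the sketch below maps a few qualitative descriptors to damage-index ranges and takes the midpoint of each range as the quantified value. The descriptor categories and range boundaries are invented for illustration, not taken from any standard:

```python
# Hypothetical guideline: descriptor -> (lower, upper) damage-index range.
DAMAGE_GUIDELINE = {
    "no damage":        (0.00, 0.05),
    "cracked finishes": (0.05, 0.20),
    "moderate damage":  (0.20, 0.50),
    "severe damage":    (0.50, 0.80),
    "partial collapse": (0.80, 0.95),
    "collapse":         (0.95, 1.00),
}

def damage_index(descriptor: str) -> float:
    """Map a qualitative damage report onto the [0, 1] damage-index scale
    by taking the midpoint of the guideline range for that descriptor."""
    lo, hi = DAMAGE_GUIDELINE[descriptor.lower()]
    return 0.5 * (lo + hi)

print(damage_index("moderate damage"))  # 0.35
```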

Another piece of information that becomes available post-event is the time history of the earthquake at locations where accelerometers have been installed. At these locations, the peak ground acceleration (PGA) can be extracted. The CAT models contain similar hazard intensity estimates for each simulated earthquake event, to which the real PGA values can be compared. If the frequency time history is available for both the real and simulated earthquakes, it can likewise be used as a parameter to compute new likelihood functions to update the current loss estimate.
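Extracting the PGA from a recorded accelerogram is just a peak search over the absolute acceleration samples. A minimal sketch, assuming the record has already been parsed into a list of acceleration values in consistent units:

```python
def peak_ground_acceleration(accel_samples):
    """PGA = largest absolute acceleration in the record, regardless of sign.
    `accel_samples` is assumed to be in consistent units (e.g., g or cm/s^2)."""
    return max(abs(a) for a in accel_samples)

# Toy record: the peak is 0.31 in magnitude even though the sample is negative.
record = [0.02, -0.15, 0.27, -0.31, 0.12, -0.05]
print(peak_ground_acceleration(record))  # 0.31
```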

Satellite images and drone footage of the terrain may also be available. They can come from different organizations, such as USGS remote sensing, NASA, and private organizations. Processing this information into a corresponding damage level is possible but may be computationally challenging because of the large size of the images. Programs and algorithms currently exist that help quantify the loss incurred by a structure by comparing before-and-after images. If the damage is calculated and converted to a damage scale of zero to one, it can be used for one of the loss updates. Loss estimates from insurance companies may also become available. In order to normalize these values, they can be separated by building types and locations. If additional information about the building type isn't available, the loss can be normalized by the number of buildings in the portfolio and used as a rough reference point for the loss. In addition, nonstructural loss is associated with loss of use when a business cannot operate because of the seismic event (business interruption). Newspapers and news networks may share information on which businesses are closed and for how long they remain closed. The monetary loss for each day a business remains closed would be equal to its projected profit per day, which can be added to the structural monetary loss. Another source of data may be social media sites such as Facebook, Twitter, and Instagram. After the earthquake occurs, it is likely that pictures and descriptions of the subsequent damage would be posted in the affected areas. These qualitative descriptions and images would have to be collected and quantified on a damage-index scale of zero to one, similar to any other observed damage for the building portfolio.

The last level of information to become available would be the true losses suffered by an insurance company, assessed after processing all actual claims. By the time this information is known, an adequate estimate should already be established by the Bayesian update process. Nevertheless, the real loss experience can be used to validate or calibrate the Bayesian model to improve further estimates, particularly during the development process. Ultimately, incorporating information about the real earthquake via Bayesian updates will help improve the loss estimate for any portfolio of insured buildings in a timely manner. All of the proposed update information is summarized in Table 3.1.

Table 3.1: Types of Earthquake Data

Order  Information                     Source                               Quantification
1      Magnitude, Location, Depth      USGS                                 Magnitude, latitude, longitude, depth
2      Initial Damage Reports and      News reports, engineering surveys,  Map qualitative descriptions [no damage,
       First Loss Estimations          insurance companies                  ..., collapse] to the scale [0, 1]
3      EQ Time Histories               USGS                                 PGA (or other intensity metric) at
                                                                            accelerometer locations
4      Satellite Images                USGS remote sensing, NASA,           Before-and-after SAR images, complex
                                       private organizations                coherence and intensity correlation,
                                                                            convert to [0, 1]
5      Drone Footage                   Government agencies,                 Construct 3D image of building; quantify
                                       insurance companies                  damage by visual inspection of video and
                                                                            convert to [0, 1]
6      Business Interruption           Newspapers                           Loss of use/day = projected profits/day;
                                                                            number of days the business is closed
7      Reports from Social Media       Facebook, Twitter, Instagram         TBD (convert descriptions and pictures
                                                                            to quantification)
8      Final Real Loss Experience      Inspections and claims               Last official monetary loss based on
                                                                            claims data

3.2 Required Analysis Input

The Bayesian update procedure requires two sets of parameters: one from the simulated events, X^(n) = [X^(n)_1, X^(n)_2, ..., X^(n)_M], and an equivalent vector from the observed real-time event, X^(m) = [X^(m)_1, X^(m)_2, ..., X^(m)_M]. These parameters will vary depending on the update step and the available real-event information. The first update parameters will be the earth-