INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS. Tao Jiang. A Thesis


INFORMATION APPROACH FOR CHANGE POINT DETECTION OF WEIBULL MODELS WITH APPLICATIONS

Tao Jiang

A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of MASTER OF ARTS

August 2015

Committee: Wei Ning, Advisor; Arjun K. Gupta; Junfeng Shang

Copyright © August 2015, Tao Jiang. All rights reserved.

ABSTRACT

Wei Ning, Advisor

In many practical applications, changes are often observed in real data sets. By noticing those changes, people can make better decisions to achieve benefits and avoid losses. The problem of change point analysis consists of identifying the number and the locations of change points. Schwarz (1978) introduced a model selection criterion, the Schwarz Information Criterion (SIC), which is mathematically tractable and widely used in change point detection problems. In this thesis, the statistical analysis of the Weibull distribution is explored in connection with the Schwarz Information Criterion (SIC), a modified Schwarz Information Criterion (MIC), and an improved Schwarz Information Criterion (IIC). Among these criteria, MIC was published by Chen et al. (2006), while IIC is a conjecture suggested in this thesis. In the first part of each of Chapters 2, 3, and 4, corresponding to SIC, MIC, and IIC respectively, the theoretical basis and mathematical formulas are derived, with the Weibull distribution employed as the specific model in each situation. After the theoretical derivation, in the second part of each chapter, simulation tests are conducted to estimate the performance of the information criteria, and power comparisons based on these simulations are used to compare all three criteria. The change point location is varied to show how the performance of the detection procedures changes. In the last part of each chapter, applications to real data sets illustrate the testing procedures based on each information criterion. The only difference among these three information criteria is their penalty terms. The aim of this thesis is to compare the performance of each model selection criterion and to identify how the power of change point detection procedures can be improved.

This work is dedicated to my parents Guoyong Jiang and Shaoqing Li, grandfather Shenzhang Jiang, and grandmother Mingzhi Yu for their love. Tao Jiang

ACKNOWLEDGMENTS

First of all, I would like to thank my advisor, Dr. Wei Ning, for his patient guidance and kind encouragement. During these four years in Bowling Green, Dr. Ning has helped me not only as an advisor in research, but also as an instructor in class and as a friend in life. He always offers his help whenever I need it. Without Dr. Ning, I could never be who I am today. I have been extremely lucky to have Dr. Ning as my advisor. I would also like to thank my committee members, Dr. Arjun K. Gupta and Dr. Junfeng Shang, for spending their time on my thesis. Their advice and suggestions cleared away the dark clouds in my research. Dr. Tong Sun and Dr. Rieuwert Blok accepted me into the program and offered me assistantships so that my study as a master's student here could be funded. I appreciate their help; it truly changed my life. All the people I met here were kind and selfless, and I must thank them for their friendship. They supported me each time I was down. Finally, I would like to thank my parents and my girlfriend. They give me a family, which is more important than anything else to me.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Change Point Analysis
1.2 Binary Segmentation Procedure
1.3 Weibull Distribution
1.4 Literature Review
CHAPTER 2 TRADITIONAL SCHWARZ INFORMATION CRITERION FOR CHANGE POINT DETECTION IN WEIBULL MODELS
2.1 Theoretical Derivation for One-parameter Weibull Distribution
2.2 One-parameter Simulation Tests
2.3 Theoretical Derivation for Two-parameter Weibull Distribution
2.4 Two-parameter Simulation Tests
2.5 Application of SIC to Minimum Temperature Data in Uppsala
CHAPTER 3 MODIFIED SCHWARZ INFORMATION CRITERION FOR CHANGE POINT DETECTION IN WEIBULL MODELS
3.1 Theoretical Derivation for One-parameter Weibull Distribution
3.2 One-parameter Simulation Tests
3.3 Theoretical Derivation for Two-parameter Weibull Distribution
3.4 Two-parameter Simulation Tests
3.5 Application of MIC to Minimum Temperature Data in Uppsala

CHAPTER 4 IMPROVED SCHWARZ INFORMATION CRITERION FOR CHANGE POINT DETECTION IN WEIBULL MODELS
4.1 A Conjecture for Improved Schwarz Information Criterion
4.2 Theoretical Derivation for One-parameter Weibull Distribution
4.3 One-parameter Simulation Tests
4.4 Theoretical Derivation for Two-parameter Weibull Distribution
4.5 Two-parameter Simulation Tests
4.6 Application of IIC to Minimum Temperature Data in Uppsala
CHAPTER 5 COMPARISON AND SUMMARY
BIBLIOGRAPHY
APPENDIX A SELECTED R PROGRAMS

LIST OF FIGURES

2.1 Power comparison for SIC with change point at different locations for shape-changeable Weibull distribution, n = 100
2.2 Power comparison with Different Sample Sizes
2.3 Minimum temperatures in Uppsala, Sweden, 1774-1981
3.1 Power comparison for MIC and SIC with change point at different locations for shape-changeable Weibull distribution, n =
3.2 Two-parameter-changeable power comparison for MIC and SIC with change point at different locations for shape-changeable Weibull distribution, n =
4.1 $(2h/n - 1)^2$ value for different change locations
4.2 $[1 - (2h/n - 1)^2]$ value for different change locations
4.3 Power comparison for IIC and MIC with change point at different locations for shape-changeable Weibull distribution, n =
5.1 Difference in Penalty Terms of SIC, MIC, and IIC
5.2 Power Comparison of SIC, MIC, and IIC, n =

LIST OF TABLES

2.1 Power comparison for SIC with change point at different locations, n = 100
2.2 Power comparison with change point at different locations, n = 50
2.3 Two-parameter power comparison with change point at different locations, n = 100
2.4 Two-parameter power comparison with change point at different locations, n = 50
2.5 Minimum temperatures in Uppsala, 1774-1981
2.6 Change Points of Minimum temperatures in Uppsala, 1774-1981
3.1 Power comparison for MIC with change point at different locations, n =
3.2 Power comparison for MIC with change point at different locations, n =
3.3 Two-parameter power comparison for MIC with change point at different locations, n =
3.4 Two-parameter power comparison for MIC with change point at different locations, n =
3.5 Difference of Penalty Terms in SIC and MIC for Weibull Distribution
3.6 Change Points of Minimum temperatures in Uppsala
4.1 Power comparison for IIC with change point at different locations, n =
4.2 Power comparison for IIC with change point at different locations, n =
4.3 Two-parameter power comparison for IIC with change point at different locations, n =
4.4 Two-parameter power comparison for IIC with change point at different locations, n =
4.5 Change Points of Minimum temperatures in Uppsala
5.1 Penalty Terms of SIC, MIC, and IIC for Two-parameter-changeable Weibull Distribution Under the Alternative Hypothesis

5.2 Power Comparison of SIC, MIC, and IIC, n =

CHAPTER 1 INTRODUCTION

1.1 Change Point Analysis

Changes might appear in all kinds of data sets, such as market prices, temperatures, and reaction rates. By noticing those changes, people can make better decisions to achieve benefits and avoid losses. The problem of change point analysis consists of identifying the number and the locations of change points. Therefore, a change point problem always involves two steps: 1) check whether there are any changes in the observed data; 2) if possible changes exist, estimate the number and the locations of the changes. A general description of change point analysis can be found in the book Parametric Statistical Change Point Analysis [1].

Let $x_1, x_2, \ldots, x_n$ be a sequence of $n$ independent random variables, where each random variable $x_i$ has probability distribution function $F_i$. We test the null hypothesis

$$H_0: F_1 = F_2 = \cdots = F_n$$

against the alternative hypothesis

$$H_a: F_1 = \cdots = F_{k_1} \neq F_{k_1+1} = \cdots = F_{k_2} \neq \cdots \neq F_{k_{m-1}+1} = \cdots = F_{k_m} \neq F_{k_m+1} = \cdots = F_n.$$

Here $1 < k_1 < k_2 < \cdots < k_m < n$, and there are $m \geq 1$ change points in this sequence of random variables, located at the respective positions $k_1, k_2, \ldots, k_m$.

1.2 Binary Segmentation Procedure

The binary segmentation procedure proposed by Vostrikova (1981) is a popular method for dealing with the multiple change point situation [9]. It simplifies a multiple change point problem by converting it into a sequence of single change point problems, detects the positions and the number of change points simultaneously, and has the advantage of saving computational time. The procedure can be summarized as follows.

Let $x_1, x_2, \ldots, x_n$ be a sequence of $n$ independent random variables, where each $x_i$ has probability distribution function $F_i(\theta_i)$. To test the null hypothesis against the alternative hypothesis, we combine the binary segmentation method with a detection method to search for all possible change points. The general technique proceeds in the following steps.

1) Possible change points are tested one by one, from the most significant to the least. In the first step, we assume there is at most one change in the probability distribution functions. That is, we test the null hypothesis $H_0: F_1 = F_2 = \cdots = F_n$ against the alternative hypothesis $H_a: F_1 = \cdots = F_k \neq F_{k+1} = \cdots = F_n$, where $k$ is the position of the first change point. If $H_0$ is rejected, then a change occurs at the $k$th observation. If $H_0$ is not rejected, then we do not have sufficient evidence to claim a significant change point in the data.

2) If no change point is found in step 1, we stop the testing procedure and conclude that there is no change. If a change point exists, we test the two subsequences obtained in step 1 with similar hypothesis tests. For the first subsequence, the test is between the null hypothesis $H_0: F_1 = F_2 = \cdots = F_k$ and the alternative hypothesis $H_a: F_1 = \cdots = F_{k_1} \neq F_{k_1+1} = \cdots = F_k$, where $k_1$ is the position of a possible change point within the first subsequence. The same procedure is applied to the second subsequence.

3) As long as change points are found, we keep splitting the data and repeating the tests until no further change point is found in any subsequence.

4) Finally, we summarize the results and collect all the change points found in steps 1 to 3.
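The recursion in steps 1-3 can be sketched in code. The thesis's own programs are written in R (Appendix A); the following is an illustrative Python sketch, in which `detect_single` stands in for any single-change-point test (such as one based on an information criterion), and `cusum_mean_detector` is a hypothetical toy detector for a mean shift, included only to exercise the recursion.

```python
import math

def binary_segmentation(x, detect_single, lo=0, hi=None, min_len=4):
    """Recursively apply a single-change-point test to subsequences of x.

    `detect_single(segment)` is a placeholder for any detection rule;
    it returns the change position within the segment, or None.
    Returns the sorted list of absolute change point positions.
    """
    if hi is None:
        hi = len(x)
    if hi - lo < min_len:
        return []
    k = detect_single(x[lo:hi])
    if k is None:
        return []
    cp = lo + k  # absolute position of the detected change
    # Split at cp and search each half again (steps 2 and 3).
    return (binary_segmentation(x, detect_single, lo, cp, min_len)
            + [cp]
            + binary_segmentation(x, detect_single, cp, hi, min_len))

def cusum_mean_detector(seg, threshold=3.0):
    """Toy single-change detector for a shift in mean (illustration only)."""
    n = len(seg)
    mean = sum(seg) / n
    best_k, best_stat = None, threshold
    s = 0.0
    for k in range(1, n):
        s += seg[k - 1] - mean
        stat = abs(s) / math.sqrt(n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k
```

Any of the criterion-based tests derived in Chapters 2-4 could be plugged in as `detect_single` in place of the toy detector.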

1.3 Weibull Distribution

The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions, depending on the value of the shape parameter. The probability density function of the general Weibull distribution is

$$f(x; \lambda, k, \theta) = \begin{cases} \dfrac{k}{\lambda}\left(\dfrac{x-\theta}{\lambda}\right)^{k-1} e^{-\left(\frac{x-\theta}{\lambda}\right)^{k}}, & x \geq \theta, \\[4pt] 0, & x < \theta, \end{cases}$$

where $k > 0$ is the shape parameter, $\lambda > 0$ is the scale parameter, and $\theta$ is the location parameter of the distribution. The case $\theta = 0$, $\lambda = 1$ is called the standard Weibull distribution, and the case $\theta = 0$ is called the two-parameter Weibull distribution. This thesis mainly focuses on the two-parameter Weibull distribution.

Jandhyala et al. (1999) used the likelihood ratio test (LRT) to detect the change point of temperatures in Uppsala [4]. The two-parameter Weibull distribution, a Weibull distribution without the location parameter $\theta$, was employed to fit the models, and only the scale parameter $\lambda > 0$ was allowed to change. However, the Weibull distribution can have up to three parameters in all. Therefore, in this thesis, we will (1) change either the shape parameter $k$ or the scale parameter $\lambda$ only and compare our results with Jandhyala et al. (1999); and (2) change both the shape parameter $k$ and the scale parameter $\lambda$ and compare our results with Jandhyala et al. (1999). Both tests will be conducted using SIC, MIC, and IIC.

1.4 Literature Review

There are quite a few methods that can be used in change point detection tests. The Bayesian analysis test, maximum likelihood ratio test, stochastic process test, and nonparametric test are the

most widely used among them.

Chernoff and Zacks (1964) estimated the current mean of a normal distribution that was subject to changes in time [3]. Bayesian inference was applied as a technical device to yield insight leading to simple robust procedures, and a quadratic loss function was used to derive a Bayesian estimator of the current mean for an a priori probability distribution on the entire real line. Sen and Srivastava (1975) considered procedures for testing whether the means of the variables in a sequence of independent random variables could be taken to be the same, against alternatives under which a shift might have occurred after some point [6]. They derived both the exact and asymptotic distributions of their test statistic for testing whether a single change exists in the mean of a sequence of normal random variables. Five years later, in 1980, they produced a technical report on tests for detecting a change in the multivariate mean [7]. Srivastava and Worsley (1986) hypothesized that a sequence of independent multivariate normal vectors with equal but possibly unknown variance matrices has equal mean vectors [8]. They tested whether the mean vectors changed after an unknown point in the sequence, studied multiple changes in the multivariate normal mean, and approximated the null distribution of the likelihood ratio test statistic using the Bonferroni inequality. Jandhyala et al. (1999) developed change-point methodology for identifying dynamic trends in the scale and shape parameters of a Weibull distribution [4]. Their methodology included asymptotics of the likelihood ratio statistic for detecting unknown changes in the parameters as well as asymptotics of the maximum likelihood estimate of the unknown change point. Jandhyala's work (1999) is the motivation for this thesis.
Instead of the likelihood ratio test, the previously mentioned information criteria, SIC, MIC, and IIC, will be used to detect unknown changes in the parameters. Moreover, the situation in which more than one parameter changes will be considered. Finally, a new conjecture called the improved information criterion (IIC) will be suggested and compared with the existing information criteria.
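Since the simulations in later chapters repeatedly evaluate and sample from the density of Section 1.3, it is convenient to have these operations in code. The thesis's appendix uses R; the Python sketch below is an illustration, with hypothetical function names, using inverse-CDF sampling for the two-parameter case ($\theta = 0$).

```python
import math
import random

def weibull_pdf(x, k, lam, theta=0.0):
    """Density from Section 1.3:
    f(x; lam, k, theta) = (k/lam)*((x-theta)/lam)**(k-1)*exp(-((x-theta)/lam)**k)
    for x >= theta, and 0 otherwise."""
    if x < theta:
        return 0.0
    z = (x - theta) / lam
    return (k / lam) * z ** (k - 1) * math.exp(-z ** k)

def weibull_sample(k, lam, n, seed=0):
    """Inverse-CDF sampling for the two-parameter Weibull (theta = 0):
    if U ~ Uniform(0, 1), then X = lam * (-log U)**(1/k) follows W(k, lam)."""
    rng = random.Random(seed)
    return [lam * (-math.log(rng.random())) ** (1.0 / k) for _ in range(n)]
```

For $k = 1$ the density reduces to the exponential density $\lambda^{-1} e^{-x/\lambda}$, which gives a quick sanity check on the implementation.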

CHAPTER 2 TRADITIONAL SCHWARZ INFORMATION CRITERION FOR CHANGE POINT DETECTION IN WEIBULL MODELS

2.1 Theoretical Derivation for One-parameter Weibull Distribution

Using a model selection criterion is one of the most widely accepted methods for detecting change points. There are quite a few popular model selection criteria, such as the Schwarz Information Criterion (SIC) and the Akaike Information Criterion (AIC). These criteria have their own advantages over the likelihood ratio test. First, unlike the LRT, a model selection criterion avoids complicated asymptotic distribution derivations. Second, a model selection criterion can detect changes, if any, and estimate the change location simultaneously.

SIC is a criterion for model selection among a finite set of models [5] and is partially based on the likelihood function. When a model is fitted, it is possible that too many parameters are added to the model in order to obtain a larger likelihood. Unfortunately, this can cause overfitting and hide useful models from us. SIC introduces a penalty term for the number of parameters in the model, which helps to overcome this problem.

First, the Weibull distribution is considered as an example. Under the null hypothesis $H_0$, the SIC is defined as

$$SIC(n) = -2 l_n(\hat{k}, \hat{\lambda}) + [\dim(\hat{k}) + \dim(\hat{\lambda})] \log n = -2 l_n(\hat{k}, \hat{\lambda}) + 2 \log n. \quad (2.1.1)$$

Under the alternative hypothesis, the shape parameter $k$ is now assumed to be changeable while the scale parameter $\lambda$ is held fixed. Based on the Weibull probability density function above, the log-likelihood function for the change point problem has the form

$$l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h) = \sum_{i=1}^{h} \log f(x_i; \hat{k}_{1h}, \hat{\lambda}) + \sum_{i=h+1}^{n} \log f(x_i; \hat{k}_{2h}, \hat{\lambda}), \quad (2.1.2)$$

where $\hat{k}_{1h}$ is the maximum likelihood estimator (MLE) of the shape parameter $k$ for the first part of the data set, $\hat{k}_{2h}$ is the MLE of $k$ for the second part of the data set, and $\hat{\lambda}$ is the MLE of the scale parameter $\lambda$ for the entire data set. Also, $h$ is the position of the assumed change point, and $\hat{k}_{1h}, \hat{k}_{2h}$ maximize $l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h)$ for a given $h$. Writing the two-parameter Weibull density in the form $f(x; k, \lambda) = \lambda k x^{k-1} e^{-\lambda x^{k}}$, as in the derivations below, the Schwarz Information Criterion (SIC) for the change point problem becomes

$$\begin{aligned} SIC(h) &= -2 l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h) + [2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + 1] \log n \\ &= -2 \sum_{i=1}^{h} \log f(x_i; \hat{k}_{1h}, \hat{\lambda}) - 2 \sum_{i=h+1}^{n} \log f(x_i; \hat{k}_{2h}, \hat{\lambda}) + [2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + 1] \log n \\ &= -2 \log\Big[(\hat{\lambda}\hat{k}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}_{1h}-1} \exp\Big(-\hat{\lambda} \sum_{i=1}^{h} x_i^{\hat{k}_{1h}}\Big)\Big] - 2 \log\Big[(\hat{\lambda}\hat{k}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}_{2h}-1} \exp\Big(-\hat{\lambda} \sum_{i=h+1}^{n} x_i^{\hat{k}_{2h}}\Big)\Big] \\ &\quad + [2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + 1] \log n. \end{aligned} \quad (2.1.3)$$

Since the only changeable parameter is the shape parameter $k$, both $\dim(\hat{k}_{1h})$ and $\dim(\hat{\lambda})$ have the value 1 in this situation. Therefore, $SIC(h)$ can be simplified to

$$SIC(h) = -2 \log\Big[(\hat{\lambda}\hat{k}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}_{1h}-1} \exp\Big(-\hat{\lambda} \sum_{i=1}^{h} x_i^{\hat{k}_{1h}}\Big)\Big] - 2 \log\Big[(\hat{\lambda}\hat{k}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}_{2h}-1} \exp\Big(-\hat{\lambda} \sum_{i=h+1}^{n} x_i^{\hat{k}_{2h}}\Big)\Big] + 4 \log n. \quad (2.1.4)$$

Equation (2.1.4) gives the SIC for the shape-changeable Weibull distribution. The same idea and method can be used to derive the SIC for the scale-changeable Weibull distribution as well. Later in Chapter 2, the SIC for the shape-scale-changeable Weibull distribution will also be considered. This is the basic structure of Chapter 2 (SIC), Chapter 3 (MIC), and Chapter 4 (IIC).

To obtain the SIC for the scale-changeable Weibull distribution, the log-likelihood function is used again,

$$l_n(\hat{\lambda}_{1h}, \hat{\lambda}_{2h}, \hat{k}, h) = \sum_{i=1}^{h} \log f(x_i; \hat{\lambda}_{1h}, \hat{k}) + \sum_{i=h+1}^{n} \log f(x_i; \hat{\lambda}_{2h}, \hat{k}). \quad (2.1.5)$$

Then the SIC for the change point problem becomes

$$SIC(h) = -2 l_n(\hat{\lambda}_{1h}, \hat{\lambda}_{2h}, \hat{k}, h) + [2\dim(\hat{\lambda}_{1h}) + \dim(\hat{k}) + 1] \log n, \quad (2.1.6)$$

where $h$ is still the position of an assumed change point, and $\hat{\lambda}_{1h}, \hat{\lambda}_{2h}$ maximize $l_n(\hat{\lambda}_{1h}, \hat{\lambda}_{2h}, \hat{k}, h)$ for a given $h$. In the two-parameter Weibull distribution, the SIC becomes

$$SIC(h) = -2 \log\Big[(\hat{k}\hat{\lambda}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}-1} \exp\Big(-\hat{\lambda}_{1h} \sum_{i=1}^{h} x_i^{\hat{k}}\Big)\Big] - 2 \log\Big[(\hat{k}\hat{\lambda}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}-1} \exp\Big(-\hat{\lambda}_{2h} \sum_{i=h+1}^{n} x_i^{\hat{k}}\Big)\Big] + [2\dim(\hat{\lambda}_{1h}) + \dim(\hat{k}) + 1] \log n. \quad (2.1.7)$$

Since the only changeable parameter is the scale parameter $\lambda$, $\dim(\hat{\lambda}_{1h})$ and $\dim(\hat{k})$ both equal 1, and $SIC(h)$ can be simplified to

$$SIC(h) = -2 \log\Big[(\hat{k}\hat{\lambda}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}-1} \exp\Big(-\hat{\lambda}_{1h} \sum_{i=1}^{h} x_i^{\hat{k}}\Big)\Big] - 2 \log\Big[(\hat{k}\hat{\lambda}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}-1} \exp\Big(-\hat{\lambda}_{2h} \sum_{i=h+1}^{n} x_i^{\hat{k}}\Big)\Big] + 4 \log n. \quad (2.1.8)$$

2.2 One-parameter Simulation Tests

To investigate the performance of SIC in detecting a change point at different locations, simulation power tests have been conducted. One of the two parameters of the Weibull distribution has

been controlled in the simulation. The simulation is repeated 1000 times for each change point location $h$ with sample size n = 100. The initial Weibull distribution is always W(1,1). When the shape parameter $k$ is changeable, the new Weibull distribution is W(2,1), W(1.75,1), or W(1.5,1); when the scale parameter $\lambda$ is changeable, the new Weibull distribution is W(1,2), W(1,1.75), or W(1,1.5).

Table 2.1: Power comparison for SIC with change point at different locations, n = 100
W(1,1) | h=5 | h=10 | h=15 | h=20 | h=25 | h=30 | h=35 | h=40 | h=45 | h=50
k: W(2,1), W(1.75,1), W(1.5,1)
λ: W(1,2), W(1,1.75), W(1,1.5)

From Table 2.1, it is obvious that as the difference between the parameters increases, the power of the test also increases. For example, when the change point location is h = 10, the power for $f_1 = W(1,1)$ versus $f_2 = W(2,1)$ is higher than the power for $f_1 = W(1,1)$ versus $f_2 = W(1.5,1)$. Besides, it can be observed that the power of the test is within an acceptable range.

Table 2.2: Power comparison with change point at different locations, n = 50
W(1,1) | h=3 | h=5 | h=8 | h=10 | h=13 | h=15 | h=18 | h=20 | h=23 | h=25
k: W(2,1), W(1.75,1), W(1.5,1)
λ: W(1,2), W(1,1.75), W(1,1.5)

Another interesting conclusion is that when the change point location is in the middle of the data series (shown on the right-hand side of Table 2.1), the power of the test is higher than when the change point location is at the beginning of the data series (shown on the left-hand side of Table 2.1). For instance, the power for $f_1 = W(1,1)$ and $f_2 = W(2,1)$ is 0.587 when the change

point location is h = 10, while the power for the same pair is considerably higher when the change point location is h = 45. This phenomenon motivates us to look for a method that copes better with the situation in which the change happens at the beginning or the end of the data series. Two alternative model selection information criteria will be introduced in the following chapters.

Figure 2.1: Power comparison for SIC with change point at different locations for shape-changeable Weibull distribution, n = 100

In order to illustrate the effect of the sample size, a different sample size, n = 50, has been chosen for comparison with the original n = 100; all other settings are kept the same. Based on Table 2.2, it is clear that as the sample size increases, the power of the test also increases. Take the power of W(1,1) versus W(2,1) at h = 25 as an example: for n = 100 the power is 0.951, which is higher than the power for n = 50. This conclusion is illustrated in Figure 2.2. The x-axis indicates the relative location of the change point in the whole data series. For instance, when the sample size is 100, the relative location of the 10th observation is 10/100 = 0.1; and when the sample size is 50, the relative location of the 5th observation is also 5/50 = 0.1. This is why the two settings are put together to be compared.

Figure 2.2: Power comparison with Different Sample Sizes

2.3 Theoretical Derivation for Two-parameter Weibull Distribution

After the one-parameter Weibull distribution, we extend the model to the two-parameter Weibull distribution. Now both the shape parameter $k$ and the scale parameter $\lambda$ are assumed to be changeable. Based on the previous Weibull distribution model, the log-likelihood function for the change point problem has the form

$$l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}_{1h}, \hat{\lambda}_{2h}, h) = \sum_{i=1}^{h} \log f(x_i; \hat{k}_{1h}, \hat{\lambda}_{1h}) + \sum_{i=h+1}^{n} \log f(x_i; \hat{k}_{2h}, \hat{\lambda}_{2h}). \quad (2.3.1)$$

Therefore, the Schwarz Information Criterion (SIC) for the change point problem becomes

$$SIC(h) = -2 l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}_{1h}, \hat{\lambda}_{2h}, h) + [2\dim(\hat{k}_{1h}) + 2\dim(\hat{\lambda}_{1h}) + 1] \log n, \quad (2.3.2)$$

where $h$ is the position of a change point, and $\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}_{1h}, \hat{\lambda}_{2h}$ maximize $l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}_{1h}, \hat{\lambda}_{2h}, h)$ for a given $h$. In the two-parameter Weibull distribution, the SIC becomes

$$\begin{aligned} SIC(h) &= -2 \sum_{i=1}^{h} \log f(x_i; \hat{k}_{1h}, \hat{\lambda}_{1h}) - 2 \sum_{i=h+1}^{n} \log f(x_i; \hat{k}_{2h}, \hat{\lambda}_{2h}) + [2\dim(\hat{k}_{1h}) + 2\dim(\hat{\lambda}_{1h}) + 1] \log n \\ &= -2 \log\Big[(\hat{\lambda}_{1h}\hat{k}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}_{1h}-1} \exp\Big(-\hat{\lambda}_{1h} \sum_{i=1}^{h} x_i^{\hat{k}_{1h}}\Big)\Big] - 2 \log\Big[(\hat{\lambda}_{2h}\hat{k}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}_{2h}-1} \exp\Big(-\hat{\lambda}_{2h} \sum_{i=h+1}^{n} x_i^{\hat{k}_{2h}}\Big)\Big] \\ &\quad + [2\dim(\hat{k}_{1h}) + 2\dim(\hat{\lambda}_{1h}) + 1] \log n. \end{aligned} \quad (2.3.3)$$

Since the dimensions of both $\hat{k}_{1h}$ and $\hat{\lambda}_{1h}$ are 1, $2\dim(\hat{k}_{1h}) + 2\dim(\hat{\lambda}_{1h}) + 1 = 5$ in this situation. Therefore, $SIC(h)$ can be simplified to

$$SIC(h) = -2 \log\Big[(\hat{\lambda}_{1h}\hat{k}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}_{1h}-1} \exp\Big(-\hat{\lambda}_{1h} \sum_{i=1}^{h} x_i^{\hat{k}_{1h}}\Big)\Big] - 2 \log\Big[(\hat{\lambda}_{2h}\hat{k}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}_{2h}-1} \exp\Big(-\hat{\lambda}_{2h} \sum_{i=h+1}^{n} x_i^{\hat{k}_{2h}}\Big)\Big] + 5 \log n. \quad (2.3.4)$$

2.4 Two-parameter Simulation Tests

To investigate the performance of SIC in detecting a change point at different locations, simulations have been conducted. This time, both parameters of the Weibull distribution are changeable. The simulation is repeated 1000 times for each change point location $h$, with sample size n = 100. The initial Weibull distribution is always W(1,1); the new Weibull distribution for the second part of the data set is W(2,2), W(1.75,1.75), or W(1.5,1.5).

Table 2.3: Two-parameter power comparison with change point at different locations, n = 100
W(1,1) | h=5 | h=10 | h=15 | h=20 | h=25 | h=30 | h=35 | h=40 | h=45 | h=50
W(2,2), W(1.75,1.75), W(1.5,1.5)

From Table 2.3, the conclusions are the same as for Table 2.1. As the difference between the parameters increases, the power of the test also increases: for example, when the change point location is h = 10, the power for $f_1 = W(1,1)$ versus $f_2 = W(2,2)$ is higher than the power for $f_1 = W(1,1)$ versus $f_2 = W(1.5,1.5)$. It can also be observed that the power of the test is within an acceptable range.

Another conclusion is likewise the same as for Table 2.1. When the change point location is in the middle of the data series (shown on the right-hand side of Table 2.3), the power of the test is higher than when the change point location is at the beginning or the end of the data series (shown on the left-hand side of Table 2.3). For instance, the power for $f_1 = W(1,1)$ and $f_2 = W(2,2)$ is 0.505 when the change point location is h = 10, and noticeably higher when the change point location is h = 45.

A different sample size, n = 50, has also been chosen for comparison with the original n = 100 for the two-parameter Weibull models; other settings remain the same.

Table 2.4: Two-parameter power comparison with change point at different locations, n = 50
W(1,1) | h=3 | h=5 | h=8 | h=10 | h=13 | h=15 | h=18 | h=20 | h=23 | h=25
W(2,2), W(1.75,1.75), W(1.5,1.5)

Based on Table 2.4, it is clear that as the sample size increases, the power of the test also increases. Take the power of W(1,1) versus W(2,2) at h = 25 as an example: for n = 100 the power is 0.966, which is higher than the power for n = 50.

2.5 Application of SIC to Minimum Temperature Data in Uppsala

The Uppsala minimum temperature data set is taken from Jandhyala et al. (1999). This data set gives the annual minimum temperatures in Uppsala, Sweden from 1774 to 1981. Jandhyala et al. (1999) applied the likelihood ratio test to the whole data set to detect a change point; the analysis clearly identifies a significant change at the year 1837 (h = 64).

In this section, SIC will be applied to detect the possible change points in this data set. Both

the shape parameter $k$ and the scale parameter $\lambda$ are considered to be changeable. The data are given in Table 2.5, and a plot of the data is shown in Figure 2.3.

Figure 2.3: Minimum temperatures in Uppsala, Sweden, 1774-1981

In the first step, the SIC for the null hypothesis is compared with the minimum SIC over the alternative hypothesis, which is attained at the change point location h = 64. Therefore, h = 64 is the most obvious change point in the data set, which agrees with the result of Jandhyala et al. (1999). After that, based on binary segmentation, the whole data set is separated into two parts, (1:64) and (65:208). By conducting the same procedure, two other possible change points are found, h = 55 and h = 139. The conclusions of the change point detection are summarized in Table 2.6.

The possible reason for the year 1837 (the most obvious change point among them) and the year 1828 being change points is the famous Industrial Revolution, which is considered to have largely affected the global climate environment.

Table 2.5: Minimum temperatures in Uppsala, 1774-1981

Table 2.6: Change Points of Minimum temperatures in Uppsala, 1774-1981
Change Point Location | Year | Possible Reasons
55 | 1828 | The First Industrial Revolution: Steam Engine
64 | 1837 | The First Industrial Revolution: Steam Engine
139 | 1912 | The Second Industrial Revolution: Coal Power, Petroleum Industry
(136) | (1909) | (Too close to Year 1912)

Using the likelihood ratio test (LRT), Jandhyala et al. (1999) found two change points, the years 1837 and 1912; SIC thus detected two more points than the LRT. However, the year 1909 is too close to the year 1912. Comparing the SIC under the null and alternative hypotheses for this candidate, we think that only three observations cannot provide enough information to make a decision, so we choose not to reject the null hypothesis there. Finally, the conclusion for SIC is that the years 1828, 1837, and 1912 are the change points.
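The detection rule of this chapter can be made concrete in a short program. The thesis's own programs are in R (Appendix A); the Python sketch below is a simplified illustration for the scale-changeable case only, treating the shape $k$ as known rather than estimated (an assumption of this sketch, so the penalty follows equation (2.1.8) in spirit), and using the form $f(x; k, \lambda) = \lambda k x^{k-1} e^{-\lambda x^{k}}$ from the derivations, whose MLE for $\lambda$ on a segment of size $m$ is $m / \sum x_i^{k}$.

```python
import math

def _seg_loglik(seg, k):
    """Maximized log-likelihood of one segment under f(x; k, lam) = lam*k*x**(k-1)*exp(-lam*x**k).
    At the MLE lam_hat = m / sum(x_i**k), the term lam_hat * sum(x_i**k) equals m."""
    m = len(seg)
    lam_hat = m / sum(v ** k for v in seg)
    return m * math.log(lam_hat * k) + (k - 1) * sum(math.log(v) for v in seg) - m

def sic_change(x, h, k):
    """SIC(h) with a scale change at position h (simplified: k treated as known)."""
    n = len(x)
    return -2.0 * (_seg_loglik(x[:h], k) + _seg_loglik(x[h:], k)) + 4.0 * math.log(n)

def sic_no_change(x, k):
    """SIC(n) as in equation (2.1.1), under the same simplification."""
    n = len(x)
    return -2.0 * _seg_loglik(x, k) + 2.0 * math.log(n)

def detect_change_sic(x, k=1.0):
    """Return argmin_h SIC(h) if min_h SIC(h) < SIC(n), else None (Section 2.1)."""
    n = len(x)
    h_best = min(range(2, n - 1), key=lambda h: sic_change(x, h, k))
    return h_best if sic_change(x, h_best, k) < sic_no_change(x, k) else None
```

Repeated inside a binary segmentation loop, this single-change test reproduces the stepwise search used on the Uppsala data above.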

CHAPTER 3 MODIFIED SCHWARZ INFORMATION CRITERION FOR CHANGE POINT DETECTION IN WEIBULL MODELS

3.1 Theoretical Derivation for One-parameter Weibull Distribution

For change point analysis, the general penalty term of the Schwarz Information Criterion (SIC) is related only to the sample size and the number of free parameters to be estimated. In Chapter 2, we derived the form $[2\dim(\hat{k}_{1h}) + 2\dim(\hat{\lambda}_{1h}) + 1] \log n$ for the shape-scale-changeable two-parameter Weibull distribution. However, Zhang and Siegmund (2007) pointed out that the traditional SIC detects change points more efficiently when the change points are in the central part of the data [10]. If there is a change point at the beginning or the end of the data, SIC may not be sensitive enough to detect it.

Meanwhile, the importance of developing a reliable information criterion has long been recognized in academia. The Schwarz information criterion (SIC) is a popular method for model selection, but its properties can still be improved by adjusting the penalty term for the change point scenario. The problem is to find a penalty term that is robust in most situations; in other words, there is a need for a better understanding of the penalty term and its modification. Hence, another study area of this thesis is the modification of the penalty term of the Schwarz Information Criterion. Chen et al. (2006) suggested that the penalty term needs to be more complex, so that it can be related to the locations of the changes [2].

As with the traditional SIC, the one-parameter-changeable MIC is derived first. The shape parameter $k$ is assumed to be changeable and the scale parameter $\lambda$ is held fixed. In the shape-changeable Weibull situation, when the change point location $h$ is in the middle area of the data set, i.e., close to $n/2$, both parameters $k_1$ and $k_2$ will be effective.
However, when the change point location $h$ is close to 1 or $n$, the beginning or the end of the data set, either $k_1$ or $k_2$ will become redundant. Hence, as the change point location $h$ gets close to 1 or $n$, $h$ becomes an increasingly

undesirable parameter. Therefore, a modified information criterion has been suggested, the so-called modified information criterion (MIC). We adopt the idea of Chen et al. (2006) to modify the detecting procedure for the Weibull distribution. Under the null hypothesis $H_0$, the MIC is defined as

$$MIC(n) = -2 l_n(\hat{k}, \hat{\lambda}) + [\dim(\hat{k}) + \dim(\hat{\lambda})] \log n = -2 l_n(\hat{k}, \hat{\lambda}) + 2 \log n. \quad (3.1.1)$$

It is clear that there is no difference between $MIC(n)$ and $SIC(n)$ under the null hypothesis. Under the alternative hypothesis, the MIC penalty term is revised to

$$\Big[2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + \Big(\frac{2h}{n} - 1\Big)^2\Big] \log n. \quad (3.1.2)$$

Then, for the change point location $1 \leq h < n$, let

$$MIC(h) = -2 l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h) + \Big[2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + \Big(\frac{2h}{n} - 1\Big)^2\Big] \log n. \quad (3.1.3)$$

If $\min_{1 \leq h < n} MIC(h) < MIC(n)$, we have sufficient evidence to reject the null hypothesis, so there exists at least one change point in the data set; the change point location is the $h$ at which $MIC(h)$ attains its minimum. Once the first change point is detected, the whole data set is separated into two parts according to binary segmentation, and the same procedure is conducted on each part if any further change point is suspected. The change point detection in a series of data stops when $\min_{1 \leq h < n} MIC(h) > MIC(n)$ for that series, in which case the null hypothesis that no change point exists is accepted.

Chen et al. (2006) observed that if the change point location is $h$, the variance of $\hat{k}_{1h}$ is proportional to $1/h$ and the variance of $\hat{k}_{2h}$ is proportional to $1/(n-h)$. Hence the total variance is proportional to

$$\frac{1}{h} + \frac{1}{n-h} = \frac{1}{n\big[\frac{1}{4} - \big(\frac{h}{n} - \frac{1}{2}\big)^2\big]}. \quad (3.1.4)$$

Therefore, they concluded that if a change is suspected at the edge of a data series, relatively stronger evidence, corresponding to a higher value of the model selection criterion, should be required to identify the change point; that is, a larger penalty term is appropriate in this situation. They showed that the MIC statistic has a $\chi^2$ limiting distribution for any regular distribution family. The conclusions on weak convergence are not affected, as the related probability statements do not depend on how one sequence is related to the other. Define

$$S_n = MIC(n) - \min_{1 \leq h < n} MIC(h) + [\dim(\hat{k}) + \dim(\hat{\lambda})] \log n, \quad (3.1.5)$$

where $MIC(n)$ and $\min_{1 \leq h < n} MIC(h)$ are given in equations (3.1.1) and (3.1.3). Under the Wald conditions and the regularity conditions, as $n$ tends to infinity, $S_n$ converges to $\chi^2_1$ in distribution under the null hypothesis; in addition, in the shape-scale-changeable Weibull distribution, $S_n$ converges to $\chi^2_2$. The proofs are similar to Chen et al. (2006). On the other hand, under the alternative hypothesis, there exists a change point at $h$ such that, as $n$ tends to infinity, $h/n$ falls in the region $(0, 1)$ and $S_n$ converges to infinity in probability. To sum up,

$$H_0: S_n \xrightarrow{d} \chi^2_d, \quad (3.1.6)$$

$$H_a: S_n \xrightarrow{p} \infty, \quad (3.1.7)$$

where $d$ is the dimension of the changeable parameter.

For the shape-changeable Weibull distribution, under the alternative hypothesis, the log-likelihood function is

$$l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h) = \sum_{i=1}^{h} \log f(x_i; \hat{k}_{1h}, \hat{\lambda}) + \sum_{i=h+1}^{n} \log f(x_i; \hat{k}_{2h}, \hat{\lambda}). \quad (3.1.8)$$

Therefore, the Modified Schwarz Information Criterion (MIC) for the change point problem becomes

$$MIC(h) = -2 l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h) + \Big[2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + \Big(\frac{2h}{n} - 1\Big)^2\Big] \log n, \quad (3.1.9)$$

where $h$ is the position of a change point, and $\hat{k}_{1h}, \hat{k}_{2h}$ maximize $l_n(\hat{k}_{1h}, \hat{k}_{2h}, \hat{\lambda}, h)$ for a given $h$. In the two-parameter Weibull distribution, the MIC becomes

$$\begin{aligned} MIC(h) &= -2 \sum_{i=1}^{h} \log f(x_i; \hat{k}_{1h}, \hat{\lambda}) - 2 \sum_{i=h+1}^{n} \log f(x_i; \hat{k}_{2h}, \hat{\lambda}) + \Big[2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + \Big(\frac{2h}{n} - 1\Big)^2\Big] \log n \\ &= -2 \log\Big[(\hat{\lambda}\hat{k}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}_{1h}-1} \exp\Big(-\hat{\lambda} \sum_{i=1}^{h} x_i^{\hat{k}_{1h}}\Big)\Big] - 2 \log\Big[(\hat{\lambda}\hat{k}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}_{2h}-1} \exp\Big(-\hat{\lambda} \sum_{i=h+1}^{n} x_i^{\hat{k}_{2h}}\Big)\Big] \\ &\quad + \Big[2\dim(\hat{k}_{1h}) + \dim(\hat{\lambda}) + \Big(\frac{2h}{n} - 1\Big)^2\Big] \log n. \end{aligned} \quad (3.1.10)$$

Since the only changeable parameter is the shape parameter $k$, both $\dim(\hat{k}_{1h})$ and $\dim(\hat{\lambda})$ have the value 1 in this situation. Therefore, $MIC(h)$ can be simplified to

$$MIC(h) = -2 \log\Big[(\hat{\lambda}\hat{k}_{1h})^{h} \prod_{i=1}^{h} x_i^{\hat{k}_{1h}-1} \exp\Big(-\hat{\lambda} \sum_{i=1}^{h} x_i^{\hat{k}_{1h}}\Big)\Big] - 2 \log\Big[(\hat{\lambda}\hat{k}_{2h})^{n-h} \prod_{i=h+1}^{n} x_i^{\hat{k}_{2h}-1} \exp\Big(-\hat{\lambda} \sum_{i=h+1}^{n} x_i^{\hat{k}_{2h}}\Big)\Big] + \Big[3 + \Big(\frac{2h}{n} - 1\Big)^2\Big] \log n. \quad (3.1.11)$$

Under the null hypothesis, $MIC(n)$ is

$$MIC(n) = -2 l_n(\hat{k}, \hat{\lambda}) + 2 \log n. \quad (3.1.12)$$

Equations (3.1.11) and (3.1.12) give the MIC for the shape-changeable Weibull distribution. The same idea and method can be used to derive the MIC for the scale-changeable Weibull distribution as well. To obtain the MIC for the scale-changeable Weibull distribution, the log-likelihood function has

29 been used again, l n (^λ 1h, ^λ 2h, ^k, h) = 19 log f(x i, ^λ 1h, ^k) + log f(x i, ^λ 2h, ^k). (3.1.13) Then, the MIC for the change point problem has become MIC(h) = 2l n (^λ 1h, ^λ 2h, ^k, h) + [2dim(^λ 1h ) + dim(^k) + ( 2h n 1)2 ] log n, (3.1.14) where h is still the position of a change point; and ^λ 1h, ^λ 2h maximize l n (^λ 1h, ^λ 2h, ^k, h) for a given h. In the two-parameter Weibull distribution, the MIC becomes MIC(h) = 2 log f(x i, ^λ 1h, ^k) 2 + ( 2h n 1)2 ] log n = 2 log[(^k^λ 1h ) h h n 1 i exp( ^λ 2h 1 i exp( ^λ 1h log f(x i, ^λ 2h, ^k) + [2dim(^λ 1h ) + dim(^k) n h i )] 2 log[(^λ 2h^k) i )] + [2dim(^λ 1h ) + dim(^k) + ( 2h n 1)2 ] log n. (3.1.15) This time, the only changeable parameter is the scale parameter λ, both dim(^λ 1h ) and dim(^k) have the values of 1 in this situation. Therefore, MIC(h) can be simplified as MIC(h) = 2 log[(^k^λ 1h ) h n 1 i h exp( ^λ 2h 1 i exp( ^λ 1h n h i )] 2 log[(^λ 2h^k) i )] + [3 + ( 2h n 1)2 ] log n. (3.1.16)
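Equation (3.1.16) is convenient computationally because, for a fixed common shape k, the segment scale MLEs are available in closed form: λ̂ = m / Σ x_i^k over a segment of length m. The sketch below, our own illustration with hypothetical names under the same parameterization f(x; k, λ) = λ k x^{k-1} exp(-λ x^k), profiles k out with a one-dimensional bounded search.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hedged sketch of MIC(h) for the scale-change case. For a fixed common shape k,
# the segment scale MLE has the closed form lam_hat = m / sum(seg**k), so the
# common shape can be profiled out with a one-dimensional search.

def neg2_profile_loglik(x, h, k):
    """-2 * log-likelihood at shape k, with lam1, lam2 maximized segment-wise."""
    ll = 0.0
    for seg in (x[:h], x[h:]):
        if len(seg) == 0:
            continue                          # h = len(x) gives the no-change fit
        m = len(seg)
        lam = m / np.sum(seg**k)              # closed-form segment MLE of lambda
        ll += m * np.log(lam * k) + (k - 1) * np.sum(np.log(seg)) - m
    return -2.0 * ll

def mic_scale(x, h):
    """MIC(h) of (3.1.16); calling with h = len(x) yields MIC(n) with penalty 2 log n."""
    n = len(x)
    res = minimize_scalar(lambda k: neg2_profile_loglik(x, h, k),
                          bounds=(0.05, 20.0), method="bounded")
    pen = 2.0 if h == n else 3.0 + (2.0 * h / n - 1.0) ** 2
    return res.fun + pen * np.log(n)

rng = np.random.default_rng(1)
# Exponential segments (shape k = 1): the rate lambda jumps from 1 to 4 at h = 50
x = np.concatenate([rng.weibull(1.0, 50), rng.weibull(1.0, 50) / 4.0])
mics = {h: mic_scale(x, h) for h in range(10, 91)}
h_hat = min(mics, key=mics.get)
print(h_hat, mics[h_hat] < mic_scale(x, len(x)))
```

The substitution -λ̂ Σ x^k = -m at the segment MLE is what makes the profiled log-likelihood this compact.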

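The χ² approximation for S_n stated above can be checked cheaply by Monte Carlo in the special case where the shape is held at k = 1 (exponential data, a single changeable parameter λ), since every fit is then closed form. This simulation is our own sanity check, not part of the thesis.

```python
import numpy as np
from scipy.stats import chi2

# Monte Carlo check of the chi-square limit of S_n in the one-changeable-parameter
# exponential case (Weibull shape fixed at k = 1): a segment of length m has
# maximized log-likelihood m * log(m / sum(seg)) - m.

def seg_ll(seg):
    m = len(seg)
    return m * np.log(m / np.sum(seg)) - m

def s_n(x):
    """S_n = MIC(n) - min_h MIC(h) + dim * log(n), with d = 1 changeable parameter."""
    n = len(x)
    mic_null = -2.0 * seg_ll(x) + np.log(n)
    mic_alt = [-2.0 * (seg_ll(x[:h]) + seg_ll(x[h:]))
               + (2.0 + (2.0 * h / n - 1.0) ** 2) * np.log(n)
               for h in range(5, n - 4)]
    return mic_null - min(mic_alt) + np.log(n)

rng = np.random.default_rng(2)
sims = np.array([s_n(rng.exponential(1.0, 100)) for _ in range(500)])
# Compare the simulated 95th percentile with the chi2(1) reference quantile (about 3.84)
print(round(float(np.quantile(sims, 0.95)), 2), round(chi2.ppf(0.95, df=1), 2))
```

At n = 100 the agreement is only approximate, which is consistent with S_n converging to its χ² limit slowly, as is typical for change point statistics.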
3.2 One-parameter Simulation Tests

To investigate the performance of MIC in detecting a change point at different locations, simulation power tests have been conducted. Only one of the two parameters of the Weibull distribution is allowed to change in each test. We run 1000 simulations for each change point location h with sample size n = 100. The initial Weibull distribution is always W(1,1). When the shape parameter k is changeable, the new Weibull distribution is W(2,1), W(1.75,1), or W(1.5,1); when the scale parameter λ is changeable, the new Weibull distribution is W(1,2), W(1,1.75), or W(1,1.5).

Table 3.1: Power comparison for MIC with change point at different locations, n = 100

W(1,1)          h=5   h=10  h=15  h=20  h=25  h=30  h=35  h=40  h=45  h=50
k: W(2,1)
   W(1.75,1)
   W(1.5,1)
λ: W(1,2)
   W(1,1.75)
   W(1,1.5)

From Table 3.1, it is obvious that as the difference between the parameters increases, the power of the test also increases. For example, when the change point location is h = 10, the power for f_1 = W(1,1) against f_2 = W(2,1) is larger than the power for f_1 = W(1,1) against f_2 = W(1.5,1). The power of the test stays within an acceptable range. Compared with the traditional SIC, the simulation results of MIC show only small differences. Based on the simulation results, the power of MIC is slightly higher than the power of SIC, as shown in Figure 3.1. However, the general trend of the power of MIC is the same as that of SIC: higher in the middle and lower at both ends.

Figure 3.1: Power comparison for MIC and SIC with change point at different locations for shape-changeable Weibull distribution, n = 100

In order to see the effect of sample size, a different sample size n = 50 has been chosen for comparison with the original n = 100. Everything else is kept exactly the same.

Table 3.2: Power comparison for MIC with change point at different locations, n = 50

W(1,1)          h=3   h=5   h=8   h=10  h=13  h=15  h=18  h=20  h=23  h=25
k: W(2,1)
   W(1.75,1)
   W(1.5,1)
λ: W(1,2)
   W(1,1.75)
   W(1,1.5)

Based on Table 3.2, it is clear that as the sample size increases, the power of the test also increases. Take the power of W(1,1) against W(2,1) when h = 25 as an example: for n = 100, the power is 0.954, while for n = 50 the power is noticeably lower.

3.3 Theoretical Derivation for the Two-parameter Weibull Distribution

In the two-parameter Weibull situation, when the change point location h is in the middle of the data set, close to n/2, all the parameters k_1, λ_1, k_2, and λ_2 are effective. However, when the change point location h is close to 1 or n, the beginning or the end of the data set, either k_1 and λ_1 or k_2 and λ_2 become redundant. Hence, as the change point location h gets close to 1 or n, h becomes an increasingly undesirable parameter.

Under the null hypothesis H_0, the MIC is defined to be

MIC(n) = -2 l_n(k̂, λ̂) + [dim(k̂) + dim(λ̂)] log n = -2 l_n(k̂, λ̂) + 2 log n. (3.3.1)

It is clear that no matter which Weibull model is assumed, MIC(n) is the same under the null hypothesis. Under the alternative hypothesis, the penalty term of the MIC is revised to be

[2 dim(k̂_{1h}) + 2 dim(λ̂_{1h}) + (2h/n - 1)^2] log n. (3.3.2)

Then, for a change point location 1 ≤ h < n, let

MIC(h) = -2 l_n(k̂_{1h}, k̂_{2h}, λ̂_{1h}, λ̂_{2h}, h) + [2 dim(k̂_{1h}) + 2 dim(λ̂_{1h}) + (2h/n - 1)^2] log n. (3.3.3)

If min_{1≤h<n} MIC(h) < MIC(n), we have sufficient evidence to reject the null hypothesis, and so there exists at least one change point in the data set. The estimated change point location is the h at which MIC(h) attains its minimum value. Once the first change point is detected, binary segmentation splits the whole data set into two parts, and the same procedure is applied to each part if further change points are suspected. The search stops for a given series when min_{1≤h<n} MIC(h) > MIC(n) for that series, in which case the null hypothesis that no change point exists is accepted.

After the one-parameter-changeable Weibull distribution, we extend the model to the two-parameter-changeable Weibull distribution, where both the shape parameter k and the scale parameter λ are assumed changeable. Based on the previous Weibull model, the log-likelihood function for the change point problem has the form

l_n(k̂_{1h}, k̂_{2h}, λ̂_{1h}, λ̂_{2h}, h) = Σ_{i=1}^{h} log f(x_i; k̂_{1h}, λ̂_{1h}) + Σ_{i=h+1}^{n} log f(x_i; k̂_{2h}, λ̂_{2h}). (3.3.4)

Therefore, the MIC for the change point problem becomes

MIC(h) = -2 l_n(k̂_{1h}, k̂_{2h}, λ̂_{1h}, λ̂_{2h}, h) + [2 dim(k̂_{1h}) + 2 dim(λ̂_{1h}) + (2h/n - 1)^2] log n, (3.3.5)

where h is the position of a change point, and k̂_{1h}, k̂_{2h}, λ̂_{1h}, and λ̂_{2h} maximize l_n(k̂_{1h}, k̂_{2h}, λ̂_{1h}, λ̂_{2h}, h) for a given h. In the two-parameter-changeable Weibull distribution, the MIC becomes

MIC(h) = -2 log[(λ̂_{1h} k̂_{1h})^h Π_{i=1}^{h} x_i^{k̂_{1h}-1} exp(-λ̂_{1h} Σ_{i=1}^{h} x_i^{k̂_{1h}})] - 2 log[(λ̂_{2h} k̂_{2h})^{n-h} Π_{i=h+1}^{n} x_i^{k̂_{2h}-1} exp(-λ̂_{2h} Σ_{i=h+1}^{n} x_i^{k̂_{2h}})] + [2 dim(k̂_{1h}) + 2 dim(λ̂_{1h}) + (2h/n - 1)^2] log n. (3.3.6)

Since the dimensions of both k̂_{1h} and λ̂_{1h} are 1, the quantity 2 dim(k̂_{1h}) + 2 dim(λ̂_{1h}) + (2h/n - 1)^2 equals 4 + (2h/n - 1)^2 in this situation. Therefore, MIC(h) simplifies to

MIC(h) = -2 log[(λ̂_{1h} k̂_{1h})^h Π_{i=1}^{h} x_i^{k̂_{1h}-1} exp(-λ̂_{1h} Σ_{i=1}^{h} x_i^{k̂_{1h}})] - 2 log[(λ̂_{2h} k̂_{2h})^{n-h} Π_{i=h+1}^{n} x_i^{k̂_{2h}-1} exp(-λ̂_{2h} Σ_{i=h+1}^{n} x_i^{k̂_{2h}})] + [4 + (2h/n - 1)^2] log n. (3.3.7)

3.4 Two-parameter Simulation Tests

To investigate the performance of MIC in detecting a change point at different locations, simulation power tests have been conducted. This time, both parameters of the Weibull distribution are changeable. We run 1000 simulations for each change point location h with sample size n = 100. The initial Weibull distribution is always W(1,1). The new Weibull distribution for the second part of the data set is W(2,2), W(1.75,1.75), or W(1.5,1.5).
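When both parameters may change, the two segments are fit independently, so MIC(h) in (3.3.7) can be computed with one one-dimensional profile search per segment (λ profiled out in closed form for fixed k). The sketch below is our own illustration under the parameterization f(x; k, λ) = λ k x^{k-1} exp(-λ x^k), reading W(k, λ) in that rate-style convention (an assumption on our part); it also runs a miniature version of the power experiment with far fewer replicates than the thesis's 1000.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical sketch of MIC(h) when both Weibull parameters may change:
# each segment is fit independently, using lam_hat = m / sum(seg**k) for fixed k.

def seg_neg2ll(seg):
    """-2 * maximized Weibull log-likelihood of one segment (1-d search over k)."""
    m = len(seg)
    def prof(k):
        lam = m / np.sum(seg**k)              # closed-form scale MLE given shape k
        return -(m * np.log(lam * k) + (k - 1) * np.sum(np.log(seg)) - m)
    res = minimize_scalar(prof, bounds=(0.05, 20.0), method="bounded")
    return 2.0 * res.fun

def mic_alt(x, h):
    n = len(x)
    return seg_neg2ll(x[:h]) + seg_neg2ll(x[h:]) + (4.0 + (2.0 * h / n - 1.0) ** 2) * np.log(n)

def mic_null(x):
    n = len(x)
    return seg_neg2ll(x) + 2.0 * np.log(n)

def detect(x):
    """Return (h_hat, change_found) for one data set."""
    mics = {h: mic_alt(x, h) for h in range(10, len(x) - 9)}
    h_hat = min(mics, key=mics.get)
    return h_hat, mics[h_hat] < mic_null(x)

# Tiny power check in the spirit of Table 3.3: W(1,1) -> W(2,2) at h = 50, n = 100.
# To sample W(k, lam) under this density, scale a standard Weibull draw by lam**(-1/k).
rng = np.random.default_rng(3)
reps = 20                                      # far fewer than the thesis's 1000 replicates
hits = 0
for _ in range(reps):
    x = np.concatenate([rng.weibull(1.0, 50),
                        rng.weibull(2.0, 50) * 2.0 ** (-1.0 / 2.0)])
    _, found = detect(x)
    hits += found
print(hits / reps)
```

Even with only 20 replicates, the estimated power for a mid-sample change of this size should be clearly above one half, in line with the trend the tables describe.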

Table 3.3: Two-parameter power comparison for MIC with change point at different locations, n = 100

W(1,1)          h=5   h=10  h=15  h=20  h=25  h=30  h=35  h=40  h=45  h=50
W(2,2)
W(1.75,1.75)
W(1.5,1.5)

From Table 3.3, the conclusions are the same as for Table 3.1. As the difference between the parameters increases, the power of the test also increases. For example, when the change point location is h = 10, the power for f_1 = W(1,1) against f_2 = W(2,2) is larger than the power for f_1 = W(1,1) against f_2 = W(1.5,1.5). The power of the test stays within an acceptable range.

A different sample size n = 50 has also been chosen for comparison with the original n = 100 for the two-parameter Weibull models. Everything else is kept exactly the same.

Table 3.4: Two-parameter power comparison for MIC with change point at different locations, n = 50

W(1,1)          h=3   h=5   h=8   h=10  h=13  h=15  h=18  h=20  h=23  h=25
W(2,2)
W(1.75,1.75)
W(1.5,1.5)

Based on Table 3.4, it is clear that as the sample size increases, the power of the test also increases. Take the power of W(1,1) against W(2,2) when h = 25 as an example: for n = 100, the power is 0.968, while for n = 50 the power is noticeably lower.

Compared with the power of the traditional SIC, MIC also has higher power for the two-parameter-changeable Weibull distribution, as shown in Figure 3.2. Moreover, MIC has a more obviously higher power when the change point location h is in the middle area of the data set. The reason is the MIC penalty term (2h/n - 1)^2, which differs from the constant 1 appearing in SIC. When the change point location is at the beginning or the end of the data set, as h → 1 or h → n with n → ∞,

(2h/n - 1)^2 → 1. (3.4.1)

Figure 3.2: Power comparison for MIC and SIC with change point at different locations for the two-parameter-changeable Weibull distribution, n = 50

This is very close to the traditional SIC. However, when the change point location is in the middle of the data set, h ≈ n/2,

(2h/n - 1)^2 → 0. (3.4.2)

In this case the quadratic term (2h/n - 1)^2 is essentially canceled. When the change location is exactly the middle of the data set, (2h/n - 1)^2 = 0 and the penalty term of MIC is log n smaller than that of SIC, as shown in Table 3.5. As the information criterion gets smaller, it becomes easier to reject the null hypothesis and to detect changes in the data.

Table 3.5: Difference of penalty terms in SIC and MIC for the Weibull distribution

For the same h                  SIC                 MIC
Under H_a                       -2l_n + 5 log n     -2l_n + [4 + (2h/n - 1)^2] log n
h → 1 or h → n, n → ∞           -2l_n + 5 log n     -2l_n + 5 log n
h ≈ n/2                         -2l_n + 5 log n     -2l_n + 4 log n

To sum up, the main difference between MIC and SIC is that MIC has higher power than SIC when the change happens in the middle area of the data set.
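The penalty comparison in Table 3.5 is easy to verify numerically; the short sketch below (our illustration) evaluates both penalties across locations for n = 100.

```python
import numpy as np

# Numeric illustration of Table 3.5: the SIC penalty for the two-parameter change
# model is 5 log n at every h, while the MIC penalty [4 + (2h/n - 1)^2] log n
# shrinks toward 4 log n as h approaches n/2.
n = 100
sic_pen = 5.0 * np.log(n)
for h in (5, 25, 50, 75, 95):
    mic_pen = (4.0 + (2.0 * h / n - 1.0) ** 2) * np.log(n)
    print(f"h={h:2d}  SIC penalty={sic_pen:.2f}  MIC penalty={mic_pen:.2f}")
```

At h = 50 the MIC penalty is exactly 4 log n, i.e. log n smaller than SIC's, matching the table; at the edges the two nearly coincide.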

3.5 Application of MIC to Minimum Temperature Data in Uppsala

Despite the differences in the formulas and in the simulated powers, there is no practical difference when MIC is applied to the minimum temperature data in Uppsala. Since the sample size of this data set is 208, based on the result in Table 3.5, the maximum difference between the values of MIC and SIC (attained when h = n/2) would be log 208 ≈ 5.34. This value is very small compared with the value of the log-likelihood for this data set. For MIC, the analysis again clearly identifies a significant change at the year 1837 (h = 64): in the first step, the criterion value under the null hypothesis exceeds the minimum value under the alternative hypothesis, which is attained at the change point location h = 64. After that, based on binary segmentation, the whole data set is separated into the two parts (1 : 64) and (65 : 208). Conducting the same procedure on each part, the remaining possible change points are found: h = 55, h = 136, and h = 139. The conclusions of the change point detection are summarized in Table 3.6. The result is exactly the same as for SIC.

Table 3.6: Change points of minimum temperatures in Uppsala

Change Point Location   Year     Possible Reasons
55                      1828     The First Industrial Revolution: steam engine
64                      1837     The Second Industrial Revolution: coal power
(136)                   (1909)   (Too close to the year 1912)
139                     1912     Petroleum industry
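The stepwise procedure used here (test a series, split at the minimizing h, recurse on both halves, stop when min_h MIC(h) ≥ MIC(n)) can be sketched end to end in the closed-form exponential special case, with the shape held at k = 1 so that only the rate λ changes. This is our own self-contained illustration, not the thesis code.

```python
import numpy as np

# Self-contained sketch of the binary-segmentation loop, in the closed-form
# exponential case (Weibull with shape fixed at k = 1): a segment of length m has
# maximized log-likelihood m * log(m / sum(seg)) - m, and one changeable parameter.

def seg_ll(seg):
    m = len(seg)
    return m * np.log(m / np.sum(seg)) - m

def find_changes(x, offset=0, min_seg=5):
    """Recursively split wherever min_h MIC(h) < MIC(n); return absolute indices."""
    n = len(x)
    if n < 2 * min_seg:
        return []
    mic_null = -2.0 * seg_ll(x) + np.log(n)
    best_h, best_mic = None, np.inf
    for h in range(min_seg, n - min_seg + 1):
        mic = (-2.0 * (seg_ll(x[:h]) + seg_ll(x[h:]))
               + (2.0 + (2.0 * h / n - 1.0) ** 2) * np.log(n))
        if mic < best_mic:
            best_h, best_mic = h, mic
    if best_mic >= mic_null:
        return []                              # stop: no further change in this series
    return (find_changes(x[:best_h], offset, min_seg)
            + [offset + best_h]
            + find_changes(x[best_h:], offset + best_h, min_seg))

rng = np.random.default_rng(4)
x = np.concatenate([rng.exponential(1.0, 60),   # rate lambda = 1
                    rng.exponential(0.2, 60),   # rate lambda = 5
                    rng.exponential(1.0, 60)])  # back to lambda = 1
cps = find_changes(x)
print(cps)
```

With two strong rate changes planted at indices 60 and 120, the recursion should recover change points near both locations.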

CHAPTER 4

IMPROVED SCHWARZ INFORMATION CRITERION FOR CHANGE POINT DETECTION IN WEIBULL MODELS

4.1 A Conjecture for an Improved Schwarz Information Criterion

As mentioned in Sections 2.2 and 2.4, the traditional SIC has high power when the change happens in the middle of the data, while it has lower power when the change is located at the beginning or the end of the data set. In Section 3.4, we concluded that the MIC has an even higher power than SIC when the change is in the middle area, but its performance is similar to SIC when the change is at the two ends. However, if SIC is already so powerful in the middle area, why should that region be strengthened further by MIC? If SIC is relatively weak at the data edges, why not improve that part instead? This is the motivation of the Improved Schwarz Information Criterion (IIC): to increase the power of the test when the change happens at the edges.

Figure 4.1: (2h/n - 1)^2 value for different change locations

It is known that as the information criterion gets lower, it is easier to reject the null hypothesis.


More information

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny

Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny 008 by The University of Chicago. All rights reserved.doi: 10.1086/588078 Appendix from L. J. Revell, On the Analysis of Evolutionary Change along Single Branches in a Phylogeny (Am. Nat., vol. 17, no.

More information

Classification. Chapter Introduction. 6.2 The Bayes classifier

Classification. Chapter Introduction. 6.2 The Bayes classifier Chapter 6 Classification 6.1 Introduction Often encountered in applications is the situation where the response variable Y takes values in a finite set of labels. For example, the response Y could encode

More information

Topic 10: Hypothesis Testing

Topic 10: Hypothesis Testing Topic 10: Hypothesis Testing Course 003, 2016 Page 0 The Problem of Hypothesis Testing A statistical hypothesis is an assertion or conjecture about the probability distribution of one or more random variables.

More information

How the mean changes depends on the other variable. Plots can show what s happening...

How the mean changes depends on the other variable. Plots can show what s happening... Chapter 8 (continued) Section 8.2: Interaction models An interaction model includes one or several cross-product terms. Example: two predictors Y i = β 0 + β 1 x i1 + β 2 x i2 + β 12 x i1 x i2 + ɛ i. How

More information

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence

Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Uniformly Most Powerful Bayesian Tests and Standards for Statistical Evidence Valen E. Johnson Texas A&M University February 27, 2014 Valen E. Johnson Texas A&M University Uniformly most powerful Bayes

More information

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems

Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Analysis of the AIC Statistic for Optimal Detection of Small Changes in Dynamic Systems Jeremy S. Conner and Dale E. Seborg Department of Chemical Engineering University of California, Santa Barbara, CA

More information

A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors

A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Journal of Data Science 6(008), 75-87 A Note on Hypothesis Testing with Random Sample Sizes and its Relationship to Bayes Factors Scott Berry 1 and Kert Viele 1 Berry Consultants and University of Kentucky

More information

Solution: chapter 2, problem 5, part a:

Solution: chapter 2, problem 5, part a: Learning Chap. 4/8/ 5:38 page Solution: chapter, problem 5, part a: Let y be the observed value of a sampling from a normal distribution with mean µ and standard deviation. We ll reserve µ for the estimator

More information

Direction: This test is worth 250 points and each problem worth points. DO ANY SIX

Direction: This test is worth 250 points and each problem worth points. DO ANY SIX Term Test 3 December 5, 2003 Name Math 52 Student Number Direction: This test is worth 250 points and each problem worth 4 points DO ANY SIX PROBLEMS You are required to complete this test within 50 minutes

More information

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October

More information

Lecture 3. Inference about multivariate normal distribution

Lecture 3. Inference about multivariate normal distribution Lecture 3. Inference about multivariate normal distribution 3.1 Point and Interval Estimation Let X 1,..., X n be i.i.d. N p (µ, Σ). We are interested in evaluation of the maximum likelihood estimates

More information

CHANGE-POINT ANALYSIS: SINGLE CHANGE-POINT IN A SEQUENCE OF INDEPENDENT GAUSSIAN AND EXPONENTIAL RANDOM VARIABLES

CHANGE-POINT ANALYSIS: SINGLE CHANGE-POINT IN A SEQUENCE OF INDEPENDENT GAUSSIAN AND EXPONENTIAL RANDOM VARIABLES CHANGE-POINT ANALYSIS: SINGLE CHANGE-POINT IN A SEQUENCE OF INDEPENDENT GAUSSIAN AND EXPONENTIAL RANDOM VARIABLES By ELENA A KHAPALOVA A dissertation submitted in partial fulfillment of the requirements

More information

Testing and Model Selection

Testing and Model Selection Testing and Model Selection This is another digression on general statistics: see PE App C.8.4. The EViews output for least squares, probit and logit includes some statistics relevant to testing hypotheses

More information

Analysis of variance, multivariate (MANOVA)

Analysis of variance, multivariate (MANOVA) Analysis of variance, multivariate (MANOVA) Abstract: A designed experiment is set up in which the system studied is under the control of an investigator. The individuals, the treatments, the variables

More information

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon

Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Discussion of the paper Inference for Semiparametric Models: Some Questions and an Answer by Bickel and Kwon Jianqing Fan Department of Statistics Chinese University of Hong Kong AND Department of Statistics

More information

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17

Transformations The bias-variance tradeoff Model selection criteria Remarks. Model selection I. Patrick Breheny. February 17 Model selection I February 17 Remedial measures Suppose one of your diagnostic plots indicates a problem with the model s fit or assumptions; what options are available to you? Generally speaking, you

More information

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing

STAT 135 Lab 5 Bootstrapping and Hypothesis Testing STAT 135 Lab 5 Bootstrapping and Hypothesis Testing Rebecca Barter March 2, 2015 The Bootstrap Bootstrap Suppose that we are interested in estimating a parameter θ from some population with members x 1,...,

More information

Probability and Statistics

Probability and Statistics Probability and Statistics Kristel Van Steen, PhD 2 Montefiore Institute - Systems and Modeling GIGA - Bioinformatics ULg kristel.vansteen@ulg.ac.be CHAPTER 4: IT IS ALL ABOUT DATA 4a - 1 CHAPTER 4: IT

More information

An Introduction to Path Analysis

An Introduction to Path Analysis An Introduction to Path Analysis PRE 905: Multivariate Analysis Lecture 10: April 15, 2014 PRE 905: Lecture 10 Path Analysis Today s Lecture Path analysis starting with multivariate regression then arriving

More information

Topic 10: Hypothesis Testing

Topic 10: Hypothesis Testing Topic 10: Hypothesis Testing Course 003, 2017 Page 0 The Problem of Hypothesis Testing A statistical hypothesis is an assertion or conjecture about the probability distribution of one or more random variables.

More information

Karl-Rudolf Koch Introduction to Bayesian Statistics Second Edition

Karl-Rudolf Koch Introduction to Bayesian Statistics Second Edition Karl-Rudolf Koch Introduction to Bayesian Statistics Second Edition Karl-Rudolf Koch Introduction to Bayesian Statistics Second, updated and enlarged Edition With 17 Figures Professor Dr.-Ing., Dr.-Ing.

More information

10-701/ Machine Learning - Midterm Exam, Fall 2010

10-701/ Machine Learning - Midterm Exam, Fall 2010 10-701/15-781 Machine Learning - Midterm Exam, Fall 2010 Aarti Singh Carnegie Mellon University 1. Personal info: Name: Andrew account: E-mail address: 2. There should be 15 numbered pages in this exam

More information

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition

COPYRIGHTED MATERIAL CONTENTS. Preface Preface to the First Edition Preface Preface to the First Edition xi xiii 1 Basic Probability Theory 1 1.1 Introduction 1 1.2 Sample Spaces and Events 3 1.3 The Axioms of Probability 7 1.4 Finite Sample Spaces and Combinatorics 15

More information

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010

Stat/F&W Ecol/Hort 572 Review Points Ané, Spring 2010 1 Linear models Y = Xβ + ɛ with ɛ N (0, σ 2 e) or Y N (Xβ, σ 2 e) where the model matrix X contains the information on predictors and β includes all coefficients (intercept, slope(s) etc.). 1. Number of

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Inverse Sampling for McNemar s Test

Inverse Sampling for McNemar s Test International Journal of Statistics and Probability; Vol. 6, No. 1; January 27 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education Inverse Sampling for McNemar s Test

More information

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning

SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning SVAN 2016 Mini Course: Stochastic Convex Optimization Methods in Machine Learning Mark Schmidt University of British Columbia, May 2016 www.cs.ubc.ca/~schmidtm/svan16 Some images from this lecture are

More information

CHAPTER 6: SPECIFICATION VARIABLES

CHAPTER 6: SPECIFICATION VARIABLES Recall, we had the following six assumptions required for the Gauss-Markov Theorem: 1. The regression model is linear, correctly specified, and has an additive error term. 2. The error term has a zero

More information

Sparse Linear Models (10/7/13)

Sparse Linear Models (10/7/13) STA56: Probabilistic machine learning Sparse Linear Models (0/7/) Lecturer: Barbara Engelhardt Scribes: Jiaji Huang, Xin Jiang, Albert Oh Sparsity Sparsity has been a hot topic in statistics and machine

More information

Density Estimation: ML, MAP, Bayesian estimation

Density Estimation: ML, MAP, Bayesian estimation Density Estimation: ML, MAP, Bayesian estimation CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani Outline Introduction Maximum-Likelihood Estimation Maximum

More information

ON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION

ON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION ON THE FAILURE RATE ESTIMATION OF THE INVERSE GAUSSIAN DISTRIBUTION ZHENLINYANGandRONNIET.C.LEE Department of Statistics and Applied Probability, National University of Singapore, 3 Science Drive 2, Singapore

More information

Practice Problems Section Problems

Practice Problems Section Problems Practice Problems Section 4-4-3 4-4 4-5 4-6 4-7 4-8 4-10 Supplemental Problems 4-1 to 4-9 4-13, 14, 15, 17, 19, 0 4-3, 34, 36, 38 4-47, 49, 5, 54, 55 4-59, 60, 63 4-66, 68, 69, 70, 74 4-79, 81, 84 4-85,

More information

Robustness and Distribution Assumptions

Robustness and Distribution Assumptions Chapter 1 Robustness and Distribution Assumptions 1.1 Introduction In statistics, one often works with model assumptions, i.e., one assumes that data follow a certain model. Then one makes use of methodology

More information

Statistical Applications in Genetics and Molecular Biology

Statistical Applications in Genetics and Molecular Biology Statistical Applications in Genetics and Molecular Biology Volume 5, Issue 1 2006 Article 28 A Two-Step Multiple Comparison Procedure for a Large Number of Tests and Multiple Treatments Hongmei Jiang Rebecca

More information

Assessing the Effect of Prior Distribution Assumption on the Variance Parameters in Evaluating Bioequivalence Trials

Assessing the Effect of Prior Distribution Assumption on the Variance Parameters in Evaluating Bioequivalence Trials Georgia State University ScholarWorks @ Georgia State University Mathematics Theses Department of Mathematics and Statistics 8--006 Assessing the Effect of Prior Distribution Assumption on the Variance

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Adjusted Empirical Likelihood for Long-memory Time Series Models

Adjusted Empirical Likelihood for Long-memory Time Series Models Adjusted Empirical Likelihood for Long-memory Time Series Models arxiv:1604.06170v1 [stat.me] 21 Apr 2016 Ramadha D. Piyadi Gamage, Wei Ning and Arjun K. Gupta Department of Mathematics and Statistics

More information

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems

MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems MA/ST 810 Mathematical-Statistical Modeling and Analysis of Complex Systems Principles of Statistical Inference Recap of statistical models Statistical inference (frequentist) Parametric vs. semiparametric

More information

TUTORIAL 8 SOLUTIONS #

TUTORIAL 8 SOLUTIONS # TUTORIAL 8 SOLUTIONS #9.11.21 Suppose that a single observation X is taken from a uniform density on [0,θ], and consider testing H 0 : θ = 1 versus H 1 : θ =2. (a) Find a test that has significance level

More information

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling

DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling DS-GA 1003: Machine Learning and Computational Statistics Homework 7: Bayesian Modeling Due: Tuesday, May 10, 2016, at 6pm (Submit via NYU Classes) Instructions: Your answers to the questions below, including

More information

Examining the accuracy of the normal approximation to the poisson random variable

Examining the accuracy of the normal approximation to the poisson random variable Eastern Michigan University DigitalCommons@EMU Master's Theses and Doctoral Dissertations Master's Theses, and Doctoral Dissertations, and Graduate Capstone Projects 2009 Examining the accuracy of the

More information