Master Thesis. Change Detection in Telecommunication Data using Time Series Analysis and Statistical Hypothesis Testing.


Master Thesis

Change Detection in Telecommunication Data using Time Series Analysis and Statistical Hypothesis Testing

Tilda Eriksson

LiTH-MAT-EX 2013/04 SE


Change Detection in Telecommunication Data using Time Series Analysis and Statistical Hypothesis Testing

Applied Mathematics, Linköping University, Institute of Technology

Tilda Eriksson

LiTH-MAT-EX 2013/04 SE

Master Thesis: 30 hp
Level: A
Supervisors: Lars-Olof Björketun, Ericsson GSM RAN; Torkel Erhardsson, Mathematical Statistics, Linköping University
Examiner: Torkel Erhardsson, Mathematical Statistics, Linköping University

Linköping: June 2013


Abstract

In the base station system of the GSM mobile network there are a large number of counters tracking the behaviour of the system. When the software of the system is updated, we wish to find out which of the counters have changed their behaviour. This thesis work has shown that the counter data can be modelled as a stochastic time series with a daily profile and a noise term. The change detection can be done by estimating the daily profile and the variance of the noise term and performing statistical hypothesis tests of whether the mean value and/or the daily profile of the counter data before and after the software update can be considered equal. When the chosen counter data has been analysed, it seems reasonable in most cases to assume that the noise terms are approximately independent and normally distributed, which justifies the hypothesis tests. When the change detection is tested on data where the software is unchanged and on data with known software updates, the results are as expected in most cases. Thus the method seems to be applicable under the conditions studied.

Keywords: counter data, software update, change detection, stochastic, time series, daily profile, noise, mean value, statistical hypothesis tests

URL for electronic version:

Acknowledgements

I would like to thank Helene and Lars-Olof at Ericsson, for making me feel welcome and for giving me the opportunity to grow, both personally and in my field of study. I would also like to thank Torkel at Linköping University, for helping me when my knowledge in statistics was insufficient. Thanks to my parents, for always letting me know that I could study whatever I wanted. And to my fellow students at Yi, who made my initial years at the university very pleasant. Special thanks to Richard and Maud, for being true friends through difficult times. And to my husband. Thank you for your unconditional love and for always believing in me. Finally, to myself, for never giving up.

Linköping, June 2013
Tilda Eriksson


Nomenclature

Most of the recurring abbreviations and symbols are described here.

Symbols

X: Random Variable
x: Observation of a Random Variable
µ: Population Mean
σ^2: Population Variance
σ: Population Standard Deviation
x̄: Sample Mean
s^2: Sample Variance
s: Sample Standard Deviation
H_0: Null Hypothesis
H_1: Alternative Hypothesis
T: Test Statistic
t_obs: Observation of the Test Statistic
C: Critical Region, Rejection Region
α: Level of Significance
λ_α: α-quantile
P: P-value
df: Degrees of Freedom
n: Number of Observations
X_t: Stochastic Time Series
s_t: Seasonal Component, Daily Profile
p: Period
m_t: Trend Component
Y_t: Noise Term
h: Lag

Abbreviations

acf: Auto Correlation Function
cdf: Cumulative Distribution Function
iid: Independent and Identically Distributed
pdf: Probability Density Function
BRP: Basic Recording Period
BSC: Base Station Controller
BSS: Base Station System
BTS: Base Transceiver Station
CN: Core Network
DL: Downlink
EGPRS: Enhanced General Packet Radio Service
GSM: Global System for Mobile communications
KPI: Key Performance Indicator
MS: Mobile Station
OSS: Operation and Support System
PDCH: Packet Data CHannel
PI: Performance Indicator
RAN: Radio Access Network
ROP: Result Output Period
RPI: Resource Performance Indicator
STS: Statistics and Traffic Measurement Subsystem
TBF: Temporary Block Flow
TS: Time Slots

Contents

1 Introduction
   Background
   Problem Statement
   Methods Used To Perform the Thesis Work
   Scope
   Topics Covered

2 Counter Data
   Overview of the Ericsson GSM System
   The STS Counters
   Collection and Behaviour of the Counter Values
   Processing of Counter Data
   Key Performance Indicators and Change Detection

3 Statistical Hypothesis Testing
   Random Variables And Their Probability Distributions
   Independent and Identically Distributed Random Variables
   The Normal Distribution and the Central Limit Theorem
   Statistical Description of Data
   How to Perform a Hypothesis Test
   Using P-values
   Comparing the Means of Two Samples
   Why The Test Statistic Is t-distributed
   Comparing the Variance of Two Samples
   Testing Normality
   Histogram
   Lilliefors Test

4 Time Series Analysis
   Time Series and Hypothesis Tests
   Estimation of Trend, Seasonality and the Remaining Noise Term
   Testing the Independence Assumption of the Noise

5 Analysis of Counter Data
   Selection of Data to Work With
   The Daily Profile
   Outliers
   Missing Values
   Testing Independence and Normality of the Noise

6 The Change Detection Method
   The Change Detection Method and Its Limitations
   Change Detection of Mean Value
   Change Detection of Daily Profile
   Normality Assumption
   Level of Significance

7 Applying the Change Detection Method To Counter Data
   Selection of Data
   Data Volume
   The Throughput KPI and Underlying PI:s and Counters
   Interpreting the Figures
   Counter Data Without Known Changes
   Level of Significance
   Counter Data With Known Changes

8 Discussion
   Why Time Series Analysis and Statistical Hypothesis Testing
   Problems With Hypothesis Tests
   Analysis of the Result
   Assumptions
   Change Detection
   Weekdays Versus Week
   How the Method Could Be Used
   Future Work
   Conclusions

List of Figures

2.1 The GSM system model
2.2 The Counters Divided Into Object Types In the STS
2.3 The KPI-Pyramid
3.1 An example of a pdf
3.2 An example of a cdf
3.3 The pdf of normal distributions with standard deviation σ
3.4 The α-quantile
3.5 Two-sided quantile for a standard normal distribution
3.6 The null distribution and rejection of H_0
3.7 One-sided P-value
3.8 Two-sided P-value
3.9 The t-distribution
3.10 The F-distribution
3.11 The pdf of an F-distributed test statistic and its two-sided rejection region
3.12 Histogram of random samples from a normal distribution
3.13 Histogram of a large number of random samples from a normal distribution with a fitted normal pdf
3.14 Definition of the test statistic for the Kolmogorov-Smirnov and Lilliefors test
4.1 A time series and its estimated seasonal component and trend
4.2 The estimated noise term of figure 4.1
4.3 Normally distributed iid noise
4.4 The sample acf of a time series with a trend and seasonal behaviour
4.5 The sample acf of an iid-sequence
5.1 An example of counter data
5.2 An example of counter data with missing values
5.3 An example of counter data with an outlier
5.4 An example of estimated trend and daily profile for counter data
5.5 An example of estimated trend and daily profile for counter data
5.6 An example of the sample acf for counter data with a daily profile
5.7 Counter data with a removed outlier
5.8 An example of the sample acf for the noise
5.9 An example of the sample acf for the noise
5.10 Testing the normality assumption of the noise

7.1 Relations between the KPI, PI:s and counters used
7.2 No significant change in User Data Volume
7.3 No significant change on KPI-level
7.4 No significant change on PI level
7.5 No significant change on counter level
7.6 Change detected
7.7 Change still detected when the possible outliers are removed
7.8 Change detection of Throughput
7.9 Change detection of Simultaneous TBFs per PDCH
7.10 Change detection of DLTBFPEPDCH
7.11 Change detection of DLEPDCH
7.12 Change detection of Radio Link Bitrate
7.13 Change detection of MC19DLACK
7.14 Change detection of MC19DLSCHED
7.15 Change detection of Multislot Utilization
7.16 Change detection of Average Maximum Number of TS reservable by MSs
7.17 Change detection of MAXEGTSDL
7.18 Change detection of TRAFF2ETBFSCAN
7.19 Change Detection of User Data Volume
7.20 Change Detection of Daily Profile
7.21 Change Detection of Daily Profile
7.22 Change Detection of Daily Profile

Chapter 1 Introduction

This first chapter will present some background to the problem formulation, formulate the questions to be answered in the forthcoming chapters and describe what topics are covered.

1.1 Background

The mobile telecommunication system is a very large infrastructure, with more than 4.5 billion GSM subscribers around the world. The GSM system has been developed over 20 years and is continuously enhanced. This has resulted in a complex system. One way of monitoring that the GSM system behaves as expected is to use statistics and counters. In the Ericsson BSS system (the radio part of the GSM system) there are over 2000 different types of counters, often with instances in every cell, that track the behaviour of the system. Due to new software releases the behaviour of the system changes. Often these changes are visible in the counter data. Today the changes in the counter data are detected by hand. This is a very time consuming process and many changes go undetected due to the large number of counters. The purpose of this master thesis is to investigate whether the changes in the counters due to new software releases can be detected using mathematical methods and computer calculations. If this is achieved, it can reduce the number of counters that have to be analysed by hand. An earlier attempt was made to solve the problem, but the change detection failed because of the assumption that the counter data behaved in a deterministic way. Due to the stochastic behaviour of the system, probability theory and statistics seem to be a better approach.

1.2 Problem Statement

The thesis problem statement is: Can any mathematical method or methods to detect changes in counter data after new software releases be found? And if a method is found, how can it be implemented and tested on counter data? To answer this question a couple of other questions arise. How do the counters behave? What type of changes are we looking for? Is the detection to be made on unprocessed data or on data that has been processed in some way?

Which data is to be used to perform the analysis? Should knowledge of the system be used or should the data be treated as unknown? How do we decide if the method is working or not?

1.3 Methods Used To Perform the Thesis Work

The methods used to perform the thesis work were to gain knowledge about the counter data to work with and to gain some knowledge about the system, in order to put the counters into a broader context. Literature on different mathematical methods, mainly in statistics but also some literature on machine learning and signal processing, was also studied, and a set of data to work with was chosen. When a possible method was found, the assumptions that have to be made about the data were tested. Then the method was implemented in Matlab and tested on parts of the counter data. Finally, an analysis of the method was performed.

1.4 Scope

The purpose of the method searched for in this thesis work is only to detect changes, not to determine whether the changes should or should not occur. The method is also limited to the use of a reference week, a fixed time for the software update and a test week after the update. Consideration is therefore not taken to long term trends and seasonal behaviour of the data over longer time periods. The period of the seasonal component of the data is taken as 24 hours for all data, and every day of the week is treated as equal. When the data is plotted, the x-axis shows the number of hours elapsed.

1.5 Topics Covered

There are eight chapters (including this introduction). The main topics dealt with are:

Chapter 2: We get to know the counters, the system that they are part of, the behaviour of the counters and how the counter data is used and processed.

Chapter 3: This chapter is devoted to mathematical concepts used in this thesis work regarding statistics and probability theory, mainly statistical hypothesis testing and the underlying assumptions.

Chapter 4: This chapter presents some theory of time series and how to study the properties of the noise term.

Chapter 5: In this chapter the results from the analysis of the counter data are presented.

Chapter 6: Describes the method for change detection in counter data.

Chapter 7: Here the results from applying the change detection method to counter data are presented.

Chapter 8: Here we find a discussion about the method and the results, examples of how the method could be used and thoughts about future work on the subject. Here we also find the conclusions and motivations for the choice of method.


Chapter 2 Counter Data

There are about 3000 different types of counters in the Ericsson Base Station System of the GSM network. They track the behaviour of the system. In this chapter we will get an overview of the counters and how the counter data is processed and used.

2.1 Overview of the Ericsson GSM System

The Ericsson GSM system is divided into three parts: the Core Network (CN), the Operation and Support System (OSS), and the Base Station System (BSS). The CN includes control servers that set up calls and handle subscription information, the location of the user, authentication, ciphering, etc. The OSS is a support node, and the BSS is the radio part of the system. The BSS contains Base Transceiver Stations (BTS), which are the radio equipment with antenna, transmitters and receivers, and a control unit, the Base Station Controller (BSC). See figure 2.1.

Figure 2.1: The GSM system model

In different parts of the BSC there are counters tracking the behaviour of the system and the traffic that it handles. The counters considered in this report are called PEG counters and can only be incremented. They count system or user events, resource usage and traffic volumes. The geographical area over which the mobile users are distributed is divided into cells, and one BSC handles many cells. The counters often have instances in every cell, so one counter can have many instances since the number of cells is large. The counters track the behaviour of the system on system level, BSC level and cell level, and there is a subsystem in the BSC that collects and handles the counter values. This subsystem is called the Statistics and Traffic Measurement Subsystem (STS).

2.2 The STS Counters

The Statistics and Traffic Measurement Subsystem collects counter values from the counters in the BSC. The STS collects the counter values with a fixed time interval of 5 or 15 minutes, the Basic Recording Period (BRP), saving a value that is the difference between the current value and the last value. The counter values in the STS are thus a measure of what has happened during the latest time interval, the BRP interval. Since the counters in the BSC are only incremented, the values in the STS will be positive numbers or zero.

Figure 2.2: The Counters Divided Into Object Types In the STS

In the STS, the counters are divided into object types. There are up to 30 counters in every object type, and they are mostly linked to each other in system behaviour. Figure 2.2 shows how the counters are divided into object types in the STS.

2.2.1 Collection and Behaviour of the Counter Values

When the counter values are collected from the STS, this is done at fixed time intervals called ROP (Result Output Period). Since the counter values are collected every 5 or 15 minutes in the STS, the values collected between ROP intervals are accumulated. Thus the value collected from the STS will still be a measure of what has happened during the latest time interval, but now the time interval is the ROP instead of the BRP. The ROP interval is often an hour, and thus the STS counters consist of observations of the number of events during an hour, recorded every hour.

2.3 Processing of Counter Data

The counter data is used to observe the system behaviour. It can be used to analyse trends in the traffic behaviour, for fault indication and as a measure of end user performance. Since the data is collected from a large number of cells, it can be relevant to aggregate the collected data to BSC level, to get an overview of the system behaviour in all cells. This is done by a sum over all the cells. The data can also be aggregated in time. For example, the hourly data on BSC level is a summation of all the collected values from every cell during the previous hour.

2.3.1 Key Performance Indicators and Change Detection

The Key Performance Indicators (KPI) describe the end-users' perception of the performance of the BSS. The KPI:s are calculated from counter data; over 100 counters are used.

Figure 2.3: The KPI-Pyramid

There are different levels of KPI:s. On the highest level are the KPI-values that describe the end-user perception. The next level shows the supporting

Performance Indicators (PI), which describe system performance. After that come the Resource Performance Indicators (RPI), which are not dealt with in this report but could be treated in the same way. See figure 2.3. The KPI:s and PI:s are calculated on cell level and can be aggregated on BSC or system level.

When the system performance is evaluated, the KPI- and supporting PI-values are used. To see if the system behaviour has changed after a software update, the KPI-values are analysed, and when a KPI-value is found to have changed the analysis continues further down the pyramid of figure 2.3 to see where the system change takes place. Since the focus is on KPI- and PI-values, counters that are not involved in the KPI evaluations are often missed in the change detection.
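To make the aggregation described in section 2.3 concrete, the sketch below shows one possible way of summing per-cell ROP values to BSC level and bucketing them per hour. It is an illustration only: the thesis implementation was written in Matlab, and the DataFrame layout and column names (`timestamp`, `value`) are assumptions, not the actual STS export format.

```python
# Illustrative sketch, not the thesis code: aggregate per-cell ROP counter values
# to BSC level (sum over cells) and to hourly resolution (sum within each hour).
# The column names and layout are assumed for the example.
import pandas as pd

def to_bsc_hourly(readings: pd.DataFrame) -> pd.Series:
    """readings: one row per (timestamp, cell) with the counter value reported
    by the STS for that ROP interval (a non-negative number, see section 2.2)."""
    readings = readings.copy()
    readings["timestamp"] = pd.to_datetime(readings["timestamp"])
    per_rop = readings.groupby("timestamp")["value"].sum()   # aggregate over cells
    return per_rop.resample("1h").sum()                      # aggregate in time
```

With, say, three cells reporting every 15 minutes, this would collapse the readings into one hourly BSC-level series with 24 samples per day, which is the form of data analysed in the rest of the thesis.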

Chapter 3 Statistical Hypothesis Testing

In this chapter the concept of random variables will be introduced, together with their distributions and how to perform statistical hypothesis tests.

3.1 Random Variables And Their Probability Distributions

A deterministic function shows the same value over and over again if you have the same input, but a random variable can change value every time you observe it, even if the input is unchanged. The value varies in an uncontrollable way, but the probability of its behaviour can often be determined. The probability of a certain outcome x of a random variable X is denoted P(X = x) and is specified by the probability distribution of that random variable. The probability distribution is a kind of weight function, with higher values for the outcomes that occur more often and much smaller values for the outcomes that occur very seldom. If the variable is continuous, the probability distribution is called the probability density function, pdf, and is denoted $f_X(x)$. When the pdf is integrated over all possible values of X, its integral equals one. An area that equals one is thus distributed over all the possible outcomes and the most probable outcome has the largest piece of the area. Figure 3.1 shows an example of a pdf. Any function $f(x) \geq 0$, $x \in \mathbb{R}$, that satisfies $\int_{\mathbb{R}} f(x)\,dx = 1$ can be used as a probability density function. [2]

For a continuous random variable X, the probability that X takes values in A is defined as:

$$P(X \in A) = \int_A f(x)\,dx$$

If $A = (-\infty, x]$, $P(X \in A)$ is called the cumulative distribution function, cdf, denoted $F_X(x)$:

$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f(t)\,dt$$

The cumulative distribution function thus accumulates the probability from $-\infty$ up to a value x. [7] Figure 3.2 shows an example of a cdf.

Figure 3.1: An example of a pdf.

Figure 3.2: An example of a cdf.

To describe the behaviour of a random variable, a measure of the central tendency of its values and the dispersion around that value can be useful. The expected value, also called the mean value or average value, is a measure of that central tendency. It is the average of many independent observations of that

particular random variable. [2] The expected value is defined as follows:

$$E(X) = \int_{\mathbb{R}} x f_X(x)\,dx = \mu$$

The measure of dispersion around the central tendency is called the variance. It is defined as the average squared deviation from the mean: [7]

$$V(X) = E[(X - \mu)^2] = \int_{\mathbb{R}} (x - \mu)^2 f_X(x)\,dx = \sigma^2$$

To get the dispersion in the same unit as X itself, the square root of the variance is often used, called the standard deviation, σ.

3.1.1 Independent and Identically Distributed Random Variables

Random variables $X_1, \ldots, X_n$ are called iid random variables if they are independent and identically distributed. The random variables are identically distributed if they have the same probability distribution, and they are independent if:

$$f_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = f_{X_1}(x_1) \cdots f_{X_n}(x_n)$$

This means that knowledge of the value of one random variable does not affect the knowledge of the others.

3.1.2 The Normal Distribution and the Central Limit Theorem

A commonly known distribution is the Normal Distribution, also called the Gaussian Distribution. Many phenomena around us are approximately normally distributed and it is used throughout much of statistical theory. The mathematical properties of the normal distribution also make it very useful. The probability density function of the normal distribution is:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

And the cumulative distribution function:

$$F_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{(t-\mu)^2}{2\sigma^2}}\,dt$$

Figure 3.3 shows the probability density function of normal random variables with different standard deviations. A random variable that is normally distributed with expected value µ and standard deviation σ is denoted $X \sim N(\mu, \sigma)$. When $\mu = 0$ and $\sigma = 1$ you get the standardized normal distribution, with probability density function:

$$\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$$

and cumulative distribution function:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{t^2}{2}}\,dt$$
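As a quick numerical sanity check (not part of the thesis), the pdf and cdf written out above can be compared with a standard library implementation of the normal distribution; the values of µ, σ and x below are arbitrary example values.

```python
# Check that the normal pdf and cdf formulas above agree with scipy's implementation.
import numpy as np
from scipy.stats import norm

mu, sigma, x = 3.0, 2.0, 4.5                      # arbitrary example values
pdf_manual = np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
cdf_standardized = norm.cdf((x - mu) / sigma)     # Phi((x - mu) / sigma)

print(np.isclose(pdf_manual, norm.pdf(x, loc=mu, scale=sigma)))        # True
print(np.isclose(cdf_standardized, norm.cdf(x, loc=mu, scale=sigma)))  # True
```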

Figure 3.3: The pdf of normal distributions with standard deviation σ.

Figure 3.4: The α-quantile.

The x-value where the area of the probability density function above x equals α is called the α-quantile. See figure 3.4. For a standardized normal distribution the α-quantile is denoted $\lambda_\alpha$. $\Phi(-x)$ is the area of $\varphi(x)$ from $-\infty$ to $-x$. Since $\varphi(x)$ is an even function,

$\varphi(-x) = \varphi(x)$, and the area of $\varphi(x)$ from x to $\infty$ is equal to $\Phi(-x)$. Since the whole area is one, $\Phi(-x) = 1 - \Phi(x)$. [2] The total area below $-x$ and above $x$ is thus $2\Phi(-x)$. If that area is α, then $x = \lambda_{\alpha/2}$. See figure 3.5. This fact is used in two-sided hypothesis tests.

Figure 3.5: Two-sided quantile for a standard normal distribution.

The normal distribution has many applications. One reason for this is that the sum of many iid random variables is approximately normally distributed, with mean value $n\mu$ and standard deviation $\sigma\sqrt{n}$, where n is the number of random variables. This is called the central limit theorem. The sample mean

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

of a sequence of n iid random variables is thus approximately normally distributed with mean µ and standard deviation $\sigma/\sqrt{n}$, another fact used in the hypothesis testing. Linear combinations of independent and normally distributed random variables are also normally distributed. If $X_1 \sim N(\mu_1, \sigma_1)$ and $X_2 \sim N(\mu_2, \sigma_2)$, then $X_1 - X_2 \sim N\!\left(\mu_1 - \mu_2, \sqrt{\sigma_1^2 + \sigma_2^2}\right)$. [2]

3.2 Statistical Description of Data

A set of data, or sample, is treated as observations of a larger population. When the data is described, the purpose is to describe the behaviour of the population as accurately as possible. Even though the population does not have to be real, the name arises from the fact that in statistics the object of interest is often a group of individuals, a population.

We wish the sample to reflect the characteristics of the whole population. Thus it has to be drawn at random. A random sample of observations of the random variable X is a set of iid random variables $X_1, \ldots, X_n$, each with the distribution of X. [7]

The probability model for the population of the data is usually not known. It has to be estimated using the sample. The estimator is a function of the sample observations, and such a function is called a statistic. It is a random variable since it is a function of random variables:

$$T = t(X_1, \ldots, X_n)$$

Since it is a random variable, a distribution function describes its behaviour. It is called the sampling distribution and it is derived from the distribution of the population. For example, the population mean, µ, is estimated using the sample mean:

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

which is a statistic with a sampling distribution and a corresponding expected value and variance. The expected value of the sample mean is the mean of the whole population. The sample variance, $S^2$, is calculated as:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2$$

S is the sample standard deviation, and the expected value of the sample variance is the population variance, $\sigma^2$.

3.3 How to Perform a Hypothesis Test

When performing a hypothesis test we want to test an assumption about a random sample, $x = (x_1, \ldots, x_n)$. The assumption could be that the sample observations come from a normal distribution, that the distribution mean has a certain value, etc. The assumption that you want to test is called the null hypothesis, $H_0$. The null hypothesis is tested against an alternative hypothesis $H_1$ that is considered true if $H_0$ is rejected. If $H_1$ is a single interval, for example if the assumption is $\mu = 0$ and $H_1: \mu > 0$, the test is called one-sided, and if instead $H_1: \mu \neq 0$ ($\mu > 0$ or $\mu < 0$) the test is called two-sided. First you have to find a so called test statistic:

$$T = t(X), \quad X = (X_1, \ldots, X_n)$$

The test statistic is a function of the random variables that the sample observations are observations of. The test statistic tells us how those random variables behave if the null hypothesis is true. It is based on knowledge of the distribution under the null hypothesis, the null distribution, and different assumptions require different test statistics with different null distributions. There are some well specified test statistics to use for different kinds of tests. $H_0$ is rejected if the observation of the test statistic, $t_{obs} = t(x)$, belongs to a so called critical region, C. The critical region is chosen such that $P(T \in C) = \alpha$

if $H_0$ is true. If the test is one-sided, the critical region is the values above the α-quantile of the null distribution, see section 3.1.2. If the test is two-sided, the critical region is below $-t_{\alpha/2}$ and above $t_{\alpha/2}$. See figure 3.6.

Figure 3.6: The null distribution and rejection of $H_0$.

The null distribution then has its most probable values between $-t_{\alpha/2}$ and $t_{\alpha/2}$, and if $t_{obs}$ takes values there the assumption of $H_0$ cannot be rejected. α is called the level of significance of the hypothesis test, and the test is called significant at level α if the hypothesis is rejected and the assumption is considered false. Usual values of α are 0.05, 0.01 and 0.001. [2]

Suppose that the value of $t_{obs} \in C$. Then $H_0$ is rejected, and thus we assume that the assumption we want to test does not hold. But it could also be the case that the hypothesis is true and that $t_{obs}$ takes a value that is less probable but still possible under the null hypothesis. Therefore α is the probability that $H_0$ is rejected if $H_0$ is true. This is called a type I error. Another kind of error is the so called type II error, which is the probability of not rejecting $H_0$ when $H_0$ is not true. If $H_0$ is not rejected this does not necessarily mean that $H_0$ is true.

3.3.1 Using P-values

If the hypothesis test is one-sided, the P-value is the area of the null distribution above the observed value of the test statistic, $t_{obs}$, and is defined as follows:

$$P = P(T \geq t_{obs})$$

This is shown in figure 3.7. If the P-value is smaller than the level of significance, α, the null hypothesis is rejected. This means that we do not believe in $H_0$ if the result is unlikely when $H_0$ is true. [2] When the hypothesis test is two-sided, a two-sided P-value is calculated:

$$P = P(|T| > |t_{obs}|)$$

Figure 3.7: One-sided P-value.

Figure 3.8: Two-sided P-value.

The null hypothesis is rejected if P < α, and as seen in figure 3.8 the rejection of $H_0$ is better supported for smaller P-values.

3.4 Comparing the Means of Two Samples

When comparing the means of two samples from normal distributions, a t-distributed test statistic is used. The density function of the t-distribution is shown in figure 3.9 for different degrees of freedom, df. The degrees of freedom is the number of samples minus the number of estimated parameters. [9] If $df = \infty$, the t-distribution becomes the standard normal distribution. The test is called a two-sample t-test. There are two different cases: one

where the variances of the two samples are considered equal and one where the variances are not equal.

Figure 3.9: The t-distribution.

In both cases a t-test is used, but with a different appearance of the test statistic. If the sample standard deviations of the two samples are considered equal, the test statistic is defined as follows:

$$T = \frac{\bar{X}_1 - \bar{X}_2}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

where $S_p$ is called the pooled standard deviation, and $S_p^2$ is an estimate of the common variance: [8]

$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

The term $df = n_1 + n_2 - 2$ is the number of degrees of freedom, and $S_1^2$ and $S_2^2$ are the respective sample variances. If the variances are not considered equal, $\sigma_1^2 \neq \sigma_2^2$, the test statistic is defined as:

$$T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$$

where $S_1$ and $S_2$ are the sample standard deviations. This test statistic is only approximately t-distributed. The number of degrees of freedom is calculated as: [9]

$$df = \frac{\left(\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1 - 1} + \frac{(S_2^2/n_2)^2}{n_2 - 1}}$$

We want to test the null hypothesis $H_0: \mu_1 = \mu_2$ (or $\mu_1 - \mu_2 = 0$) against the alternative hypothesis $H_1: \mu_1 \neq \mu_2$. The test is thus two-sided. If the null hypothesis is true, the test statistic is t-distributed with the degrees of freedom given above, according to whether the variances are equal or not. The observation of the test statistic is:

$$t_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

if the variances are considered equal, and:

$$t_{obs} = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

if the variances are unequal. If $|t_{obs}| > t_{\alpha/2, df}$ the null hypothesis is rejected. $t_{\alpha/2, df}$ means that the area above $t_{\alpha/2}$ of a t-distributed probability density function with df degrees of freedom equals α/2. α is the pre-defined level of significance. See figure 3.6. You can also calculate the P-value:

$$P = P(|T| > |t_{obs}|)$$

and reject $H_0$ if P < α. The test is based on the assumption that the samples are random and come from normal distributions. Even if the distributions are not normal this kind of test can still be used, since the test is based on the sample mean and the sample mean is approximately normally distributed if the number of observations is large enough, according to the central limit theorem.

3.4.1 Why The Test Statistic Is t-distributed

When we compare the means of two different samples we want to say something about the means of the entire populations, not just the observations drawn from them. But since only samples of the two populations are available to us, we have to estimate the mean of each population using the sample mean. This gives rise to uncertainty in the estimated mean, and the sample mean has its own probability distribution, expected value and standard deviation. If the standard deviation is not known, it has to be estimated as well. This gives rise to more uncertainty. If the standard deviation is known: [2]

$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$$

Since s, the estimate of σ, has to be used instead of σ,

$$\frac{\bar{X} - \mu}{s/\sqrt{n}}$$

has to be used instead of $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$, where s is the sample standard deviation:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

The stochastic variable to be studied is thus:

$$T = \frac{\bar{X} - \mu}{S/\sqrt{n}}, \quad T \sim t(n-1)$$

T is t-distributed with n − 1 degrees of freedom. If you exchange $\bar{X}$ with $\bar{X}_1 - \bar{X}_2$ and use the fact that:

$$\bar{X}_1 - \bar{X}_2 \sim N\!\left(\mu_1 - \mu_2, \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right)$$

you get the test statistics for the two-sample t-test, see section 3.4.

3.5 Comparing the Variance of Two Samples

Figure 3.10: The F-distribution.

The ratio of the sample variances of two independent samples from normal distributions is F-distributed with $n_1 - 1$ numerator and $n_2 - 1$ denominator degrees of freedom, where $n_1$ and $n_2$ are the number of observations in the samples. [8] The F-distribution is shown in figure 3.10. This fact can be used when testing if two samples from normal distributions have equal variance. The test statistic for the test is:

$$F = \frac{S_1^2}{S_2^2}$$

The null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ is rejected if $f_{obs} > F_{\alpha/2, n_1-1, n_2-1}$ or if $f_{obs} < F_{1-\alpha/2, n_1-1, n_2-1}$ for a two-sided test. See figure 3.11. When calculating a two-sided P-value for an F-distributed test statistic, care must be taken as to whether the value of $f_{obs}$ is in the upper or lower tail of the

F-distribution.

Figure 3.11: The pdf of an F-distributed test statistic and its two-sided rejection region.

If $f_{obs}$ is below the median of the null F-distribution it is in the lower tail, and if it is greater than the median it is in the upper tail. The two-sided P-value is:

$$P = 2\left(1 - P(F_{n_1-1, n_2-1} > f_{obs})\right)$$

if it is in the lower tail, and:

$$P = 2\, P(F_{n_1-1, n_2-1} > f_{obs})$$

if it is in the upper tail. [5] The F-test is more sensitive to the normality assumption than the t-test, since the central limit theorem cannot be used.

3.6 Testing Normality

The assumption that has to be made to use a statistical hypothesis test of the kind described in sections 3.4 and 3.5 is that the samples are independent and identically distributed, hopefully with a normal distribution. To test the normality assumption, two different methods are considered in this report, one of them visual.

3.6.1 Histogram

A histogram is a graphical representation of the distribution of a set of observations and it can be used for density estimation. The values of the observations are divided into bins and the height of each bin represents the number of observations in that bin. See figure 3.12. One rule of thumb for the number of bins is the square root of the number of observations. [10]

Figure 3.12: Histogram of random samples from a normal distribution.

Figure 3.13: Histogram of a large number of random samples from a normal distribution with a fitted normal pdf.

Since the histogram is an estimate of the probability distribution for a set of data, a probability density function can be fitted to the data if the population distribution is known. If we want to test if the observations come from a specific

distribution, we can compare the shape of the histogram to the fitted pdf. See figure 3.13. If we compare the shape of figure 3.12 with the shape of figure 3.13, with the same probability distribution but a much larger number of observations, we see that the shape of the histogram approaches the shape of the population pdf when the number of observations gets larger.

3.6.2 Lilliefors Test

The Kolmogorov-Smirnov test is a statistical test to determine if a set of observations comes from a normal distribution. The test statistic for the Kolmogorov-Smirnov test, D, is the largest absolute difference between the sample cdf, $F_n(x)$, and the population cdf, $F(x)$. [7] See figure 3.14.

$$D = \sup_x |F(x) - F_n(x)|$$

This test statistic is distribution free and a table must be used for the critical values.

Figure 3.14: Definition of the test statistic for the Kolmogorov-Smirnov and Lilliefors test.

When the mean and variance are not specified and have to be estimated from the sample, the Kolmogorov-Smirnov test has to be modified. This is done by using the Lilliefors approach to the Kolmogorov-Smirnov test, where the tables for the test statistic are changed to fit a normal distribution with estimated mean and variance. [6] The test statistic is then:

$$D = \sup_x |F(x) - F_n(x)|$$

where $F_n(x)$ is the sample cdf and $F(x)$ is the normal population cdf with estimated mean, $\mu = \bar{x}$, and variance, $\sigma^2 = s^2$. If D exceeds the critical value in the

Lilliefors table, the null hypothesis that the observations come from a normal distribution is rejected.
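The normality checks of section 3.6 are available in standard libraries. The sketch below is illustrative only (Python rather than the Matlab used in the thesis, and it assumes statsmodels is available): it applies the square-root-of-n bin rule and the Lilliefors test to a simulated noise sequence.

```python
# Sketch of the normality checks in section 3.6 on a simulated noise sequence.
import numpy as np
from statsmodels.stats.diagnostic import lilliefors

rng = np.random.default_rng(0)
noise = rng.normal(loc=0.0, scale=1.0, size=168)   # e.g. one week of hourly residuals

# Histogram with roughly sqrt(n) bins (rule of thumb from section 3.6.1).
counts, edges = np.histogram(noise, bins=int(np.sqrt(noise.size)))

# Lilliefors test (section 3.6.2): H0 is that the data come from a normal
# distribution with unspecified mean and variance.
d_stat, p_value = lilliefors(noise, dist="norm")
print(p_value < 0.01)    # True would mean H0 is rejected at the 0.01 level
```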


Chapter 4 Time Series Analysis

In this chapter we will learn how to estimate trend and seasonal components in a time series and how that can be used for testing assumptions on the remaining noise term.

4.1 Time Series and Hypothesis Tests

A time series is a set of observations $\{x_t\}_{t=1}^{n}$, each one being recorded at a specific time t. [3] If the time series consists of observations of random variables, i.e. if $x_t$ is an observation of $X_t$, the time series is a realization of a stochastic process. As in the case of random variables, every time series is one of many possible realizations of the same underlying stochastic process. Thus two time series that are observations of the same process are not necessarily equal. When the set of times is a discrete set, the time series is a discrete-time series. This is the case when observations are made at fixed time intervals.

Another property of a time series is stationarity. A time series is said to be stationary if it has the same behaviour regardless of when you observe it. More precisely, $X_t$, $t = 0, 1, \ldots$ and the time shifted series $X_{t+h}$, $t = 0, 1, \ldots$ have to have the same statistical behaviour.

In many applications the behaviour of a given time series is not specified and has to be analysed. You have to look for dependence among the observations, periodic behaviour, long-term trends, etc. A suitable model for the time series has to be found and often the signal has to be separated from the noise term. A probability model for the remaining noise term has to be found, and this can be used as a diagnostic tool for the accuracy of the model of the signal. When statistical hypothesis tests are to be performed on a time series, we wish its behaviour to be like that of iid random variables. If a sequence of random variables is iid and the mean of the series is zero, the stochastic process is called an iid process and is denoted $IID(0, \sigma^2)$. If there is a seasonal behaviour in the time series there is dependence among the observations. This dependence has to be eliminated for the hypothesis test to be used.

4.2 Estimation of Trend, Seasonality and the Remaining Noise Term

In a time series we can often find an underlying trend and a seasonal variation, a repeated pattern over a shorter time interval. The seasonal variation can be modelled as a periodic component, $s_t$, with a fixed period p, such that $s_t = s_{t+p}$. A model can then be created for the time series by dividing the series into a trend component $m_t$, the seasonal component $s_t$ and a noise term $Y_t$. The noise term is a random variable that describes the deviations from the model. This kind of model is called the classical decomposition:

$$X_t = m_t + s_t + Y_t$$

Since the trend is described by the trend component, the mean value of the noise term is zero. The sum of the periodic component over a whole period is also zero: [3]

$$\sum_{k=1}^{p} s_k = 0$$

Figure 4.1: A time series and its estimated seasonal component and trend.

Figure 4.1 shows a time series with estimated trend and seasonal behaviour. The trend is a deterministic component that describes the level of the signal. The seasonal component is also a deterministic component that describes the seasonal behaviour around the trend. The remaining stochastic noise term is the deviation of the signal from those components.

To estimate the seasonal component, a suitable period is chosen, with p = 2q if p is even. Then a trend is estimated using a moving average filter, which is a weighted average of neighbouring points that is used to flatten the

fast variations in the data. This leaves the slow components, the trend. A special kind of moving average filter is used that is suitable for estimation of seasonal components: [3]

$$\hat{m}_t = \frac{0.5\,x_{t-q} + x_{t-q+1} + \cdots + x_{t+q-1} + 0.5\,x_{t+q}}{p}, \quad q < t \leq n - q$$

If p instead is odd, then p = 2q + 1 and:

$$\hat{m}_t = \frac{1}{2q+1}\sum_{j=-q}^{q} x_{t-j}, \quad q + 1 \leq t \leq n - q$$

After the trend is estimated, then for every $k = 1, \ldots, p$ the average of the deviations from the trend, $w_k$, is calculated. How many points can be used for calculating the average of the deviations is decided by the number of periods in the time series when the first and last q values are eliminated in the trend estimation. The deviations from the trend are calculated for every $k = 1, \ldots, p$ as:

$$\{(x_{k+jp} - \hat{m}_{k+jp}),\ q < k + jp \leq n - q\}$$

and $w_k$ is then calculated by taking the average of those deviations over the number of periods fitted in the data available. To make sure that the seasonal components sum to zero, the seasonal component is estimated as:

$$\hat{s}_k = w_k - \frac{1}{p}\sum_{i=1}^{p} w_i, \quad k = 1, \ldots, p$$

since

$$\sum_{k=1}^{p} \hat{s}_k = \sum_{k=1}^{p}\left(w_k - \frac{1}{p}\sum_{i=1}^{p} w_i\right) = \sum_{k=1}^{p} w_k - p \cdot \frac{1}{p}\sum_{i=1}^{p} w_i = \sum_{k=1}^{p} w_k - \sum_{i=1}^{p} w_i = 0$$

and since $\hat{s}$ is periodic with period p, $\hat{s}_k = \hat{s}_{k-p}$ for k > p. Now the seasonal component has been estimated.

To estimate the trend component, the seasonal component is first removed from the original data. The deseasonalized component, $d_t$, is defined as:

$$d_t = x_t - \hat{s}_t, \quad t = 1, \ldots, n$$

Then another moving average filter is used to estimate the trend, since the earlier trend estimation was just for calculating the seasonal component. This trend estimation uses a moving average filter of the type:

$$\hat{m}_t = \frac{1}{2q+1}\sum_{j=-q}^{q} d_{t-j}, \quad q + 1 \leq t \leq n - q$$

It also eliminates the components of fast variations and is thus a low-pass filter, since the components of low frequencies pass the filter. This is not the only way of estimating the trend. Instead of this trend estimation, a constant representing the average value could be used.
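The estimation steps above translate directly into code. The following is a minimal sketch (in Python, not the thesis Matlab implementation) of the classical decomposition for an even period p = 2q, such as the 24-hour daily profile; the residual noise term defined in the next paragraph is returned as well. Missing-value handling and edge cases are simplified.

```python
# Sketch of the classical decomposition of section 4.2 for an even period p = 2q.
import numpy as np

def classical_decomposition(x, p=24):
    """Estimate trend, seasonal component (daily profile) and noise."""
    x = np.asarray(x, dtype=float)
    n, q = len(x), p // 2

    # Preliminary trend: moving average with half weight on the two end points.
    m_hat = np.full(n, np.nan)
    for t in range(q, n - q):
        win = x[t - q:t + q + 1]
        m_hat[t] = (0.5 * win[0] + win[1:-1].sum() + 0.5 * win[-1]) / p

    # Average deviation from the preliminary trend for each position k in the period.
    w = np.empty(p)
    for k in range(p):
        dev = [x[k + j * p] - m_hat[k + j * p]
               for j in range(n // p + 1)
               if q <= k + j * p < n - q]
        w[k] = np.mean(dev)

    s = w - w.mean()              # force the seasonal component to sum to zero
    s_full = np.resize(s, n)      # repeat the profile over the whole series

    # Final trend: plain moving average of the deseasonalized series.
    d = x - s_full
    m = np.full(n, np.nan)
    for t in range(q, n - q):
        m[t] = d[t - q:t + q + 1].mean()

    y = x - s_full - m            # remaining noise term (NaN where the trend is undefined)
    return m, s_full, y
```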

To get the noise term, the trend is eliminated from the deseasonalized time series and we get:

$$Y_t = x_t - \hat{s}_t - \hat{m}_t$$

The estimated noise term of the time series in figure 4.1 is shown in figure 4.2.

Figure 4.2: The estimated noise term of figure 4.1.

4.3 Testing the Independence Assumption of the Noise

As discussed throughout this report, for many hypothesis tests we need the samples to be iid. When the means of two samples are to be compared, the data is averaged over the number of observations. If the number of observations is a whole number of periods, the seasonal behaviour is extinguished. Thus the behaviour of the noise term can be used to test the assumptions of the hypothesis test. We want to check if the noise term is approximately independent, $IID(0, \sigma^2)$, and if it is normally distributed, $N(0, \sigma^2)$.

To see if the noise sequence is independent, a first test is to plot the sequence. A typical sign of dependence among the observations is that many observations in a row have the same sign. Figure 4.3 shows an independent noise sequence that is normally distributed. Another more distinct method to see if the estimated noise sequence is independent is to use the so called sample autocorrelation function, sample acf. The autocorrelation of a sequence at lag h is the correlation between observations h time steps apart. The autocorrelation function, acf, for a stationary

time series is defined as:

$$\rho_X(h) = \frac{\gamma_X(h)}{\gamma_X(0)} = \mathrm{Corr}(X_{t+h}, X_t)$$

where $\gamma_X(h) = \mathrm{Cov}(X_{t+h}, X_t)$ is the autocovariance, defined as:

$$\gamma_X(r, s) = \mathrm{Cov}(X_r, X_s) = E[(X_r - \mu_X(r))(X_s - \mu_X(s))]$$

Figure 4.3: Normally distributed iid noise.

To test the independence of the noise sequence, the sample acf is used as an estimate of the acf, since we do not have a model for the noise sequence but only observed data. The sample autocorrelation function is:

$$\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}, \quad -n < h < n$$

where the sample autocovariance function is:

$$\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n-|h|} (x_{t+|h|} - \bar{x})(x_t - \bar{x}), \quad -n < h < n$$

and the sample mean is calculated as:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

If the data contains a trend, the sample acf will exhibit slow decay as h increases, and if there is a periodic component it will exhibit a similar

behaviour with the same periodicity. [3] Figure 4.4 shows the sample acf for a sequence with trend and seasonal behaviour.

Figure 4.4: The sample acf of a time series with a trend and seasonal behaviour.

Figure 4.5: The sample acf of an iid-sequence.

For an iid sequence the autocorrelation is $\rho_X(0) = 1$ and $\rho_X(h) = 0$ for $h > 0$, so if the sample acf is close to zero for all nonzero lags, the noise sequence can

be regarded as iid noise. See figure 4.5. For a formal test, one can use the fact that for large n the sample autocorrelations of an iid sequence with finite variance are approximately iid and normally distributed with mean zero and variance 1/n. We can therefore check whether the observed noise sequence is consistent with iid noise by examining the sample autocorrelations of the residuals. If the sequence is iid, about 95% of the sample autocorrelations should fall between the bounds $\pm 1.96/\sqrt{n}$. [3] To check if the noise sequence is normally distributed, a histogram or the Lilliefors test can be used, see sections 3.6.1 and 3.6.2.
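As an illustration of this check (using the conventions above, not code taken from the thesis), the sample acf and the ±1.96/√n bounds can be computed as follows; the simulated noise stands in for an estimated noise term.

```python
# Sketch: sample autocorrelation of a residual sequence and the iid bounds.
import numpy as np

def sample_acf(y, max_lag=48):
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    n = len(y)
    gamma0 = np.dot(y, y) / n                     # sample autocovariance at lag 0
    return np.array([np.dot(y[h:], y[:n - h]) / n
                     for h in range(max_lag + 1)]) / gamma0

noise = np.random.default_rng(1).normal(size=168)  # stand-in for an estimated noise term
rho = sample_acf(noise)
bound = 1.96 / np.sqrt(len(noise))
n_outside = int(np.sum(np.abs(rho[1:]) > bound))
print(n_outside, "of", len(rho) - 1, "nonzero lags outside the bounds")
# For iid noise roughly 5% of the nonzero lags are expected to fall outside.
```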


Chapter 5 Analysis of Counter Data

In this chapter we will get to know the behaviour of the counter data.

5.1 Selection of Data to Work With

When performing an analysis of counter data, a set of data that represents a usual behaviour of the system is preferred. Therefore data associated with a commonly used KPI-value and its underlying PI- and counter-values is used. The data is aggregated to BSC level and one sample represents the latest hour.

Figure 5.1: An example of counter data.

Figure 5.1, figure 5.2 and figure 5.3 show some examples of counter data. In figure 5.2 we see that some data points are missing. Missing values are handled in section 5.4.

Figure 5.2: An example of counter data with missing values.

Figure 5.3: An example of counter data with an outlier.

In figure 5.3 we see a data point that lies far outside the other data points. This can be an indication that there is something wrong with that particular collection of data, and that point needs to be removed for the analysis to be as accurate as possible. These so called outliers are handled in section 5.3. As seen in all the figures shown here, the data exhibits a seasonal behaviour. This seasonal behaviour is the variation over the day and comes from the traffic pattern, with one or more peak hours, called busy hour(s) since the traffic is high.

5.2 The Daily Profile

The data exhibits a similar behaviour every 24 hours. This is the daily profile and it is estimated using the technique in section 4.2. Figures 5.4 and 5.5 show the estimation of the trend and daily profile for two counters and their remaining noise term. The daily profile can also be seen in the autocorrelation function. In figure 5.6 the sample acf, $\hat{\rho}(h)$, exhibits a slow decay for larger values of h and the periodicity p = 24 is clearly seen. This is a typical behaviour for a time series with a periodic component, see section 4.3.

5.3 Outliers

First the classical decomposition model of the data, see section 4.2, is estimated ignoring values that are not numbers. The variance of the data is estimated and then the deviations from the model are calculated. The data point where the deviation is largest is eliminated and the variance is estimated again. If the new variance is significantly smaller than the previous variance, the removed point is regarded as an outlier. If the variance is not significantly smaller, it means that the point with the largest deviation from the model is still close to the other data points, and it is put back since it was not regarded as an outlier. Figure 5.7 shows data with a removed outlier. This method is a heuristic, since the one-sided hypothesis test applied is used under violated assumptions: there is a strong correlation between the first and second sample. But when tested on the data studied, it seems that the data points that lie far outside the other data points are removed. After the outliers have been removed, a new estimation of daily profile and trend has to be made, since the outliers could have affected the previous result.

5.4 Missing Values

For the estimation of the daily profile, the time series has to have 24 samples a day. Therefore a missing sample has to be regarded as a missing value for the period to be correct. If there is a missing value at t = i, the value of $\hat{s}_k$, where $\hat{s}_k = \hat{s}_{k+jp} = \hat{s}_i$, is estimated using the average of the deviations:

$$\{(x_{k+jp} - \hat{m}_{k+jp}),\ q < k + jp \leq n - q\}$$

but with the addition of the criterion $k + jp \neq i$, see section 4.2. The missing value $x_i$ is then replaced with the estimated value for that particular time, $\hat{m}_i + \hat{s}_i$. Missing days are not treated in this thesis work.
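A sketch of the outlier heuristic of section 5.3 is given below (Python, illustrative only). The thesis does not spell out the exact form of the one-sided variance comparison, so the F-quantile check, the significance level and the use of the residual vector as input are assumptions introduced here for illustration.

```python
# Heuristic sketch of section 5.3: flag the sample with the largest deviation from the
# fitted model as an outlier if removing it makes the residual variance significantly
# smaller. As noted in the text, the test's independence assumptions are knowingly violated.
import numpy as np
from scipy.stats import f as f_dist

def largest_deviation_outlier(residuals, alpha=0.05):
    """residuals: deviations of the data from the estimated trend + daily profile.
    Returns the index of a detected outlier, or None."""
    y = np.asarray(residuals, dtype=float)
    y = y[~np.isnan(y)]
    n = len(y)
    worst = int(np.argmax(np.abs(y)))              # candidate outlier
    s2_all = np.var(y, ddof=1)
    s2_rest = np.var(np.delete(y, worst), ddof=1)
    f_obs = s2_all / s2_rest
    # One-sided comparison: is the variance without the point significantly smaller?
    if f_obs > f_dist.ppf(1 - alpha, n - 1, n - 2):
        return worst                               # treat the point as an outlier
    return None                                    # keep the point
```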

Figure 5.4: An example of estimated trend and daily profile for counter data.

Figure 5.5: An example of estimated trend and daily profile for counter data.

Figure 5.6: An example of the sample acf for counter data with a daily profile.

Figure 5.7: Counter data with a removed outlier.

5.5 Testing Independence and Normality of the Noise

The noise term is what is left when the daily profile and the trend are eliminated from the data. In this report it is seen as the deviations around the mean value, when the data is averaged over a whole number of days. This is due to the fact that the seasonal component averaged over a whole period is zero, see section 4.2, and that the mean is approximately constant, see figures 5.4 and 5.5. Thus the variance of the noise term is used in the t-test for equal mean in the change detection method. To see if the noise term is iid, we use the autocorrelation function as described in section 4.3.

Figure 5.8: An example of the sample acf for the noise.

For most of the counters there is still dependence in the noise term, since the autocorrelation has more than small deviations from zero for non-zero lags. In most cases the sample acf looks like something between figure 5.8 and figure 5.9. This looks reasonably good if we compare with the original signal, an example of which is shown in figure 5.6.

For the normality assumption of the noise term, a histogram and the Lilliefors test are used, see sections 3.6.1 and 3.6.2. Most of the counters exhibit a behaviour close to that of a normally distributed population, but since the number of points is less than 200 it is not always easy to see in a histogram. The Lilliefors test with a 0.01 level of significance is used, and then the hypothesis of normality cannot be rejected for most of the counters. See figure 5.10. Sometimes the analysis of the whole week shows that the noise is not normally distributed. When changing the data to only weekdays the analysis looks different; then the hypothesis of normality cannot be rejected.

Figure 5.9: An example of the sample acf for the noise.

Figure 5.10: Testing the normality assumption of the noise.

Chapter 6 The Change Detection Method

This chapter presents the change detection method and the theory behind it. In the next chapter we will see the results from applying the change detection to counter data.

6.1 The Change Detection Method and Its Limitations

As shown in chapter 5, the counter data can be modelled as a stochastic time series with a daily profile. To detect changes in the data we cannot just compare whether the values are the same, due to the stochastic behaviour. We also need to know which changes to detect. In this thesis work the focus is on changes in the mean value and in the daily profile, since these are two factors that characterize the data.

The change detection method uses about two weeks of data. The first week, or set of data, is a reference week or reference data. For this week we want to freeze as many factors as possible to be able to detect changes due to software changes and not to changes in the environment. The second week, or set of data, is the test week or test data. Different days of the week have different traffic patterns. Since only a limited amount of data is used, there will be too few data points to estimate a model for each day, and therefore different days have to be treated as equal. The test data has to be from the same weekdays as the reference data to eliminate the risk that changes due to differences between weekdays will appear. No consideration is taken to long term trends and seasonal patterns over periods longer than 24 hours, and this can affect the result. Therefore the change detection can only answer the question of whether the test week is significantly changed compared to the reference week. Care must be taken when choosing the data for the reference week, since it has to behave as a typical week if we want to say something about whether the test week behaves as a typical week and not just as the reference week. One thing that can be done to get a more correct model, and at the same time use

the amount of data provided, is to use only weekdays, since there is a greater difference in traffic pattern between weekdays and weekends than there is between different weekdays. (Note that differences in this pattern can occur in different countries.) Since less data can then be used, a trade-off must be made between using more data or a better model.

One more limitation of the change detection method is that it can show different results when it is performed on different numbers of days. Care must be taken regarding what you want to achieve by using the method. If the data changes at the end of the week, a change will be reported by the change detection method if you test against the whole week, but if you only choose the beginning of the week, the result will be different since the change was at the end of the week.

Before the change detection takes place, missing values and outliers are handled, see sections 5.4 and 5.3. Then the daily profile is estimated, the noise variance is calculated and statistical hypothesis tests are performed to test whether the mean value and/or the daily profile of the data has changed.

6.1.1 Change Detection of Mean Value

The change detection focuses on the mean value of the counter data. This is a change that is easy to understand and to calculate. As discussed before, comparison of the means of the two data sets cannot just be performed by checking if the two values are equal. Due to the stochastic behaviour, the two means can differ but still be considered equal in a probabilistic sense. When performing the change detection of the mean value, the data is averaged over whole days and thus the daily profile is eliminated. Since the data is also averaged over the same number of days and the same weekdays before and after the software change, we judge that variations due to differences between weekdays are also eliminated reasonably well. The noise term describes the deviations from the mean under the same probability model. Therefore the variances of the noise terms are used in the hypothesis test.

For the hypothesis test of equal mean we use a two-sided t-test, see section 3.4. Since we have to know whether the variances are equal or not, a two-sided hypothesis test of equal variance has to be performed on the noise variances prior to the t-test. The F-test of equal variance is described in section 3.5. If the hypothesis of equal mean is rejected at the pre-defined level of significance, α, the test result is considered significant and a change has occurred in the test data. To tune the parameter α, data without a software change can be used. If a change is detected, the P-value can be used as a measure of how strong the rejection of the no-change hypothesis is. If the P-value is less than α, the hypothesis of no change is rejected, and the closer to zero the P-value is, the more we can trust the detection of a change, see the P-value section 3.3.1.

6.1.2 Change Detection of Daily Profile

The change detection is also done on the daily profile. The daily profile looks different over the week and is affected by changes in the behaviour of the mobile users, mobile subscription conditions, weekends, larger events in the area, etc. A

6.1.2 Change Detection of Daily Profile

The change detection is also done on the daily profile. The daily profile looks different over the week and is affected by changes in the behaviour of the mobile users, mobile subscription conditions, weekends, larger events in the area, etc. A model of the daily profile is estimated using the reference week and the variance of the noise term is calculated. Then the test data from after the software update is used. The deviations from the daily profile of the reference week are calculated and de-trended, since a change in trend is not considered here. The remaining noise term should be of the same size as the reference noise if the model is well suited also for this data. If the variance of the noise term is significantly greater than the variance of the reference week noise, the model for the daily profile is not suited for this data and thus the daily profile has changed. Since it is very unlikely that the model fits the test week better, it is only relevant to know whether the variance is significantly greater than the variance of the reference week. Thus a one-sided hypothesis test is used.

6.2 Normality Assumption

If the normality assumption of the counter data is violated, the result from the change detection of the daily profile may or may not be correct, since the test is based on the data being normally distributed. Therefore a warning is given if the normality assumption is rejected at the 0.01 level of significance. The change detection of the mean is not as sensitive to the normality assumption.

6.3 Level of Significance

If the observed value of the test statistic is close to the rejection region, the result of the change detection method can change when the level of significance is changed. For smaller values of α, a rejection of the null hypothesis of no change is a stronger indication that a change has occurred. Common values of α are 0.05, 0.01 and 0.001, see section 3.3. This can also be seen in the P-value, since a P-value close to the level of significance shows which value of α would change the outcome of the hypothesis test.

A better scenario would be to have two reference weeks instead of one, where one of the reference weeks could be used for choosing an appropriate value of α for that particular environment. The value of α can also be changed depending on how the method is used. If you would like all changes to be detected, a larger value of α is suitable, for example 0.05, and if you do not want any false alarms, a smaller value of α is suitable.
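To complement the description in section 6.1.2, the sketch below shows one possible Matlab formulation of the daily-profile test. It is a simplified illustration and not the thesis implementation: the de-trending is done here by removing each day's mean rather than with the moving average filter of chapter 4, the degrees of freedom treat all noise samples as independent, and the function and variable names (profileChangeTest, refHourly, testHourly) are invented for the example.

function [profileChanged, pProfile, s] = profileChangeTest(refHourly, testHourly, alpha)
% One-sided test of whether the test-week noise variance around the
% reference daily profile is significantly greater than the reference
% noise variance (sketch of the procedure in section 6.1.2).
% refHourly and testHourly are hourly values covering whole days,
% so their lengths are assumed to be multiples of 24.
p = 24;                                   % period: one profile value per hour of the day
R = reshape(refHourly(:),  p, []).';      % days x 24 matrix, reference data
T = reshape(testHourly(:), p, []).';      % days x 24 matrix, test data

% Simplified de-trending: remove each day's mean level.
R = R - repmat(mean(R, 2), 1, p);
T = T - repmat(mean(T, 2), 1, p);

% Daily profile estimated from the reference data only.
s = mean(R, 1);

% Noise terms: deviations from the reference daily profile.
noiseRef  = R - repmat(s, size(R, 1), 1);
noiseTest = T - repmat(s, size(T, 1), 1);

% One-sided F-test: is the test noise variance significantly greater?
F = var(noiseTest(:)) / var(noiseRef(:));
pProfile = 1 - fcdf(F, numel(noiseTest) - 1, numel(noiseRef) - 1);
profileChanged = pProfile < alpha;
end

The one-sided alternative reflects the argument above: a better fit of the reference profile to the test week than to the reference week itself is not of interest, only a significantly worse fit.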


Chapter 7  Applying the Change Detection Method To Counter Data

In this chapter we will see the results of the change detection method on counter data with known changes and on counter data without changes.

7.1 Selection of Data

When performing change detection on counter data, it is preferable to use a set of data on which the change detection method can be analysed. Therefore a set of data with a feature change that affects a KPI called Throughput, and its underlying PI:s and counters, is used. A measure of user data volume is also used, since it could affect the other results. All data comes from EGPRS Downlink (DL) transfers, which means that it is data and not speech, and that it is directed from the network to the mobile. To verify that the change detection method is working correctly, data without any known changes is also used.

7.1.1 Data Volume

The User Data Volume is the data that the end-user, the subscriber, pays for. The User Data Volume is not an end-user KPI, but in general, if the performance of the network improves, the users will be able to transfer a greater volume of data within the same time. Thus this measure is important to monitor, since an unexpected change in data volume may indicate performance problems. If the data volume is too low, this indicates that the traffic is very low, and deviations found in other KPIs and PIs may then be due to too scarce data rather than true system effects. The user data volumes used in this thesis work are high and should not affect the result in that way.

7.1.2 The Throughput KPI and Underlying PI:s and Counters

The Throughput is a KPI that measures the total perceived transfer velocity of the data volume that is going from the network out to the user on all data channels. It is thus a measure of end-user perceived performance, which is the definition of a KPI. The Throughput is measured in [kbit/s]. Figure 7.1 shows some underlying PI:s and counters associated with the Throughput.

Figure 7.1: Relations between the KPI, PI:s and counters used.

The Simultaneous TBFs per E-PDCH shows the number of users carried on each individual Packet Data CHannel (PDCH). It is a measure of the traffic load in the system, as is the PI Average number of simultaneous EGPRS DL TBFs in Cell. TBF stands for Temporary Block Flow, and it is the logical identifier for the data to be sent to one Mobile Station (MS). Simultaneous TBFs per E-PDCH depends on two counters called DLTBFPEPDCH and DLEPDCH. The formula for its calculation is:

    Simultaneous TBFs per E-PDCH = DLTBFPEPDCH / DLEPDCH

The Radio Link Bitrate per PDCH measures the performance of the radio interface between mobile and network per channel. If each PDCH is thought of as a radio pipe through which data can be transferred, then the measured radio link bit rate represents the average size of each such radio pipe. There are many different influences on the quality of a radio link: signal strength, interference, time dispersion and more. The formula for the Radio Link Bit Rate (DL) per PDCH is:

    Radio Link Bit Rate = (MC19DLACK · 20) / MC19DLSCHED   [kbit/s]
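As an illustration of how such PI:s are derived from the raw counters, the lines below compute the two ratios above in Matlab for hourly counter vectors. The variable names are invented for the example, the placement of the constant 20 simply follows the formula as reconstructed above, and the NaN handling is just one possible way to flag hours without usable measurements for the missing-value treatment of chapter 5.

% dltbfpepdch, dlepdch, mc19dlack and mc19dlsched are assumed to be
% column vectors of hourly counter values from the database export.
simTbfPerEpdch   = dltbfpepdch ./ dlepdch;            % Simultaneous TBFs per E-PDCH
radioLinkBitrate = (mc19dlack .* 20) ./ mc19dlsched;  % Radio Link Bit Rate [kbit/s]

% Hours where the denominator is zero carry no usable measurement;
% mark them as missing so the later missing-value handling can deal with them.
simTbfPerEpdch(dlepdch == 0)       = NaN;
radioLinkBitrate(mc19dlsched == 0) = NaN;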

The Average Maximum Number of Time Slots (TS) reservable by MSs is a measure of how much of the data channels the MSs can reserve. The formula for the Average Maximum Number of TS reservable by MSs is:

    Average Max Nr of TS Reservable = MAXEGTSDL / TRAFF2ETBFSCAN

The Multislot Utilization shows, as a percentage, the amount of mobiles that get as many packet data channels as they can handle. Typically the Multislot Utilization is high at low traffic hours, while it decreases at high traffic hours.

7.2 Interpreting the Figures

In the figures the test data is plotted together with the reference data, with the corresponding mean values. The header tells us which KPI, PI or counter is studied, and the question "Significant change of mean?" is answered using the change detection method at the level of significance given by α. When plotting the daily profile, the daily profile of the reference data is plotted together with the test data. The question "Significant change of daily profile?" is answered using the method described in section 6.1.2, at the level of significance given by α.

7.3 Counter Data Without Known Changes

The data is taken from the same BSC, the same software release and two weeks in a row. The KPI:s and counters are the ones described in section 7.1. Worth noticing is that there is no change in the User Data Volume, see figure 7.2. This could be an indication that the end-user behaviour is similar and should therefore not affect the result. No changes are detected in the KPI, in 4 of the 5 PI:s or in the 6 counters. Figure 7.3, figure 7.4 and figure 7.5 show some examples of the results from the change detection.

The PI that is considered changed is shown in figure 7.6. The reason for the change is not known. As seen around sample 150, the change could be due to outliers, but when the 3 samples creating the outlying values are removed there is still a significant change of mean value, as seen in figure 7.7. Another reason could be that there is something wrong with the underlying counters for a day around sample 100, or that something in the environment changed that day which only had an impact on this PI.

When performing a hypothesis test of equal mean, a hypothesis test of equal noise variance is first performed. At the 0.01 level of significance, the hypothesis of equal noise variance cannot be rejected, except in one case: the PI Multislot Utilization discussed earlier.

7.3.1 Level of Significance

Since the data used is unchanged, the method should not find any changes except in special cases. Therefore this data can be used for tuning the parameter α.

Figure 7.2: No significant change in User Data Volume.

Figure 7.3: No significant change on KPI-level.

Figure 7.4: No significant change on PI level.

Figure 7.5: No significant change on counter level.

Figure 7.6: Change detected.

Figure 7.7: Change still detected when the possible outliers are removed.

There is no detection of a change for α = 0.01, both when using a whole week and when using only weekdays, except in the PI mentioned earlier. That one has a rather low P-value, even zero for the daily profile, see figure 7.6, and the rejection of the hypothesis of no change is certain. When using α = 0.05, on the other hand, there is a significant change in some of the counters when using only weekdays. Therefore α = 0.01 could be a suitable level of significance in this case, and this is the level that is going to be used when detecting changes in the same KPI, PI:s and counters on data with changed software.
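The tuning step just described can be sketched in Matlab as a simple loop over candidate levels of significance, applied to data without any software change. The containers counterRef and counterTest are hypothetical, and meanChangeTest is the sketch function given after section 6.1.1; since the data is unchanged, every detection counts as a false alarm.

% counterRef and counterTest are assumed to be cell arrays holding the
% daily mean series of each KPI, PI and counter for the two unchanged weeks.
alphas = [0.05 0.01 0.001];
nFalse = zeros(size(alphas));
for a = 1:numel(alphas)
    for k = 1:numel(counterRef)
        changed   = meanChangeTest(counterRef{k}, counterTest{k}, alphas(a));
        nFalse(a) = nFalse(a) + changed;
    end
end
disp([alphas(:) nFalse(:)])   % choose the largest alpha with an acceptable number of false alarms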

7.4 Counter Data With Known Changes

The data chosen is the KPI, PI:s and counters described in section 7.1, with the feature on as reference data and the feature turned off as test data. The effect of the feature, with the settings used, is that it gives fewer resources to the same number of users, in order to decrease the total resource consumption in the system. As a consequence, there will be an impact on the end-user experienced Throughput due to an increased number of TBFs on each PDCH. The feature saves resources, and that has a price in the perceived velocity of the data volume to the users; thus the Throughput decreases when the feature is activated. When the feature is turned off, the Throughput should increase. The change detection of the Throughput is seen in figure 7.8.

Figure 7.9 shows the change detection of the Number of Simultaneous TBFs on Each PDCH. The feature will affect this PI in such a way that it increases when the feature is activated and thus decreases when the feature is turned off. This is due to the fact that the system sets up fewer physical channels for the same amount of data. The change in the Number of Simultaneous TBFs on Each PDCH should also be seen in the underlying counters DLTBFPEPDCH and DLEPDCH, where the ratio DLTBFPEPDCH/DLEPDCH should decrease when the feature is turned off, see section 7.1.2. DLTBFPEPDCH and DLEPDCH are shown in figures 7.10 and 7.11.

Figure 7.8: Change detection of Throughput.

Figure 7.9: Change detection of Simultaneous TBFs per PDCH.

Figure 7.10: Change detection of DLTBFPEPDCH.

Figure 7.11: Change detection of DLEPDCH.

We should probably see a different behaviour of the Radio Link Bit Rate. When the feature is active, fewer channels are used, and that may improve the radio environment by decreasing the interference. Hence the velocity of the radio pipe, the Radio Link Bit Rate, may increase. We therefore expect the Radio Link Bit Rate to go down when the feature is turned off. This is seen in figure 7.12. Since the Radio Link Bit Rate is proportional to the counter MC19DLACK and inversely proportional to MC19DLSCHED, we expect a drop in the level of MC19DLACK or a higher value of MC19DLSCHED, see section 7.1.2 and figures 7.13 and 7.14.

Figure 7.12: Change detection of Radio Link Bitrate.

Furthermore, as the feature strives to increase the resource utilization by putting more TBFs on each individual PDCH, there will also be an impact on the number of channels allocated to a user. The amount of mobiles that get as many channels as requested is lower with the chosen feature settings. This is the PI Multislot Utilization, and it is shown in figure 7.15. The Average Maximum Number of TS reservable by MSs, shown in figure 7.16, should increase when the feature is activated and thus decrease when it is turned off. As the Average Maximum Number of TS Reservable by MSs changes, MAXEGTSDL and TRAFF2ETBFSCAN have to change as well, see section 7.1.2. Figures 7.17 and 7.18 show this.

The change detection indicates that the User Data Volume is changed, as seen in figure 7.19. This is a small change and should probably only affect the other counters slightly. Regarding the change detection of the daily profile, we do not know how it should be affected by the feature. Worth noticing is that we can see changes in the daily profile in 11 of the 13 cases when the feature is turned from on to off. Figure 7.20, figure 7.21 and figure 7.22 show some examples of the change detection of the daily profile.

At the 0.01 level of significance there is no change in the result of the change detection of mean when only weekdays instead of a whole week are used. As for the hypothesis test of equal noise variance at the same level of significance, see section 3.5, the result varies and there is no particular pattern to be seen.

Figure 7.13: Change detection of MC19DLACK.

Figure 7.14: Change detection of MC19DLSCHED.

Figure 7.15: Change detection of Multislot Utilization.

Figure 7.16: Change detection of Average Maximum Number of TS reservable by MSs.

Figure 7.17: Change detection of MAXEGTSDL.

Figure 7.18: Change detection of TRAFF2ETBFSCAN.

Figure 7.19: Change Detection of User Data Volume.

Figure 7.20: Change Detection of Daily Profile.

Figure 7.21: Change Detection of Daily Profile.

Figure 7.22: Change Detection of Daily Profile.


Chapter 8  Discussion

In this chapter the choice of method will be motivated and topics related to it will be discussed. The results will be analysed and thoughts on how to use the method will be presented. Future work on the subject will also be discussed. Finally, the conclusions of this thesis work will be presented.

8.1 Why Time Series Analysis and Statistical Hypothesis Testing

The problem formulation is whether we can detect that the behaviour of a counter has changed after a software update. We know the time when the change did or did not occur, and we have one data set from before and one from after. Therefore it seems straightforward to formulate a hypothesis about whether the mean of the data set has changed significantly or not. Due to the stochastic behaviour of the data we cannot just compare whether the mean values are the same; the error term has to be taken into account. This motivates the choice of hypothesis testing.

For a hypothesis test to work, it is helpful if the data set is an independent and identically distributed sample, preferably from a normal distribution. But since the data set is given in the form of a time series with seasonal variations, the independence assumption is not valid as it stands, and since the assumption is so important something has to be done about it. This is where time series analysis is used. When the seasonal component and the trend are estimated and eliminated, something can be said about the remaining noise term. Its variance gives us the magnitude of the variations of the data when the seasonal behaviour is averaged out. Therefore the noise term can be used to see whether the independence assumption is justified: if the noise term can be assumed to be iid, we know that the data averaged over the period can be treated as iid.

The second question is whether the normality assumption can be used. Since we use the average value over 24 hours we can use the central limit theorem, section 3.1.2, with n = 24. This is large enough to assume that the mean value over a period is approximately drawn from a normal distribution, and thus the statistical hypothesis tests can be performed. The normality of the noise terms is tested using the Lilliefors test, section 3.6.2, and the hypothesis of normality cannot be rejected in most cases.
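The two checks mentioned here can be sketched in Matlab as follows. The vector noise is a placeholder for the de-trended, de-seasonalized residual series of one counter, lillietest is the Lilliefors test from the Statistics Toolbox, and the autocorrelation part is only a rough illustration of the dependence check discussed in section 5.5, not the exact procedure of the thesis.

% Lilliefors test of normality; the warning level of section 6.2 is 0.01.
[hNorm, pNorm] = lillietest(noise);
warnNonNormal  = pNorm < 0.01;

% Sample autocorrelation of the noise at a few small lags; for an iid
% series the values should stay within roughly +/- 1.96/sqrt(n).
z = noise(:) - mean(noise);
n = numel(z);
maxLag = 5;
rho = zeros(maxLag, 1);
for h = 1:maxLag
    rho(h) = sum(z(1+h:n) .* z(1:n-h)) / sum(z.^2);
end
bound = 1.96 / sqrt(n);
disp([(1:maxLag).' rho (abs(rho) > bound)])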

8.2 Problems With Hypothesis Tests

One problem with hypothesis testing is that if H0 is not rejected, this does not necessarily mean that H0 is true. To address this, different levels of significance can be used depending on what you want to see. If the value of α is higher, the risk of detecting a change where there is none is larger, and if the value of α is lower, changes could be missed. This is up to the user to decide.

8.3 Analysis of the Result

Are the necessary assumptions of the method met sufficiently well, and does the method work as expected?

8.3.1 Assumptions

It seems reasonable to assume that the iid assumption can be used, since the dependencies in the noise are small for most of the counters, see section 5.5. In most cases the normality assumption cannot be rejected at the 0.01 level of significance. This does not necessarily mean that the hypothesis of normality is true, but we can assume that the normality assumption is a reasonable approximation in this case, good enough for the change detection. Care should be taken, though, when using the change detection of the daily profile for non-normal data; therefore the method issues a warning.

8.3.2 Change Detection

When the change detection is tested on data where the software is unchanged and on data with known software updates, the results seem to be as expected in most cases. Thus the method seems to work reasonably well under the given conditions. When using only weekdays instead of a whole week, the results of the change detection did not change at the 0.01 level of significance, so we can assume that both approaches could be used.

8.3.3 Weekdays Versus Week

The analysis of the noise term sometimes shows that the noise is not normally distributed when we use a whole week, but that the hypothesis of normality cannot be rejected when we use only weekdays, see section 5.5. This can be an indication that the weekends behave differently from the weekdays even though we have treated them as equal. When all days of the week are treated as equal, the model is not correct and the noise variance will be overestimated. When the noise variance is overestimated, changes are less likely to be detected. If we instead use only weekdays, we get less data to work with and a less accurate estimate of the variance. A trade-off must be made between using more data and using a better model, as discussed in section 6.1. This scenario depends strongly on the magnitude of the difference in user behaviour between weekdays and weekends.

Instead of having to choose between a whole week and only weekdays, both can be used and the results compared. If the results differ, a closer analysis should be made.

8.4 How the Method Could Be Used

To reduce the amount of data to analyse, the method could be used in such a way that it only gives an alarm when a change is detected. It could start with the changes that have a small P-value, to make sure that the most certain changes are seen first. Then the level of significance could be increased and more uncertain changes could be analysed more closely, since the method may report false changes for larger values of α.

8.5 Future Work

The change detection method has so far only been implemented in Matlab. Some of the functions are hand-written and some functions from Matlab's Statistics Toolbox are used. The data is collected from an Access database and written to a text file, which is later read into Matlab. This requires some work by hand that could probably be done by a script.

A moving average filter is used for modelling the trend when the data is deseasonalized, see section 4.2. But when using the change detection of mean value, the average value is used. If the average value were used for the model of the trend as well, this could give a more correct estimate of the noise variance.

The method could be improved if the model allowed for different means and daily profiles for weekdays and weekends, respectively. This would however require some more data for the estimation of the weekends. The missing values should perhaps be treated as missing and not replaced with the value of the model; since the noise is zero at those points, this could affect the estimation of the noise variance. On the other hand, this is a very small problem, since the missing values are very few. The method should also be modified to handle outliers in a more theoretically correct way.

The method also has to be modified if it is to be used on data without a daily profile. If there is no seasonal behaviour at all, the data is only de-trended and the remaining noise term is analysed as before. The change detection of mean value is also the same.

Worth investigating is whether the environmental factors can be eliminated from the data in some way, using more knowledge of what the data stands for. Then the method could detect changes due to the software update rather than to environmental changes.

8.6 Conclusions

The counter data behaves as a stochastic time series with a daily profile, and the change detection can be done by estimating the daily profile and the variance of the noise term and performing statistical hypothesis tests of whether the mean value and/or the daily profile of the counter data before and after the software update can be considered equal.

When the data is averaged over each day, the daily profile is eliminated. Since the data is also averaged over the same number of days and the same days of the week before and after the software change, we judge that changes due to differences between weekdays are also eliminated reasonably well. Seasonal components over longer time periods, such as monthly behaviour, cannot be taken into account by the method as it is presented in this report. Care must be taken when setting up the test so that the prerequisites are properly met, but since the same considerations must be made when performing the analysis by hand, this should not be a particular problem for the method. The method is only a tool to ease the analysis and to reduce the amount of data to investigate more closely.

When the counter data has been analysed, it seems reasonable in most cases to assume that the noise terms are sufficiently mutually independent and normally distributed, which justifies the hypothesis tests. When the change detection is tested on data where the software is unchanged and on data with known software updates, the results are as expected in most cases. Thus the method seems to work under the given conditions.

The problem statement of the thesis work was whether any mathematical method to detect changes in counter behaviour could be found. Since a method has been found and analysed, and we have seen that it seems to work for the test cases used, the answer to the problem statement is yes: a mathematical method for change detection in counter data could be found, using time series analysis and statistical hypothesis testing.

Bibliography

[1] Gunnar Blom, Björn Holmquist (1998), Statistikteori med tillämpningar, B, Studentlitteratur, Lund.
[2] Gunnar Blom, Jan Enger, Gunnar Englund, Jan Grandell, Lars Holst (2005), Sannolikhetsteori och statistikteori med tillämpningar, Studentlitteratur, Lund.
[3] Peter J. Brockwell, Richard A. Davis (2003), Introduction to Time Series and Forecasting, Second Edition, Springer, New York.
[4] Peter J. Brockwell, Richard A. Davis (2006), Time Series: Theory and Methods, Second Edition, Springer, New York.
[5] James Gallagher (2006), The F test for Comparing Two Normal Variances: Correct and Incorrect Calculation of the Two-Sided p-value?, Teaching Statistics, Vol. 28, No. 2.
[6] Hubert W. Lilliefors (1967), On the Kolmogorov-Smirnov Test for Normality with Mean and Variance Unknown, Journal of the American Statistical Association, Vol. 62, No. 318.
[7] Bernard W. Lindgren (1993), Statistical Theory, Fourth Edition, Chapman & Hall/CRC, USA.
[8] Douglas C. Montgomery (2013), Design and Analysis of Experiments, Eighth Edition, Wiley, Singapore.
[9] John K. Taylor, Cheryl Cihon (2004), Statistical Techniques for Data Analysis, Second Edition, Chapman & Hall/CRC, USA.
[10] Wikipedia, Histogram. Retrieved 16 May, 2013.


Copyright

The publishers will keep this document online on the Internet, or its possible replacement, for a period of 25 years from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement. For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page.

© 2013, Tilda Eriksson


More information

where r n = dn+1 x(t)

where r n = dn+1 x(t) Random Variables Overview Probability Random variables Transforms of pdfs Moments and cumulants Useful distributions Random vectors Linear transformations of random vectors The multivariate normal distribution

More information

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics

Chapter 15: Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics Section 15.1: An Overview of Nonparametric Statistics Understand Difference between Parametric and Nonparametric Statistical Procedures Parametric statistical procedures inferential procedures that rely

More information