Generalized Lambda Distribution and Estimation Parameters


The Islamic University of Gaza
Deanery of Higher Studies
Faculty of Science
Department of Mathematics

Generalized Lambda Distribution and Estimation Parameters

Presented by
A-Nasser L. Aljazar

Supervised by
Professor Mohammed S. Elatrash
Assistant Professor Mahmoud K. Okasha

Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science

Islamic University of Gaza
June 2005

Dedicated to my son

Acknowledgement

After thanking God, my special thanks go to my mother, to the Department of Mathematics at the Islamic University of Gaza, and in particular to Professor Mohammed Elatrash and Assistant Professor Mahmoud Okasha for their many interesting suggestions and remarks on my thesis. Words can never express how grateful I am to my family and my wife for their endless love and support in good and bad times. I wish to thank all my friends for their cooperation. Finally, I would like to thank everyone who supported me.

Abstract

Generalized Lambda Distribution and Estimation Parameters. The generalized lambda distribution (GLD) is a very useful means of testing and fitting data to well-known distributions. Since the GLD is defined by its quantile function, it provides a simple and effective algorithm for generating random variates. In this thesis we study the definition of the GLD and the plotting of its density function. The purpose of this thesis is to estimate the four parameters of the GLD by three methods: the method of moments, the method of least squares, and the method of percentiles. Numerical examples are used to estimate the parameters of the GLD, and finally we apply one of the methods to real data.

Table of Contents

Introduction
1. The Generalized Lambda Family of Distributions
   1.1 History and Background
   1.2 Definition of the Generalized Lambda Distributions
   1.3 The Parameter Space of the GLD
   1.4 Shape Characteristics of the FMKL Parameterization
2. The Moments of the GLD
   2.1 Karian and Dudewicz (2000) Approach
   2.2 The Moments of the FMKL Parameterization of the GLD
3. Estimating the Parameters of the Generalized Lambda Distribution
   3.1 Introduction
   3.2 Fitting the GLD by the Use of Tables; Example
4. Estimating the Parameters of the Generalized Lambda Distribution: the Least Squares Method; Example
   The Use of Percentiles; Estimation of GLD Parameters through a Method of Percentiles; Example
5. GLD Approximations to Some Well Known Distributions
   GLD Approximations by the Method of Moments: the Normal, Uniform, and Exponential Distributions
   GLD Approximations by the Method of Percentiles: the Normal, Uniform, and Exponential Distributions
Application
Appendices
References

Introduction

Fitting a probability distribution to data is an important task in any statistical data analysis. The data to be modeled may consist of observed events, such as a financial time series, or may comprise simulation results. When fitting data, one typically first selects a general class, or family, of distributions and then finds values for the distributional parameters that best match the observed data. Rachev and Mittnik (2000) demonstrated that the usual approach to distribution fitting is to fit as many distributions as possible and use goodness-of-fit tests to determine the best fit. This empirical method is subjective and not always conclusive; in general there is no single accepted rule for selecting one distribution over another.

The Generalized Lambda Distribution (GLD), originally proposed by Ramberg and Schmeiser (1974) and at that time called the RS distribution, is a four-parameter generalization of Tukey's Lambda family (Hastings et al. 1947) that has proved useful in a number of different applications. Since it can assume a wide variety of shapes, the GLD offers risk managers great flexibility in modelling a broad range of financial data. Due to its versatility, however, obtaining appropriate parameters for the GLD can be a challenging problem. An excellent synopsis of the GLD, its applications, and parameter estimation methods appears in Karian and Dudewicz (2000). The initial, and still the most popular, approach for estimating the GLD parameters is based on matching the first four moments of the empirical data. This is undoubtedly due in part to the availability of published tables that provide parameter values for given levels of skewness and kurtosis (see, e.g., Ramberg et al. (1979); Karian and

Dudewicz (2000)). However, different parameter values can give rise to the same moments, and so, while the tabulated parameters may match or closely approximate the first four moments, they may in fact fail to adequately represent the actual distribution of the data. Thus, as is well noted in the literature, a goodness-of-fit test should be performed to establish the validity of the results. If this test fails, or if the levels of skewness and kurtosis lie outside the tabulated values, it is necessary to use numerical procedures to find suitable parameters. Such procedures, which typically involve the downhill simplex method (Nelder and Mead 1965) or some variant thereof, require as input an initial estimate of the parameters. If multiple local optima exist, the solution returned is contingent on this estimate. Thus, several attempts may be required before obtaining parameter values that are acceptable from a goodness-of-fit perspective.

Unlike previous approaches, King and MacGillivray (1999) assess the quality of the GLD fit directly by performing goodness-of-fit tests for specified combinations of parameter values. Instead of matching moments, Ozturk and Dale (1985) minimize the total squared differences between the data and the expected values of order statistics implied by the GLD; the Nelder-Mead downhill simplex algorithm is used to find the optimal parameters. This method, called the least squares method, successfully fits sets of data for which tabulated moment-matching values are unavailable. As with moment matching, the resulting distribution must be assessed with a goodness-of-fit test, and several trials may be required before finding an acceptable solution. Instead of matching moments or least squares, Karian and Dudewicz (2000) develop a method for fitting a GLD to data that is based on percentiles rather than moments. This approach makes a larger portion of the GLD family accessible for data fitting and eases some of the computational difficulties encountered in the method of moments. Examples of the use of the proposed system are included.

The generalized lambda distribution (GLD) is very useful in fitting data and approximating many well known distributions. Since the GLD is defined by its quantile

function, it can provide a simple and effective algorithm for generating random variates. The purpose of this dissertation is to estimate the four parameters of the GLD by three methods: the method of moments, the method of least squares, and the method of percentiles; each is used to obtain a candidate set of parameters. This thesis proceeds as follows. We start with an introduction to the history and mathematical background of the GLD. This is followed by a discussion of how to estimate the unknown parameters with the method of moments, the least squares method, and the percentiles method. In particular we look at applications of the GLD and approximations to some well known probability distributions, as well as the quality of fit, and we solve some examples.

Chapter 1

The Generalized Lambda Family of Distributions

Much of modern human endeavor, in wide-ranging fields that include science, technology, medicine, engineering, and management, involves the construction of statistical models to describe the fundamental variables in those areas. The most basic and widely used model, the probability distribution, relates the values of the fundamental variables to their probability of occurrence. The problem of this thesis is how to model (or fit) a continuous probability distribution to data. The area of fitting distributions to data has seen explosive growth in recent years; consequently, few individuals are well versed in the new results that have become available. In many cases these recent developments have solved old problems with the fitting process.

The Generalized Lambda Distribution (GLD) is a four-parameter generalization, originally proposed by Ramberg and Schmeiser (1974), of the one-parameter Tukey Lambda distribution introduced by Hastings et al. (1947). Since then, the flexibility of the GLD in assuming a wide variety of shapes has seen it used extensively to fit and

model a wide range of differing phenomena with continuous probability distributions, from applications in meteorology and the modeling of financial data to Monte Carlo simulation studies.

The GLD is defined by an inverse distribution function, or percentile (quantile) function. This is the function $Q(u)$, where $u$ takes values between 0 and 1, which gives the value of $x$ such that $F(x) = u$, where $F(x)$ is the cumulative distribution function (c.d.f.). From this it is easy to derive the probability density function (p.d.f.) of the GLD by differentiation; the cumulative distribution function itself, however, must be calculated numerically. The most popular method for estimating the GLD parameters is to match the first four moments of the empirical data to those of the GLD. The popularity of this method is partly due to the availability of extensive tables that provide parameter values for given values of skewness and kurtosis; see Ramberg et al. (1979) and Karian and Dudewicz (2000). In our case we will use the tables to find the parameter values as well as calculating them directly.

1.1 History and Background

The search for a method of fitting continuous probability distributions to data is quite old. Pearson (1895) gave a four-parameter system of probability density functions and fitted the parameters by what he called the method of moments (Pearson (1894)).

Tukey's Lambda family of distributions has its origins in the one-parameter lambda distribution proposed by John Tukey (1960), defined by the quantile function

$$Q(u) = \begin{cases} \dfrac{u^{\lambda} - (1-u)^{\lambda}}{\lambda}, & \lambda \neq 0,\\[4pt] \log\dfrac{u}{1-u}, & \lambda = 0, \end{cases} \qquad 0 \le u \le 1.$$

Tukey's lambda distribution was generalized, for the purpose of generating random variates for Monte Carlo simulation studies, to the four-parameter generalized lambda distribution, or GLD, by Ramberg and Schmeiser (1974) and subsequently by Ramberg et al. (1979). Since the early 1970s the GLD has been applied to fitting phenomena with continuous probability density functions in many fields of endeavor. In an early application of the GLD (at the time called the RS, for Ramberg-Schmeiser, distribution), Ricer (1980) dealt with construction industry data. His concern was to correct for the deviations from normality which occur in construction data, especially in expectancy pricing in a competitive bidding environment, finding such quantities as the optimum markup. In another important application area, meteorology, many variables had previously been described by empirical distributions; fitting solar radiation data with the GLD was successful due to the flexibility and generality of the GLD, which could be used to fit the wide variety of curve shapes observed. In many applications this means that we can use the GLD to describe data with a single functional form by specifying its four parameter values for each case, instead of giving the basic data (which is what the empirical distribution essentially does) for each case; see Karian and Dudewicz (2000). Before defining the Generalized Lambda Distribution (GLD) family we review some basic notions from statistics.
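Tukey's one-parameter quantile function above can be sketched in a few lines of Python; the function name is ours, not from the thesis, and the $\lambda = 0$ branch uses the logistic limit given in the definition.

```python
import math

def tukey_lambda_quantile(u, lam):
    """Quantile function of Tukey's one-parameter lambda distribution.

    For lam != 0, Q(u) = (u**lam - (1-u)**lam) / lam; the lam = 0 case
    is the limiting logistic quantile log(u / (1-u)).
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    if lam == 0.0:
        return math.log(u / (1.0 - u))
    return (u ** lam - (1.0 - u) ** lam) / lam
```

For $\lambda = 1$ this reduces to $Q(u) = 2u - 1$, the uniform distribution on $(-1, 1)$; symmetry about 0 holds for every $\lambda$, so $Q(1/2) = 0$.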

Definition [2] (Probability space). A probability space is a triplet $(\Omega, \mathcal{I}, P[\cdot])$, where $\Omega$ is the sample space, $\mathcal{I}$ is a collection of events (each a subset of $\Omega$), and $P[\cdot]$ is a probability function with domain $\mathcal{I}$.

Definition [2] (Random variable, r.v.). For a given probability space $(\Omega, \mathcal{I}, P[\cdot])$, a random variable, denoted by $X$ or $X(\cdot)$, is a function with domain $\Omega$. The function $X(\cdot)$ must be such that the set $A_r = \{\omega : X(\omega) \le r\}$ belongs to $\mathcal{I}$ for every real number $r$.

Definition [2] (Discrete and continuous random variables). If $X$ can take on only a few discrete values (such as 0 or 1 for failure or success, or $0, 1, 2, 3, \dots$ as the number of occurrences of some event of interest), then $X$ is called a discrete random variable. If the outcome of interest $X$ can take on values in a continuous range (such as all values greater than zero and less than one), then $X$ is called a continuous random variable.

Definition [2] (Cumulative distribution function). One way of specifying the chances of occurrence of the various possible values of $X$ is the cumulative distribution function (c.d.f.) of the random variable $X$, denoted by $F_X(\cdot)$, defined to be the function with domain the real line that satisfies $F_X(x) = P(X \le x)$, $-\infty < x < \infty$.

Definition [2] (Probability density function). A second way of specifying the chances of occurrence of the various values of $X$ is to

give what is called the probability density function (p.d.f.) of $X$. This is a function $f_X(x)$ such that $f_X(x) \ge 0$ for all $x$, that integrates to 1 over the range $-\infty < x < \infty$, and such that for all $x$,

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt.$$

Definition [2] (Quantile function). A third way of specifying the chances of occurrence of the various values of $X$ is to give what is called the inverse distribution function, or quantile (percentile) function (p.f.), of $X$. This is the function $Q_X(y)$ which, for each $y$ between 0 and 1, gives the value of $x$ such that $F_X(x) = y$:

$$Q_X(y) = \text{the value of } x \text{ such that } F_X(x) = y, \qquad 0 < y < 1.$$

We see that there are three ways to specify the chances of occurrence of a r.v. We give the c.d.f., p.d.f., and p.f. for a r.v. with the general normal distribution with mean $\mu$ and variance $\sigma^2$, $N(\mu, \sigma^2)$. The p.d.f. is

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

and the percentile function (p.f.) can be obtained as follows:

$$F_X(x) = P(X \le x) = y \iff P\!\left(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma}\right) = P\!\left(Z \le \frac{x-\mu}{\sigma}\right) = y \iff Q_Z(y) = \frac{x-\mu}{\sigma} \iff x = \mu + \sigma\, Q_Z(y);
$$

therefore

$$Q_X(y) = \text{the value of } x \text{ such that } F_X(x) = y = \mu + \sigma\, Q_Z(y).$$

We should note that, in addition to $Q_X(y)$, there are several notations in common use for the p.f.; one usually finds the notation $F_X^{-1}(y)$.

1.2 Definition of the Generalized Lambda Distributions

Definition [7]. The generalized lambda distribution family with parameters $\lambda_1, \lambda_2, \lambda_3, \lambda_4$, written GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$, is most easily specified in terms of its percentile function

$$Q(y) = \lambda_1 + \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}, \qquad 0 < y < 1. \qquad (1.2.1)$$

The parameters $\lambda_1$ and $\lambda_2$ are, respectively, location and scale parameters, while $\lambda_3$ and $\lambda_4$ determine the skewness and kurtosis of the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$.

Definition [2] (Parameter). A parameter is a value, usually unknown (and therefore to be estimated), used to represent a certain population characteristic. For example, the population mean $\mu$ is a parameter that is often used to indicate the average value of a quantity.
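Because (1.2.1) gives the percentile function explicitly, generating GLD random variates is plain inverse-transform sampling: draw $u \sim \mathrm{Uniform}(0,1)$ and return $Q(u)$. A minimal sketch (the function names are ours):

```python
import random

def gld_quantile(y, lam1, lam2, lam3, lam4):
    """Percentile function Q(y) of GLD(lam1, lam2, lam3, lam4), eq. (1.2.1)."""
    return lam1 + (y ** lam3 - (1.0 - y) ** lam4) / lam2

def gld_sample(n, lam1, lam2, lam3, lam4, seed=0):
    """Inverse-transform sampling: push uniform variates through Q."""
    rng = random.Random(seed)
    return [gld_quantile(rng.random(), lam1, lam2, lam3, lam4) for _ in range(n)]
```

For the parameter choice $(\lambda_1, \lambda_2, \lambda_3, \lambda_4) = (0, 1, 1, 1)$ we get $Q(y) = 2y - 1$, so the generated variates are uniform on $(-1, 1)$; changing $\lambda_1$ simply shifts every variate, as the location-parameter theorem of Section 1.3 asserts.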

Definition [2] (Location parameter). Let $f(x)$ be any p.d.f. Then the family of p.d.f.s $f(x - \mu)$, indexed by the parameter $\mu$, $-\infty < \mu < \infty$, is called a location family, and $\mu$ is called a location parameter. Measures of location give information about the location of the central tendency within a group of numbers.

Definition [2] (Skewness). Not all distributions are bell shaped (or normal). In the normal distribution there are just as many observations to the right of the mean as there are to the left, and the median and mean are equal. When this is not the case, we say the distribution is skewed, or asymmetrical. If the tail is drawn out to the left, the curve is left skewed; if the tail is drawn out to the right, the curve is right skewed.

Definition [2] (Kurtosis). Another type of departure from normality is kurtosis, the peakedness of the distribution. A leptokurtic curve has more values near the mean and at the tails, with fewer observations in the intermediate regions, relative to the normal distribution. A platykurtic curve has fewer values at the mean and at the tails than the normal curve, but more values in the intermediate regions. A bimodal (double-peaked) distribution is an extreme example of a platykurtic distribution.

The properties of the GLD are studied in detail in Ramberg et al. (1979), when it was still called the Ramberg-Schmeiser (RS) distribution, where the richness of the four-parameter GLD for fitting a wide variety of frequency distributions is elaborated. A good summary of the shapes for which the GLD is well defined appears in King and MacGillivray (1999). Recall that for the normal distribution there are also restrictions on $(\mu, \sigma^2)$, namely $\sigma > 0$; the restrictions on $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ that yield a valid GLD will be discussed below. It is relatively easy to find the probability density function from the percentile function of the GLD, as we now show.

Theorem [3]. For the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$, the probability density function is

$$f(x) = \frac{\lambda_2}{\lambda_3\, y^{\lambda_3 - 1} + \lambda_4\, (1-y)^{\lambda_4 - 1}}, \quad \text{at } x = Q(y),\ 0 \le y \le 1. \qquad (1.2.2)$$

Proof. Using the relationships $x = Q(y)$ and $F(x) = y$ and differentiating with respect to $x$, we get

$$\frac{dy}{dx} = f(x), \quad \text{or} \quad f(Q(y)) = \frac{dy}{dQ(y)} = \frac{1}{dQ(y)/dy}. \qquad (1.2.3)$$

Differentiating (1.2.1),

$$\frac{dQ(y)}{dy} = \frac{d}{dy}\left(\lambda_1 + \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}\right) = \frac{\lambda_3\, y^{\lambda_3 - 1} + \lambda_4\, (1-y)^{\lambda_4 - 1}}{\lambda_2}, \qquad (1.2.4)$$

and substituting (1.2.4) into (1.2.3) gives

$$f(x) = \frac{\lambda_2}{\lambda_3\, y^{\lambda_3 - 1} + \lambda_4\, (1-y)^{\lambda_4 - 1}}, \quad \text{at } x = Q(y).$$

In plotting the function $f(x)$ for a density such as the normal, where $f(x)$ is given as an explicit function of $x$, we calculate $f(x)$ at chosen $x$ values, plot the pairs $(x, f(x))$, and connect them with a smooth curve. For the GLD family, plotting $f(x)$ proceeds differently, since (1.2.2) tells us the value of $f(x)$ at $x = Q(y)$. Thus we take a grid of $y$ values (such as $0.01, 0.02, 0.03, \dots, 0.99$), find $x$ at each of those points from (1.2.1), and find $f(x)$ at that $x$ from (1.2.2). Then we plot the pairs $(x, f(x))$ and link them with a smooth curve.

Example. To plot $f(x)$ for a GLD, consider the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ with parameters $\lambda_1 = 0.0305$, $\lambda_2 = \ $, $\lambda_3 = \ $, $\lambda_4 = \ $.

Solution. For this GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$,

$$Q(y) = \lambda_1 + \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}. \qquad (1.2.5)$$

We find from (1.2.5) that at $y = 0.25$, $Q(0.25) = 0.0280$. At $x = 0.028$, using (1.2.2) with the specified values of $\lambda_1, \lambda_2, \lambda_3, \lambda_4$, $f(0.028) = 43.04$; hence $(0.028, 43.04)$ will be one of the points on the graph of $f(x)$. Proceeding in this way for $y = 0.01, 0.02, \dots, 0.99$, we obtain the graph of $f(x)$ given in Figure 1.1.
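The $y$-grid plotting recipe above can be sketched as follows; the parameter values here are illustrative (a valid choice with $\lambda_3, \lambda_4 \ge 0$), not those of the thesis example, and the function names are ours.

```python
def gld_q(y, l1, l2, l3, l4):
    """Percentile function, eq. (1.2.1)."""
    return l1 + (y ** l3 - (1.0 - y) ** l4) / l2

def gld_pdf_at(y, l1, l2, l3, l4):
    """Density f(x) evaluated at x = Q(y), eq. (1.2.2)."""
    return l2 / (l3 * y ** (l3 - 1.0) + l4 * (1.0 - y) ** (l4 - 1.0))

# Illustrative (valid) parameters: both shape parameters nonnegative.
l1, l2, l3, l4 = 0.0, 1.0, 0.5, 0.5
ys = [i / 100.0 for i in range(1, 100)]      # grid 0.01, 0.02, ..., 0.99
points = [(gld_q(y, l1, l2, l3, l4), gld_pdf_at(y, l1, l2, l3, l4)) for y in ys]
# 'points' holds the (x, f(x)) pairs one would link with a smooth curve.
```

Note that the curve is parameterized by $y$ rather than $x$: each grid point yields one $(x, f(x))$ pair, exactly as in the procedure described above.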

Figure 1.1: The p.d.f. of GLD(0.0305, , , ).

The generalized lambda distribution, also known as the asymmetric lambda distribution (or the Tukey lambda distribution, or the Ramberg-Schmeiser (RS) distribution), is a distribution with a wide range of shapes. The distribution is defined by its quantile function, the inverse of the distribution function.

1.3 The Parameter Space of the GLD

We noted, following formula (1.2.1), that (1.2.1) does not always specify a valid distribution. The reason is that one cannot just write down any formula and be assured that it will specify a distribution without checking the conditions needed for that to hold.

Theorem. The GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ specifies a valid distribution if and only if

$$\frac{\lambda_2}{\lambda_3\, y^{\lambda_3 - 1} + \lambda_4\, (1-y)^{\lambda_4 - 1}} \ge 0 \quad \text{for all } y \in [0, 1]. \qquad (1.3.1)$$

Proof. A function $f(x)$ is a probability density function if and only if it satisfies the conditions

$$f(x) \ge 0 \quad \text{and} \quad \int_{-\infty}^{\infty} f(x)\,dx = 1. \qquad (1.3.2)$$

From (1.2.2) we see that for the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$, conditions (1.3.2) are satisfied if and only if

$$\frac{\lambda_2}{\lambda_3\, y^{\lambda_3 - 1} + \lambda_4\, (1-y)^{\lambda_4 - 1}} \ge 0 \quad \text{and} \quad \int f(Q(y))\,dQ(y) = 1. \qquad (1.3.3)$$

Since from (1.2.3) we know that $f(Q(y))\,dQ(y) = dy$ and $y$ ranges over $[0, 1]$, the second condition in (1.3.2) follows automatically: for any $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ the function $f(x)$ will integrate to 1. It remains only to require the first condition in (1.3.2), which is precisely (1.3.1).

The next theorem establishes the role of $\lambda_1$ as a location parameter.

Theorem. If the random variable $X$ is GLD$(0, \lambda_2, \lambda_3, \lambda_4)$, then the random variable $X + \lambda_1$ is GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$.

Proof. Suppose that $X$ is GLD$(0, \lambda_2, \lambda_3, \lambda_4)$. Then by (1.2.1)

$$Q_X(y) = \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}.$$

Since

$$F_{X+\lambda_1}(x) = P[X + \lambda_1 \le x] = P[X \le x - \lambda_1] = F_X(x - \lambda_1), \qquad (1.3.4)$$

$F_X(x - \lambda_1) = y$ also implies $F_{X+\lambda_1}(x) = y$, yielding

$$x - \lambda_1 = Q_X(y) = \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}, \qquad x = Q_{X+\lambda_1}(y), \qquad (1.3.5)$$

whence

$$Q_{X+\lambda_1}(y) = x = \lambda_1 + Q_X(y) = \lambda_1 + \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}. \qquad (1.3.6)$$

This proves that $X + \lambda_1$ is a GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ random variable.

In addition, Ramberg et al. (1979) noted that there are certain combinations of $\lambda_3$ and $\lambda_4$ for which the distribution given by (1.2.1) is not a valid probability distribution; see Karian and Dudewicz (2000) for an in-depth study of this undefined region. To determine the $(\lambda_3, \lambda_4)$ pairs that lead to a valid GLD, we

Figure 1.2: Regions 1 through 6 of the $(\lambda_3, \lambda_4)$-plane used to classify where the GLD parameterization is valid.

consider $(\lambda_3, \lambda_4)$-space in the following regions:

$$R_1 = \{(\lambda_3, \lambda_4) : \lambda_3 \le -1,\ \lambda_4 \ge 1\}, \qquad R_2 = \{(\lambda_3, \lambda_4) : \lambda_3 \ge 1,\ \lambda_4 \le -1\},$$
$$R_3 = \{(\lambda_3, \lambda_4) : \lambda_3 \ge 0,\ \lambda_4 \ge 0\}, \qquad R_4 = \{(\lambda_3, \lambda_4) : \lambda_3 \le 0,\ \lambda_4 \le 0\},$$
$$R_5 = \{(\lambda_3, \lambda_4) : \lambda_3 < 0,\ 0 < \lambda_4 < 1\}, \qquad R_6 = \{(\lambda_3, \lambda_4) : 0 < \lambda_3 < 1,\ \lambda_4 < 0\},$$
$$R_7 = \{(\lambda_3, \lambda_4) : -1 < \lambda_3 < 0,\ \lambda_4 > 1\}, \qquad R_8 = \{(\lambda_3, \lambda_4) : \lambda_3 > 1,\ -1 < \lambda_4 < 0\}.$$

The GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ is valid in Regions 1, 2, 3, and 4, and is not valid in Regions 5 and 6. The situation in Regions 7 and 8 is more delicate: a point in Region 7 is valid if and only if

$$\frac{(1-\lambda_3)^{1-\lambda_3}\, (\lambda_4 - 1)^{\lambda_4 - 1}}{(\lambda_4 - \lambda_3)^{\lambda_4 - \lambda_3}} < -\frac{\lambda_3}{\lambda_4},$$

with a symmetric condition (interchanging $\lambda_3$ and $\lambda_4$) in Region 8; see Karian and Dudewicz (2000) for an in-depth study.
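Condition (1.3.1) can also be probed numerically by scanning a fine grid of interior $y$ values; this is a rough check rather than a proof, and the function name is ours.

```python
def gld_is_valid(l2, l3, l4, grid=2000):
    """Check condition (1.3.1): lambda2 / (lambda3 * y**(lambda3 - 1)
    + lambda4 * (1 - y)**(lambda4 - 1)) >= 0 on interior grid points."""
    for i in range(1, grid):
        y = i / grid
        denom = l3 * y ** (l3 - 1.0) + l4 * (1.0 - y) ** (l4 - 1.0)
        if denom == 0.0 or l2 / denom < 0.0:
            return False
    return True
```

With both shape parameters nonnegative (Region 3) the denominator is positive everywhere, so any $\lambda_2 > 0$ passes; with both negative (Region 4) the denominator is negative everywhere, so $\lambda_2$ must be negative as well. A mixed sign such as $\lambda_3 < 0$, $0 < \lambda_4 < 1$ (Region 5) makes the denominator change sign, so no choice of $\lambda_2$ is valid.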

1.4 Shape Characteristics of the FMKL Parameterization

Freimer et al. (1988) devised a different parameterization of the GLD, denoted FMKL, given by

$$Q(y) = \lambda_1 + \frac{1}{\lambda_2}\left[\frac{y^{\lambda_3} - 1}{\lambda_3} - \frac{(1-y)^{\lambda_4} - 1}{\lambda_4}\right], \qquad 0 \le y \le 1, \qquad (1.4.1)$$

which is well defined over the entire $(\lambda_3, \lambda_4)$-plane. To classify the variety of density shapes offered by this distribution, we need to know the role that each of the parameters plays within the GLD. From the definition of the FMKL parameterization, $\lambda_1$ is the location parameter, $\lambda_2$ determines the scale, and $\lambda_3, \lambda_4$ determine the shape characteristics; for a symmetric distribution, $\lambda_3 = \lambda_4$. Freimer et al. (1988) classify the shapes given by (1.4.1) as follows:

Class I ($\lambda_3 < 1$, $\lambda_4 < 1$): Unimodal densities with continuous tails. This class can be subdivided with respect to the finite or infinite slopes of the densities at the end points: Class Ia ($\lambda_3, \lambda_4 < \tfrac{1}{2}$), Class Ib ($\tfrac{1}{2} < \lambda_3 < 1$, $\lambda_4 < \tfrac{1}{2}$), and Class Ic ($\tfrac{1}{2} < \lambda_3 < 1$, $\tfrac{1}{2} < \lambda_4 < 1$).

Class II ($\lambda_3 > 1$, $\lambda_4 < 1$): Monotone p.d.f.s similar to those of the exponential or $\chi^2$ distributions; the left tail is truncated.

Class III

($1 < \lambda_3 < 2$, $1 < \lambda_4 < 2$): U-shaped densities with both tails truncated.

Class IV ($\lambda_3 > 2$, $1 < \lambda_4 < 2$): Rarely occurring S-shaped p.d.f.s with one mode and one antimode; both tails are truncated.

Class V ($\lambda_3 > 2$, $\lambda_4 > 2$): Unimodal p.d.f.s with both tails truncated.

Figures 1.3 to 1.10 show examples of each class of shapes; see Susanne W. M. Au-Yeung for an in-depth study.

Figure 1.3: Class Ia p.d.f.s, including the normal distribution.

Figure 1.4: Class Ib p.d.f.s.

Figure 1.5: Class Ic p.d.f.s.

Figure 1.6: Class II p.d.f.s, including the exponential distribution.

Figure 1.7: Class III U-shaped p.d.f.s.

Figure 1.8: Class IV S-shaped p.d.f.s.

Figure 1.9: Class V p.d.f.s.
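A sketch of the FMKL percentile function (1.4.1) for computing quantiles numerically; the $\lambda_3 = 0$ or $\lambda_4 = 0$ cases are logarithmic limits that this minimal version (with a name of our choosing) omits.

```python
def fmkl_quantile(u, l1, l2, l3, l4):
    """FMKL percentile function, eq. (1.4.1), for nonzero l3, l4.

    Unlike the RS form (1.2.1), this defines a valid distribution for
    every (l3, l4) pair, provided l2 > 0.
    """
    return l1 + ((u ** l3 - 1.0) / l3 - ((1.0 - u) ** l4 - 1.0) / l4) / l2
```

When $\lambda_3 = \lambda_4$ the distribution is symmetric about $\lambda_1$, so `fmkl_quantile(0.5, l1, l2, c, c)` returns `l1` for any common shape value `c`.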

Chapter 2

The Moments of the GLD

Moments come in two forms: raw moments and central moments.

Definition [2]. The $k$th raw moment of a random variable $X$ with probability density function $f(x)$ is defined by

$$E(X^k) = \int_{-\infty}^{\infty} x^k f(x)\,dx, \qquad k \ge 1.$$

In particular the first moment, $E(X) = \mu$, is the mean.

Definition [2]. The $k$th central moment is defined by

$$E\!\left[(X - \mu)^k\right] = \int_{-\infty}^{\infty} (x - \mu)^k f(x)\,dx, \qquad k > 1.$$

2.1 Karian and Dudewicz (2000) Approach

The moments of the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ parameterization can be derived as follows. We start by setting $\lambda_1 = 0$ to simplify the task; next, we obtain the non-central moments of the GLD$(0, \lambda_2, \lambda_3, \lambda_4)$; and finally, we derive the central GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ moments.

Theorem 2.1.1. If $X$ is a GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ random variable, then $Z = X - \lambda_1$ is GLD$(0, \lambda_2, \lambda_3, \lambda_4)$.

Proof. Since $X$ is GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$,

$$Q_X(y) = \lambda_1 + \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}$$

and

$$F_{X-\lambda_1}(x) = P(X - \lambda_1 \le x) = P(X \le x + \lambda_1) = F_X(x + \lambda_1). \qquad (2.1.1)$$

If we set $F_X(x + \lambda_1) = y$, we obtain

$$x + \lambda_1 = Q_X(y) = \lambda_1 + \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}. \qquad (2.1.2)$$

From (2.1.1) we also have $F_{X-\lambda_1}(x) = y$, which with (2.1.2) yields

$$Q_{X-\lambda_1}(y) = x = \frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2},$$

proving that $X - \lambda_1$ is GLD$(0, \lambda_2, \lambda_3, \lambda_4)$.

Having established $\lambda_1$ as a location parameter, we now determine the non-central moments (when they exist) of the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$.

Theorem 2.1.2. If $Z$ is GLD$(0, \lambda_2, \lambda_3, \lambda_4)$, then $E(Z^k)$, the expected value of $Z^k$, is given by

$$E(Z^k) = \frac{1}{\lambda_2^k} \sum_{i=0}^{k} \binom{k}{i} (-1)^i\, \beta\!\left(\lambda_3(k-i) + 1,\ \lambda_4 i + 1\right), \qquad (2.1.3)$$

where $\beta(a, b)$ is the beta function defined by

$$\beta(a, b) = \int_0^1 x^{a-1} (1-x)^{b-1}\,dx. \qquad (2.1.4)$$

Proof.

$$E(Z^k) = \int_{-\infty}^{\infty} z^k f(z)\,dz = \int_0^1 (Q(y))^k\,dy, \quad \text{where } f(z) = \frac{dy}{dz},\ Q(y) = z. \qquad (2.1.5)$$

Thus

$$E(Z^k) = \int_0^1 \left(\frac{y^{\lambda_3} - (1-y)^{\lambda_4}}{\lambda_2}\right)^k dy = \frac{1}{\lambda_2^k} \int_0^1 \left(y^{\lambda_3} - (1-y)^{\lambda_4}\right)^k dy.$$

By the binomial theorem,

$$\left(y^{\lambda_3} - (1-y)^{\lambda_4}\right)^k = \sum_{i=0}^{k} \binom{k}{i} (-1)^i \left(y^{\lambda_3}\right)^{k-i} \left((1-y)^{\lambda_4}\right)^i. \qquad (2.1.6)$$

Using (2.1.6) in the last expression of (2.1.5), we get

$$E(Z^k) = \frac{1}{\lambda_2^k} \sum_{i=0}^{k} \binom{k}{i} (-1)^i \int_0^1 y^{\lambda_3(k-i)} (1-y)^{\lambda_4 i}\,dy = \frac{1}{\lambda_2^k} \sum_{i=0}^{k} \binom{k}{i} (-1)^i\, \beta\!\left(\lambda_3(k-i)+1,\ \lambda_4 i + 1\right),$$

completing the proof of the theorem.

Before continuing with our investigation of the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ moments, we note a property of the beta function that will be useful in our subsequent work: the integral in (2.1.4) that defines the beta function converges if and only if $a$ and $b$ are positive (this can be verified by choosing $c$ from the $(0, 1)$ interval and considering the integral over the subintervals $(0, c)$ and $(c, 1)$).

Corollary 2.1.3. The $k$th GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ moment exists if and only if $\min(\lambda_3, \lambda_4) > -1/k$.

Proof. From Theorem 2.1.1, $E(X^k)$ exists if and only if $E(Z^k) = E\!\left((X - \lambda_1)^k\right)$ exists, which, by Theorem 2.1.2, is the case if and only if

$$\lambda_3(k - i) + 1 > 0 \quad \text{and} \quad \lambda_4 i + 1 > 0, \qquad \text{for } i = 0, 1, \dots, k.$$

The first condition is most restrictive at $i = 0$, where it requires $\lambda_3 k > -1$, i.e. $\lambda_3 > -1/k$; the second is most restrictive at $i = k$, where it requires $\lambda_4 k > -1$, i.e. $\lambda_4 > -1/k$. Both families of conditions therefore hold precisely when $\lambda_3 > -1/k$ and $\lambda_4 > -1/k$.

Since, ultimately, we are interested in the first four moments of the GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$, we will need to impose the conditions $\lambda_3 > -1/4$ and $\lambda_4 > -1/4$ throughout the remainder of this chapter. The next theorem gives an explicit formulation of the first four centralized GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ moments.

Theorem 2.1.4. If $X$ is GLD$(\lambda_1, \lambda_2, \lambda_3, \lambda_4)$ with $\lambda_3 > -1/4$ and $\lambda_4 > -1/4$, then its first four moments $\alpha_1, \alpha_2, \alpha_3, \alpha_4$ (mean, variance, skewness, and kurtosis, respectively) are given by

$$\alpha_1 = \mu = E(X) = \lambda_1 + \frac{A}{\lambda_2}, \qquad (2.1.7)$$

$$\alpha_2 = \sigma^2 = E\!\left[(X - \mu)^2\right] = \frac{B - A^2}{\lambda_2^2}, \qquad (2.1.8)$$

$$\alpha_3 = \frac{E\!\left[(X - E(X))^3\right]}{\sigma^3} = \frac{C - 3AB + 2A^3}{\lambda_2^3 \sigma^3}, \qquad (2.1.9)$$

$$\alpha_4 = \frac{E\!\left[(X - E(X))^4\right]}{\sigma^4} = \frac{D - 4AC + 6A^2 B - 3A^4}{\lambda_2^4 \sigma^4}, \qquad (2.1.10)$$

where

$$A = \frac{1}{1 + \lambda_3} - \frac{1}{1 + \lambda_4}, \qquad (2.1.11)$$

$$B = \frac{1}{1 + 2\lambda_3} + \frac{1}{1 + 2\lambda_4} - 2\beta(1 + \lambda_3, 1 + \lambda_4), \qquad (2.1.12)$$

$$C = \frac{1}{1 + 3\lambda_3} - \frac{1}{1 + 3\lambda_4} - 3\beta(1 + 2\lambda_3, 1 + \lambda_4) + 3\beta(1 + \lambda_3, 1 + 2\lambda_4), \qquad (2.1.13)$$

$$D = \frac{1}{1 + 4\lambda_3} + \frac{1}{1 + 4\lambda_4} - 4\beta(1 + 3\lambda_3, 1 + \lambda_4) + 6\beta(1 + 2\lambda_3, 1 + 2\lambda_4) - 4\beta(1 + \lambda_3, 1 + 3\lambda_4), \qquad (2.1.14)$$

and $\beta$ denotes the beta function.

Proof. Let $Z$ be a GLD$(0, \lambda_2, \lambda_3, \lambda_4)$ random variable; by Theorem 2.1.1, $E(X^k) = E\!\left((Z + \lambda_1)^k\right)$. We first express $E(Z^i)$, for $i = 1, 2, 3, 4$, in terms of $A$, $B$, $C$, and $D$. For $E(Z)$, we use Theorem 2.1.2 to obtain

$$E(Z) = \frac{1}{\lambda_2}\left[\beta(\lambda_3 + 1, 1) - \beta(1, \lambda_4 + 1)\right],$$

and since $\beta(\lambda_3 + 1, 1) = \frac{1}{\lambda_3 + 1}$ and $\beta(1, \lambda_4 + 1) = \frac{1}{\lambda_4 + 1}$, we get

$$E(Z) = \frac{1}{\lambda_2}\left(\frac{1}{\lambda_3 + 1} - \frac{1}{\lambda_4 + 1}\right) = \frac{A}{\lambda_2}. \qquad (2.1.15)$$

For $E(Z^2)$ we again use Theorem 2.1.2 and the simplification $\beta(\lambda + 1, 1) = \beta(1, \lambda + 1) = \frac{1}{1 + \lambda}$ to get

$$E(Z^2) = \frac{1}{\lambda_2^2}\left[\beta(2\lambda_3 + 1, 1) - 2\beta(\lambda_3 + 1, \lambda_4 + 1) + \beta(1, 2\lambda_4 + 1)\right]$$

$$= \frac{1}{\lambda_2^2}\left(\frac{1}{2\lambda_3 + 1} + \frac{1}{2\lambda_4 + 1} - 2\beta(\lambda_3 + 1, \lambda_4 + 1)\right) = \frac{B}{\lambda_2^2}. \qquad (2.1.16)$$

Similar arguments, with somewhat more complicated algebraic manipulations, for $E(Z^3)$ and $E(Z^4)$ produce

$$E(Z^3) = \frac{C}{\lambda_2^3}, \qquad (2.1.17)$$

$$E(Z^4) = \frac{D}{\lambda_2^4}. \qquad (2.1.18)$$

We now use (2.1.15) to derive (2.1.7):

$$\alpha_1 = E(X) = E(Z + \lambda_1) = \lambda_1 + E(Z) = \lambda_1 + \frac{A}{\lambda_2}. \qquad (2.1.19)$$

Next we consider (2.1.8). Since $E\!\left[(X - \mu)^2\right] = E(X^2) - (E(X))^2$,

$$\alpha_2 = E(X^2) - \alpha_1^2 = E\!\left((Z + \lambda_1)^2\right) - \alpha_1^2 = E(Z^2) + 2\lambda_1 E(Z) + \lambda_1^2 - \alpha_1^2. \qquad (2.1.20)$$

Substituting $A/\lambda_2$ for $E(Z)$ and $B/\lambda_2^2$ for $E(Z^2)$ in (2.1.20) and using (2.1.19), we get

$$\alpha_2 = \frac{B - A^2}{\lambda_2^2}.$$

The derivations of (2.1.9) and (2.1.10) are similar but algebraically more involved; see Karian and Dudewicz (2000) for an in-depth study.
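The closed forms (2.1.7) through (2.1.14) translate directly into code. The sketch below uses the log-gamma representation $\beta(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$; the function names are ours.

```python
import math

def beta_fn(a, b):
    """Beta function beta(a, b) = Gamma(a)Gamma(b)/Gamma(a+b), for a, b > 0."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def gld_moments(l1, l2, l3, l4):
    """Mean, variance, skewness, kurtosis of GLD(l1, l2, l3, l4) from
    (2.1.7)-(2.1.14); requires min(l3, l4) > -1/4."""
    A = 1.0 / (1 + l3) - 1.0 / (1 + l4)
    B = 1.0 / (1 + 2 * l3) + 1.0 / (1 + 2 * l4) - 2 * beta_fn(1 + l3, 1 + l4)
    C = (1.0 / (1 + 3 * l3) - 1.0 / (1 + 3 * l4)
         - 3 * beta_fn(1 + 2 * l3, 1 + l4) + 3 * beta_fn(1 + l3, 1 + 2 * l4))
    D = (1.0 / (1 + 4 * l3) + 1.0 / (1 + 4 * l4)
         - 4 * beta_fn(1 + 3 * l3, 1 + l4) + 6 * beta_fn(1 + 2 * l3, 1 + 2 * l4)
         - 4 * beta_fn(1 + l3, 1 + 3 * l4))
    mean = l1 + A / l2
    var = (B - A * A) / l2 ** 2
    skew = (C - 3 * A * B + 2 * A ** 3) / (l2 ** 3 * var ** 1.5)
    kurt = (D - 4 * A * C + 6 * A * A * B - 3 * A ** 4) / (l2 ** 4 * var ** 2)
    return mean, var, skew, kurt
```

As a sanity check, GLD(0, 1, 1, 1) has percentile function $Q(y) = 2y - 1$, the uniform distribution on $(-1, 1)$, so the returned values should be $0$, $1/3$, $0$, and $1.8$.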

2.2 The Moments of the FMKL Parameterization of the GLD

The moments of the FMKL parameterization of the GLD can be derived as follows. We can rewrite equation (1.4.1) as

$$Q(u) = \left[\lambda_1 - \frac{1}{\lambda_2}\left(\frac{1}{\lambda_3} - \frac{1}{\lambda_4}\right)\right] + \frac{1}{\lambda_2}\left(\frac{u^{\lambda_3}}{\lambda_3} - \frac{(1-u)^{\lambda_4}}{\lambda_4}\right) = b + a\,\hat{Q}(u), \qquad (2.2.1)$$

where

$$\hat{Q}(u) = \frac{u^{\lambda_3}}{\lambda_3} - \frac{(1-u)^{\lambda_4}}{\lambda_4}, \qquad a = \frac{1}{\lambda_2}, \qquad b = \lambda_1 - \frac{1}{\lambda_2}\left(\frac{1}{\lambda_3} - \frac{1}{\lambda_4}\right).$$

Now, if $X$ is the random variable with percentile function $Q(u)$ given in (1.4.1) and $Y$ is the random variable with percentile function $\hat{Q}(u)$ given by (2.2.1), then $X = aY + b$, so that $E(X) = aE(Y) + b$ and the higher moments of $X$ follow from those of $Y$. Writing $v_k = E(Y^k)$, the $k$th raw moment of $X$ is

$$E(X^k) = \int_0^1 Q(u)^k\,du, \qquad (2.2.2)$$

and from (2.2.2) we find that we need to calculate

$$v_k = \int_0^1 \left(\frac{u^{\lambda_3}}{\lambda_3} - \frac{(1-u)^{\lambda_4}}{\lambda_4}\right)^k du. \qquad (2.2.3)$$

Expanding the integrand on the right-hand side of equation (2.2.3) by the binomial theorem gives

$$v_k = \int_0^1 \sum_{j=0}^{k} (-1)^j \binom{k}{j} \frac{u^{\lambda_3(k-j)}}{\lambda_3^{k-j}}\, \frac{(1-u)^{\lambda_4 j}}{\lambda_4^{j}}\,du = \sum_{j=0}^{k} \frac{(-1)^j}{\lambda_3^{k-j}\, \lambda_4^{j}} \binom{k}{j}\, \beta\!\left(\lambda_3(k-j) + 1,\ \lambda_4 j + 1\right), \qquad (2.2.4)$$

where $\beta$ is the beta function defined by

$$\beta(a, b) = \int_0^1 x^{a-1}(1-x)^{b-1}\,dx.$$

The beta functions on the right-hand side of equation (2.2.4) are defined if both of their arguments are positive, which essentially means that

$$\lambda_3(k - j) + 1 > 0 \quad \text{and} \quad \lambda_4 j + 1 > 0 \quad \text{for all } j = 0, 1, \dots, k. \qquad (2.2.5)$$

From equation (2.2.5) it is clear that the inequalities are only restrictive when $\lambda_3, \lambda_4 < 0$. Since $0 \le j \le k$, equation (2.2.5) can be written as

$$\min(\lambda_3, \lambda_4) > -\frac{1}{k}.$$

Using equation (2.2.4) we can obtain the first four moment values:

$$v_1 = \frac{1}{\lambda_3(\lambda_3 + 1)} - \frac{1}{\lambda_4(\lambda_4 + 1)},$$

$$v_2 = \frac{1}{\lambda_3^2(2\lambda_3 + 1)} + \frac{1}{\lambda_4^2(2\lambda_4 + 1)} - \frac{2}{\lambda_3 \lambda_4}\,\beta(\lambda_3 + 1, \lambda_4 + 1),$$

$$v_3 = \frac{1}{\lambda_3^3(3\lambda_3 + 1)} - \frac{1}{\lambda_4^3(3\lambda_4 + 1)} - \frac{3}{\lambda_3^2 \lambda_4}\,\beta(2\lambda_3 + 1, \lambda_4 + 1) + \frac{3}{\lambda_3 \lambda_4^2}\,\beta(\lambda_3 + 1, 2\lambda_4 + 1),$$

$$v_4 = \frac{1}{\lambda_3^4(4\lambda_3 + 1)} + \frac{1}{\lambda_4^4(4\lambda_4 + 1)} + \frac{6}{\lambda_3^2 \lambda_4^2}\,\beta(2\lambda_3 + 1, 2\lambda_4 + 1) - \frac{4}{\lambda_3^3 \lambda_4}\,\beta(3\lambda_3 + 1, \lambda_4 + 1) - \frac{4}{\lambda_3 \lambda_4^3}\,\beta(\lambda_3 + 1, 3\lambda_4 + 1).$$

From the above results and the relation $X = aY + b$, we get the first four central moments of the FMKL parameterization of the GLD:

$$E\!\left[(X - \mu)^2\right] = \frac{1}{\lambda_2^2}\left(v_2 - v_1^2\right),$$

$$E\!\left[(X - \mu)^3\right] = \frac{1}{\lambda_2^3}\left(v_3 - 3v_1 v_2 + 2v_1^3\right),$$

$$E\!\left[(X - \mu)^4\right] = \frac{1}{\lambda_2^4}\left(v_4 - 4v_1 v_3 + 6v_1^2 v_2 - 3v_1^4\right).$$

Then from the equations $\alpha_3 = \frac{1}{\sigma^3} E\!\left[(X - \mu)^3\right]$ and $\alpha_4 = \frac{1}{\sigma^4} E\!\left[(X - \mu)^4\right]$ we get the skewness and kurtosis of the GLD:

$$\alpha_3 = \frac{v_3 - 3v_1 v_2 + 2v_1^3}{\left(v_2 - v_1^2\right)^{3/2}}, \qquad (2.2.6)$$

$$\alpha_4 = \frac{v_4 - 4v_1 v_3 + 6v_1^2 v_2 - 3v_1^4}{\left(v_2 - v_1^2\right)^2}. \qquad (2.2.7)$$

So if we are given the mean µ̂, variance σ̂², skewness α̂3, and kurtosis α̂4 of the sample data, we can find λ3 and λ4 of the GLD by solving (2.2.6) and (2.2.7). Once we have found λ3 and λ4, we can find λ2 from the variance relation E(X − µ)² = (v2 − v1²)/λ2², and then λ1 from E(X) = a·E(Y) + b, using a = 1/λ2 and b = λ1 − (1/λ2)(1/λ3 − 1/λ4); we get

λ2 = √(v2 − v1²) / σ̂,

λ1 = µ̂ − (1/λ2)(v1 − 1/λ3 + 1/λ4).

Chapter 3

Estimating the Parameters of the Generalized Lambda Distribution: Fitting the GLD through the Method of Moments

3.1 Introduction

The method of moments consists of equating the first few moments of a population to the corresponding sample moments, thus obtaining as many equations as are needed to solve for the unknown parameters of the population. Thus, if a population has r parameters, the method of moments consists of solving the system of equations

α̂_k = µ_k,  k = 1, 2, 3, …, r,

for the r parameters, where α̂_k is the k-th sample moment.

As stated at the beginning, our intention is to fit a GLD to a data set by equating σ1, σ2, σ3, σ4 to σ̂1, σ̂2, σ̂3, σ̂4, the sample statistics corresponding to σ1, σ2, σ3, σ4, and solving the resulting equations for λ1, λ2, λ3, λ4. For a data set X1, X2, …, Xn, the sample moments corresponding to σ1, σ2, σ3, σ4 are

denoted by σ̂1, σ̂2, σ̂3, σ̂4, and are defined by

σ̂1 = X̄ = (Σ_{i=1}^n x_i)/n,   (3.1.1)

σ̂2 = (Σ_{i=1}^n (x_i − X̄)²)/n,   (3.1.2)

σ̂3 = (Σ_{i=1}^n (x_i − X̄)³)/(n σ̂2^{3/2}),   (3.1.3)

σ̂4 = (Σ_{i=1}^n (x_i − X̄)⁴)/(n σ̂2²).   (3.1.4)

Solving the system of equations

σ_i = σ̂_i   for i = 1, 2, 3, 4   (3.1.5)

for λ1, λ2, λ3, λ4 is simplified somewhat by observing that A, B, C, and D of (2.2.11) through (2.2.14) are free of λ1 and λ2. Thus σ3 and σ4 depend only on λ3 and λ4. Hence, if λ3 and λ4 can be obtained by solving the system

σ3 = σ̂3   and   σ4 = σ̂4   (3.1.6)

of two equations in the two variables λ3 and λ4, then using (2.2.8) and (2.2.7) successively will yield λ2 and λ1. Unfortunately, (3.1.5) is complex enough to prevent an exact solution, forcing us to appeal to numerical methods for approximate solutions. The values of the parameters λ3 and λ4 may be computed by solving the system (3.1.6) in the region (−1/4, ∞) × (−1/4, ∞) of the (λ3, λ4)-plane. Algorithms for finding numerical solutions to systems of equations such as this search for a

solution by checking whether an initial set of values (λ3 = λ3*, λ4 = λ4* in the case of (3.1.6)) can be considered an approximate solution. This determination is made by checking whether

Max(|σ3 − σ̂3|, |σ4 − σ̂4|) < ε   (3.1.7)

when λ3 = λ3* and λ4 = λ4*. The positive number ε represents the accuracy associated with the approximation. If it is determined that the initial set of values λ3 = λ3*, λ4 = λ4* does not provide a sufficiently accurate solution, the algorithm searches for a better choice of λ3 and λ4 and iterates this process until a suitable solution is reached (i.e., one that satisfies (3.1.7)). In algorithms of this type there is no assurance that the algorithm will terminate successfully, nor that greater accuracy will be attained in successive iterations. Therefore, such searching algorithms are designed to terminate (unsuccessfully) if (3.1.7) is not satisfied after a fixed number of iterations.

3.2 Fitting the GLD by the Use of Tables

Some readers may not have sufficient expertise in programming, or adequate programming support, to use the type of analysis illustrated in Section 3.1. For this reason a number of investigators have provided tables for the estimation of λ1, λ2, λ3, λ4. The first of these was given by Ramberg and Schmeiser (1974); Dudewicz and Karian (1996) provide the most accurate and comprehensive tables to date. Unless some simplifications are used, tabulating results for determining λ1, λ2, λ3, λ4 directly from σ̂1, σ̂2, σ̂3, σ̂4 would require a four-dimensional table. To make the tabulation manageable, we summarize this process in the GLD-M algorithm below.
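For readers who do have programming support, the pipeline of Section 3.1 — compute σ̂1 through σ̂4, then search for (λ3, λ4) — can be sketched compactly. The code below is our own minimal stand-in, written for the FMKL moments of Chapter 2 and restricted, for brevity, to the symmetric slice λ3 = λ4 (where σ3 = 0 automatically), so that the two-dimensional search of (3.1.6) collapses to a one-dimensional bisection on the kurtosis. It is not the algorithm behind the published tables.

```python
import math

def sample_moments(xs):
    """Mean, variance, skewness, kurtosis as in (3.1.1)-(3.1.4)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = var ** 0.5
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4)
    return mean, var, skew, kurt

def _beta(a, b):
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))

def sym_kurtosis(l):
    """alpha4 of the symmetric FMKL GLD (lambda3 = lambda4 = l); alpha3 = 0 there."""
    v2 = 2.0 / (l**2 * (2*l + 1)) - 2.0 / l**2 * _beta(l + 1, l + 1)
    v4 = (2.0 / (l**4 * (4*l + 1)) - 8.0 / l**4 * _beta(3*l + 1, l + 1)
          + 6.0 / l**4 * _beta(2*l + 1, 2*l + 1))
    return v4 / v2**2  # v1 = v3 = 0 by symmetry

def solve_sym_lambda(target_kurt, lo=0.01, hi=1.0, eps=1e-10):
    """Bisection: on (0, 1] the kurtosis falls from ~4.2 (near-logistic) to 1.8 (uniform)."""
    while hi - lo > eps:
        mid = 0.5 * (lo + hi)
        if sym_kurtosis(mid) > target_kurt:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

lam = solve_sym_lambda(3.0)  # lambda3 = lambda4 matching a normal-like kurtosis
```

For asymmetric targets the same idea extends to a two-dimensional search over (λ3, λ4) with the stopping rule (3.1.7).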

Algorithm GLD-M: fitting a GLD distribution to data by the method of moments.

GLD-M-1. Use (3.1.1) through (3.1.4) to compute σ̂1, σ̂2, σ̂3, σ̂4.
GLD-M-2. Find the entry point in a table closest to (|σ̂3|, σ̂4).
GLD-M-3. Using the entry from Step GLD-M-2, extract λ1(0,1), λ2(0,1), λ3, and λ4 from the table.
GLD-M-4. If σ̂3 < 0, interchange λ3 and λ4 and change the sign of λ1(0,1).
GLD-M-5. Compute λ1 = λ1(0,1)·√σ̂2 + σ̂1 and λ2 = λ2(0,1)/√σ̂2.

To illustrate the use of Algorithm GLD-M and the table of Appendix A, suppose that σ1, σ2, σ3, σ4 have been computed to have the values

σ̂1 = 2, σ̂2 = 3, σ̂3 = −0.025, σ̂4 = 2.   (3.2.1)

Step GLD-M-1 is taken care of since σ̂1, σ̂2, σ̂3, σ̂4 are given. For Step GLD-M-2, since σ̂3 is negative we work with |σ̂3|; the closest point to (|σ̂3|, σ̂4) in the table of Appendix A is (0.15, 2.0), giving us the table entries λ1(0,1) = …, λ2(0,1) = …, λ3 = …, λ4 = …. The instructions on the use of the table in Appendix A indicate that a superscript of b in a table entry designates a factor of 10^b. Since σ̂3 < 0, Step GLD-M-4 readjusts these values by interchanging λ3 and λ4 and changing the sign of λ1(0,1).

With the computation in Step GLD-M-5 we get λ1 = …, λ2 = …, λ3 = …, λ4 = ….

3.3 Example

Dudewicz, Levy, Lienhart, and Wehrli (1989) give data on the brain tissue MRI scan parameter AD. It should be noted that the term "parameter" is used differently in brain scan studies: there it designates what we would term random variables. In the cited study the authors show that AD² has a normal distribution while AD does not, and report 23 observations associated with scans of the left thalamus.

Solution: We compute σ̂1, σ̂2, σ̂3, σ̂4 for AD to obtain

σ̂1 = …, σ̂2 = …, σ̂3 = 0.162, σ̂4 = ….

With these σ̂1, σ̂2, σ̂3, σ̂4, Karian, Dudewicz, and McDonald (1996) fitted this data by following Algorithm GLD-M, using the entry at (σ̂3, σ̂4) = (0.15, 2.1) to obtain the fit

GLD1(…, …, …, …)

with support [96.445, …]. Here we appeal to adequate programming support to determine the more precise fit

GLD2(…, …, …, …)

whose support is […, …]. Figure 3.1(a) shows the p.d.f.s of GLD1 (labeled (1)) and GLD2 (labeled (2)) together with a histogram of the data; Figure 3.1(b) shows the e.d.f. of the data with the d.f.s of GLD1 and GLD2 (the former is not labeled; the latter is labeled (2)). The λ1, λ2, λ3, λ4 of GLD1 and GLD2 seem to differ significantly; however, at least visually, the p.d.f.s and d.f.s of the two distributions appear to provide equally valid fits. With the small sample size of this example we are not able to perform a full chi-square test, but we can partition the data into classes, such as (0, 103), [103, 107), [107, 110.5), [110.5, ∞), whose respective frequencies are 7, 6, 5, 6, and calculate the chi-square statistic of the two fits. The expected frequencies for these classes are, respectively:

GLD1: …, …, …, …
GLD2: …, …, …, …

Figure 3.1: Histogram of AD and the GLD1 and GLD2 p.d.f.s, designated by (1) and (2), in (a); the e.d.f. of AD with the d.f.s of GLD1 and GLD2 in (b).

These give the following chi-square statistics and p-values for the two fits:

GLD1: χ² statistic = …, p-value = …
GLD2: χ² statistic = …, p-value = …

Note that the p-value, which directly depends on a given sample, attempts to provide a measure of the strength of the results of a test of the null hypothesis, in contrast to a simple "reject" or "do not reject" in the classical approach to hypothesis testing. If the null hypothesis is true and random variation is the only reason for sample differences, then the p-value is a quantitative measure of evidence to feed into the decision-

making process. The following table provides a reasonable interpretation of p-values:

P-value             Interpretation
P < 0.01            very strong evidence against H0
0.01 ≤ P < 0.05     moderate evidence against H0
0.05 ≤ P < 0.10     suggestive evidence against H0
0.10 ≤ P            little or no real evidence against H0

3.4 Estimating the Parameters of the Generalized Lambda Distribution: The Least Squares Method

One of the problems involved in summarizing a set of data is to find a probability distribution model that will fit the data well. The usual way of tackling this problem is to use a flexible family of distributions. Families of distributions that have been constructed to cover a wide range of distribution shapes include the Pearson system (Johnson and Kotz 1970), the Johnson system (Johnson 1949), and the generalized lambda distribution (GLD) developed by Ramberg and Schmeiser (1972, 1974). Ramberg and Schmeiser used the GLD, which is a generalization of Tukey's lambda distribution (Tukey 1960), to generate continuous random variables.

The least squares method for computing the parameters of the GLD (Ozturk and Dale 1985) can be described as follows. Let x_i, i = 1, …, n, denote the i-th order statistic of the data, which is to be represented by the quantile function Q(u), and let u_(i), i = 1, …, n, denote the corresponding order statistic of the uniformly distributed random variable F(X).

Definition [2] (Order statistics). Let X1, X2, …, Xn denote a random sample of size n from a cumulative distribution function F(·). The Y_i, the X_i arranged in order of increasing magnitude, are defined to be the order statistics corresponding to the random sample X1, X2, …, Xn.

The least squares method finds the values of λ for which the differences between the observed and predicted order statistics are as small as possible; thus it minimizes the function

G(λ) = Σ_{i=1}^n [x_i − λ1 − Z_i/λ2]²   (3.4.1)

where

Z_i = (1/λ3)[E(u_(i)^λ3) − 1] − (1/λ4)[E((1 − u_(i))^λ4) − 1].

The formulas for computing E(u_(i)^λ3) and E((1 − u_(i))^λ4) are

E(u_(i)^λ3) = [Γ(n+1) Γ(i+λ3)] / [Γ(i) Γ(n+λ3+1)]

and

E((1 − u_(i))^λ4) = [Γ(n+1) Γ(n−i+λ4+1)] / [Γ(n−i+1) Γ(n+λ4+1)].

Note that the function to be minimized in Equation (3.4.1) depends on all four parameters. To avoid solving a computationally demanding minimization problem in four-dimensional space, (λ1, λ2) are decoupled from (λ3, λ4), as in the moment-matching method. We first assume that λ3 and λ4 are constant and solve the minimization problem for λ1 and λ2. The values of λ1 and λ2, once obtained, are substituted into the minimization function, and we then compute λ3 and λ4. We adopt this strategy based on the observation that the expression for G(λ) has λ1

and λ2 in linear form, whereas λ3 and λ4 appear in nonlinear form. Thus, differentiating G(λ) with respect to λ1 and λ2 and setting the resulting expressions equal to zero, we obtain

λ1 = µ_x − b_xz µ_z,   λ2 = 1/b_xz,   (3.4.2)

where µ_x and µ_z denote the means of the sample data and of the quantities Z_i, respectively, and the regression coefficient b_xz is given by

b_xz = [Σ_{i=1}^n (x_i − µ_x)(z_i − µ_z)] / [Σ_{i=1}^n (z_i − µ_z)²].   (3.4.3)

Inserting Equations (3.4.2) and (3.4.3) in Equation (3.4.1), we obtain, after some rearrangement of terms,

G(λ3, λ4) = [1 − r_xz(λ3, λ4)²] Σ_{i=1}^n (x_i − µ_x)²,

where r_xz is the correlation coefficient between the quantities x_i and z_i. Thus, in order to minimize G(λ3, λ4) we need to maximize the quantity r_xz(λ3, λ4)² or, equivalently, minimize the function

H(λ3, λ4) = 1 − r_xz(λ3, λ4)².   (3.4.4)

Once λ3 and λ4 have been obtained by minimizing Equation (3.4.4), they can be inserted into Equation (3.4.2) in order to compute λ1 and λ2. Before leaving this section, we again emphasize that maximizing the square of the correlation coefficient r_xz does not guarantee that the estimated parameters will yield a distribution

that closely matches the empirical distribution of the sample data. We now describe an example that illustrates the least squares method.

3.5 Example

To illustrate the least squares method, consider the ordered data set given in Ozturk and Dale (1985). The random sample was generated using the quantile function of the GLD with λ1 = 4.114, λ2 = 0.1333, λ3 = 0.0193, and λ4 = 0.1588, which give values of µ = 5.0, σ = 1.0, α3 = 1.0, and α4 = 4.0 for the mean, standard deviation, skewness, and kurtosis, respectively. The 75 observations have sample mean, standard deviation, skewness, and kurtosis

X̄ = …, S = …, α̂3 = …, α̂4 = ….

By using the tables of Ramberg et al. (1979), it can be shown that for the preceding combination of coefficients of skewness and kurtosis, corresponding values of λ3 and λ4 do not exist; see Ozturk and Dale (1985). Hence the method of moments cannot be used for this data set. Least squares estimates of λ3 and λ4 are based on the model

X_i = λ1 + Q_i/λ2 + e_i

where Q_i is defined as

Q_i = E(u_(i)^λ3) − E((1 − u_(i))^λ4)

and the random variable e_i has mean zero and variance σ_i². The estimates are obtained using the Nelder–Mead simplex procedure for function minimization, where the objective function

φ(λ3, λ4) = 1 − r²_XQ

is minimized subject to the constraint that λ3, λ4 > 0. (See Olsson 1974 for the use of subroutine NELMIN for the Nelder–Mead procedure.) Then λ̂1 and λ̂2 are calculated from (3.4.2). The corresponding estimates are

λ̂1 = …, λ̂2 = …, λ̂3 = …, λ̂4 = ….

The quantity Q_i defined above, in the model X_i = λ1 + Q_i/λ2 + e_i, is also used to estimate the parameters following the same minimization procedure; the parameter estimates are calculated to be

λ̂1 = …, λ̂2 = …, λ̂3 = …, λ̂4 = ….
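The inner, linear half of the least squares method — fix (λ3, λ4), build the Z_i from the order-statistic expectations, and read λ1 and λ2 off a simple regression as in (3.4.2)–(3.4.3) — can be sketched as follows. The function names are ours, and the gamma-function ratios are evaluated through math.lgamma for numerical stability; the outer nonlinear search over (λ3, λ4) (Nelder–Mead in the example above) is not reproduced here.

```python
import math

def z_values(n, l3, l4):
    """Z_i of the least-squares model, using
    E(u_(i)^l3) = G(n+1)G(i+l3) / (G(i)G(n+l3+1)) and its mirror image."""
    zs = []
    for i in range(1, n + 1):
        e1 = math.exp(math.lgamma(n + 1) + math.lgamma(i + l3)
                      - math.lgamma(i) - math.lgamma(n + l3 + 1))
        e2 = math.exp(math.lgamma(n + 1) + math.lgamma(n - i + l4 + 1)
                      - math.lgamma(n - i + 1) - math.lgamma(n + l4 + 1))
        zs.append((e1 - 1.0) / l3 - (e2 - 1.0) / l4)
    return zs

def fit_location_scale(xs_sorted, l3, l4):
    """Regress sorted data on Z_i: slope b = 1/lambda2, intercept = lambda1."""
    n = len(xs_sorted)
    zs = z_values(n, l3, l4)
    mx = sum(xs_sorted) / n
    mz = sum(zs) / n
    b = (sum((x - mx) * (z - mz) for x, z in zip(xs_sorted, zs))
         / sum((z - mz) ** 2 for z in zs))
    return mx - b * mz, 1.0 / b   # (lambda1, lambda2)

# Data lying exactly on the model x_i = lam1 + Z_i/lam2 is recovered exactly.
zs = z_values(20, 0.2, 0.4)
data = [1.5 + z / 0.8 for z in zs]
print(fit_location_scale(data, 0.2, 0.4))  # ≈ (1.5, 0.8)
```

The regression view makes the decoupling explicit: for each trial (λ3, λ4) the inner fit costs one ordinary least-squares line.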

Chapter 4

Estimating the Parameters of the Generalized Lambda Distribution: The Method of Percentiles

The generalized lambda distribution, GLD(λ1, λ2, λ3, λ4), is a four-parameter family that has been used for fitting distributions to a wide variety of data sets. In almost all cases the method of moments has been used to determine the parameters of the GLD that fits a given data set, negating the possibility of applying those members of the GLD family that do not possess the first four moments and yet may provide superior fits to the data. In this chapter we develop a method for fitting a GLD distribution to data that is based on percentiles rather than moments. This approach makes a larger portion of the GLD family accessible for data fitting and eases some of the computational difficulties encountered in the method of moments.

We now consider a GLD(λ1, λ2, λ3, λ4) fitting process that is based exclusively on percentiles. The concepts and names "percentile" (also "quartile" and "decile") are

due to Galton (1875), who in his 1875 paper proposed to characterize a distribution by its location (median) and its dispersion (half the interquartile range). The method fits a GLD(λ1, λ2, λ3, λ4) distribution to a given data set by specifying four percentile-based sample statistics and equating them to their corresponding GLD(λ1, λ2, λ3, λ4) statistics. The resulting equations are then solved for (λ1, λ2, λ3, λ4), with the constraint that the resulting GLD be a valid distribution. To make the percentile approach an acceptable alternative to the method of moments, and to provide the necessary computational support for the use of percentile-based fits, Dudewicz and Karian (2000) give extensive tables for estimating the parameters of the fitted GLD(λ1, λ2, λ3, λ4) distribution. These tables are reproduced in Appendix B. There are three principal advantages to the use of percentiles:

1. There is a large class of GLD(λ1, λ2, λ3, λ4) distributions that have fewer than four moments, and these distributions are excluded from consideration when one uses parameter estimation methods that require moments. On the occasions when moments do not exist or are out of table range, percentiles can still be used to estimate parameters and obtain GLD(λ1, λ2, λ3, λ4) fits.

2. The equations associated with the percentile method that we will consider are simpler, and the computational techniques required for solving them provide greater accuracy.

3. The relatively large variability of sample moments of orders 3 and 4 can make it difficult to obtain accurate GLD(λ1, λ2, λ3, λ4) fits through the method of moments.

4.1 The Use of Percentiles

For a given sample of independent observations X1, X2, …, Xn, let π̂_p denote the (100p)-th percentile of the data. π̂_p is computed by first writing (n+1)p as r + a/b, where r is a positive integer and a/b is a proper fraction, possibly zero. If Y1, Y2, …, Yn are the order statistics of the observations X1, X2, …, Xn, then π̂_p can be obtained from

π̂_p = Y_r + (a/b)(Y_{r+1} − Y_r)   (4.1.1)

where Y1 ≤ Y2 ≤ … ≤ Yn. This definition of the (100p)-th data percentile differs from the usual one. Consider, for example, p = 0.5, where the sample median is usually defined as M_n = Y_{k+1} if n = 2k + 1 for some integer k, and M_n = (Y_k + Y_{k+1})/2 if n = 2k. By contrast, the sample quantile of order 0.5 is usually defined as Z_{0.5} = Y_{[0.5n]+1}, where [0.5n] denotes the largest integer not exceeding 0.5n. Since the sample quantile can be defined as a function of a single order statistic, it is mathematically somewhat simpler. The sample statistics that we will use are defined by

ρ̂1 = π̂_{0.5}   (4.1.2)

ρ̂2 = π̂_{1−u} − π̂_u   (4.1.3)

ρ̂3 = (π̂_{0.5} − π̂_u) / (π̂_{1−u} − π̂_{0.5})   (4.1.4)

ρ̂4 = (π̂_{0.75} − π̂_{0.25}) / ρ̂2   (4.1.5)

where u is an arbitrary number between 0 and 1/4. These statistics have the following interpretations (where for ease of discussion we momentarily assume u = 0.1):

1. ρ̂1 is the sample median;

2. ρ̂2 is the inter-decile range, i.e., the range between the 10th percentile and the 90th percentile;

3. ρ̂3 is the left–right tail-weight ratio, a measure of the relative weights of the left tail and the right tail (the distance from the median to the 10th percentile in the numerator and the distance from the 90th percentile to the median in the denominator);

4. ρ̂4 is the tail-weight factor, the ratio of the inter-quartile range to the inter-decile range; it cannot exceed 1 and measures tail weight (values close to 1 indicate the distribution is not greatly spread out in its tails, while values close to 0 indicate long tails).

In the case of N(µ, σ²), the normal distribution with mean µ and variance σ², we have

ρ1 = µ, ρ2 = 2.563σ, ρ3 = 1, ρ4 = 1.349/2.563 ≈ 0.53.

This indicates, respectively, that the median of N(µ, σ²) is µ, the middle 80% of the probability lies in a range of about two-and-a-half standard deviations around the median, the left and right tail weights are equal, and the inter-quartile range is 53% of the inter-decile range.

From the definition of the GLD(λ1, λ2, λ3, λ4) inverse distribution function (1.2.1) (Dudewicz and Karian 1999), we now define ρ1, ρ2, ρ3, ρ4, the GLD counterparts of ρ̂1, ρ̂2, ρ̂3, ρ̂4, as

ρ1 = Q(1/2) = λ1 + [(1/2)^λ3 − (1/2)^λ4]/λ2   (4.1.6)

ρ2 = Q(1−u) − Q(u) = [(1−u)^λ3 − u^λ4 + (1−u)^λ4 − u^λ3]/λ2   (4.1.7)

ρ3 = [Q(1/2) − Q(u)] / [Q(1−u) − Q(1/2)] = [(1−u)^λ4 − u^λ3 + (1/2)^λ3 − (1/2)^λ4] / [(1−u)^λ3 − u^λ4 + (1/2)^λ4 − (1/2)^λ3]   (4.1.8)

ρ4 = [Q(3/4) − Q(1/4)] / [Q(1−u) − Q(u)] = [(3/4)^λ3 − (1/4)^λ4 + (3/4)^λ4 − (1/4)^λ3] / [(1−u)^λ3 − u^λ4 + (1−u)^λ4 − u^λ3]   (4.1.9)

The following are direct consequences of these definitions:

1. Since λ1 may assume any real value, we can see from (4.1.6) that the same is true of ρ1.

2. Since 0 < u < 1/4, we have u < 1 − u, and since Q is non-decreasing, (4.1.7) gives ρ2 > 0.

3. The numerator and denominator of ρ3 in (4.1.8) are both positive; therefore ρ3 > 0.

4. In (4.1.9), because of the restriction on u, the denominator of ρ4 must be greater than or equal to its numerator, confining ρ4 to the unit interval.

In summary, the definitions of ρ1, ρ2, ρ3, ρ4 lead to the restrictions

−∞ < ρ1 < ∞,  ρ2 > 0,  ρ3 > 0,  0 < ρ4 < 1.   (4.1.10)

The fitting of a GLD(λ1, λ2, λ3, λ4) to a given data set X1, X2, …, Xn is done by solving the system of equations ρ̂_i = ρ_i (i = 1, 2, 3, 4) for λ1, λ2, λ3, λ4.

The definitions of ρ̂1, ρ̂2, ρ̂3, ρ̂4 in (4.1.2) through (4.1.5) may have seemed strange or arbitrary to this point. However, we now observe the main advantage of these definitions: the subsystem ρ̂3 = ρ3 and ρ̂4 = ρ4 involves only λ3 and λ4, allowing us first to solve this subsystem for λ3 and λ4 and then to substitute λ3 and λ4 into ρ̂2 = ρ2 to obtain λ2 from

λ2 = [(1−u)^λ3 − u^λ4 + (1−u)^λ4 − u^λ3] / ρ̂2   (4.1.11)

and finally, using the values of λ2, λ3, and λ4 in ρ̂1 = ρ1, to obtain λ1 from

λ1 = ρ̂1 − [(1/2)^λ3 − (1/2)^λ4]/λ2.   (4.1.12)

As we consider solving the system ρ̂3 = ρ3 and ρ̂4 = ρ4, it becomes necessary to give u a specific value. For a particular u we must have (n+1)u ≥ 1 to be able to compute

π̂_u and π̂_{1−u}, and eventually ρ̂2, ρ̂3, and ρ̂4. If u is too small, say u = 0.01, then our method will be restricted to large samples (n > 99 for the u = 0.01 case).

4.2 Estimation of GLD Parameters through a Method of Percentiles

As was the case with the moment equations of Chapter 3, ρ̂3 = ρ3 and ρ̂4 = ρ4 cannot be solved in closed form; we use an algorithm to obtain approximate solutions to these equations. In this case the equations are simpler, and good approximations with

Max(|ρ3 − ρ̂3|, |ρ4 − ρ̂4|) < 10⁻⁶

are generally obtained within 3 or 4 iterations. Solutions for a given (ρ̂3, ρ̂4) can be found in various regions of (λ3, λ4)-space. Depending on the precise values of ρ̂3 and ρ̂4, as many as four solutions may exist in just one region. With the availability of the tables of Appendix B, the algorithm below shows how to obtain numerical values of λ1, λ2, λ3, and λ4 for a GLD fit.

Algorithm GLD-P: fitting a GLD distribution to data by the percentile method.

GLD-P-1. Use (4.1.2) through (4.1.5) to compute ρ̂1, ρ̂2, ρ̂3, ρ̂4.
GLD-P-2. Find the entry point in one or more of the tables of Appendix B closest to (ρ̂3, ρ̂4); if ρ̂3 > 1, use (1/ρ̂3, ρ̂4) instead of (ρ̂3, ρ̂4).
GLD-P-3. Using the entry point from Step GLD-P-2, extract λ̂3 and λ̂4; if ρ̂3 > 1, interchange λ3 and λ4.

GLD-P-4. Substitute λ̂3 for λ3 and λ̂4 for λ4 in (4.1.11) to determine λ̂2.
GLD-P-5. Substitute λ̂2 for λ2, λ̂3 for λ3, and λ̂4 for λ4 in (4.1.12) to obtain λ̂1.

The tables of Appendix B can be used to obtain reasonable starting points for a search that ultimately is likely to estimate λ1, λ2, λ3, and λ4 more accurately. Examples that illustrate this follow below.

Note: for specified ρ̂3 and ρ̂4 there are five tables that give the λ3 and λ4 values for fitting the GLD by the method of percentiles, in the regions designated by T1, T2, T3, T4, and T5, respectively, in Figure 4.1 below.

Figure 4.1: (ρ3, ρ4)-space covered by the tables of Appendix B, with the regions denoted by T1 through T5.
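Step GLD-P-1 rests on the percentile definition (4.1.1), which is little more than integer-and-fractional-part bookkeeping. A sketch in plain Python (the function name is ours):

```python
def percentile_hat(data, p):
    """(100p)-th data percentile per (4.1.1): write (n+1)p = r + frac and
    interpolate between the r-th and (r+1)-th order statistics."""
    ys = sorted(data)
    n = len(ys)
    pos = (n + 1) * p
    r = int(pos)          # integer part (the method needs (n+1)p >= 1)
    frac = pos - r        # the proper fraction a/b
    if r >= n:            # (n+1)p beyond the last order statistic
        return ys[-1]
    return ys[r - 1] + frac * (ys[r] - ys[r - 1])

print(percentile_hat([3, 1, 2], 0.5))      # 2.0 (the median)
print(percentile_hat([1, 2, 3, 4], 0.5))   # 2.5
```

With this helper, ρ̂1 through ρ̂4 of (4.1.2)–(4.1.5) are one-line combinations of percentile_hat values at u, 1/4, 1/2, 3/4, and 1 − u.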

We now describe an example that illustrates the method of percentiles.

4.3 Example

The data for this example are given in Example 2 of Karian and Dudewicz (2000).

Solution: We first attempt to obtain fits by using the moment-based method of Chapter 3 and compute the sample moments to get

α̂1 = 0.346, α̂2 = …, α̂3 = 1.867, α̂4 = ….

The (α̂3, α̂4) that we have is well outside the range of the tables in Appendix A, making it impossible to fit a distribution from the GLD family by the method of moments. To obtain a percentile-based fit, we compute ρ̂1, ρ̂2, ρ̂3, ρ̂4:

ρ̂1 = …, ρ̂2 = 7.26, ρ̂3 = 1.87, ρ̂4 = 0.3139,

and obtain two fits from Tables B-1 and B-5, respectively, of Appendix B:

GLD1(…, …, …, …),  GLD5(…, …, …, …).

GLD5 turns out to be the superior fit, assuring us that the support of the resulting fit will be (−∞, ∞); by contrast, the support of GLD1 is [−8.3, 7.6]. Although most of the data is concentrated on the interval [−6, 6], the range of the data is [−37.9, 48.4]. A histogram on [−37.9, 48.4] would be so compressed that its main features would not be visible. A slightly distorted histogram of the data (with the 8 of the 100 observations outside the interval [−6, 6] ignored) and the GLD1 and GLD5 p.d.f.s are shown in Figure 4.2(a) (the p.d.f. of GLD1 rises higher at the center). Figure 4.2(b) shows the e.d.f. of the data with the d.f.s of GLD1 and GLD5. When the data is partitioned into the intervals (−∞, −3], (−3, −1.5], (0, 0.4], (−1.5, −0.7], (−0.7, −0.4], (0.4, 0.7], (0.7, 1.5], (1.5, 3], (−0.4, 0], (3, ∞), we obtain observed frequencies of 10, 12, 11, 11, 14, 8, 8, 7, 6, 13, and the expected frequencies for these intervals that result from GLD1 are …. These lead to a chi-square goodness-of-fit statistic and corresponding p-value of … and …, respectively. For GLD5, the expected frequencies are …, …, …, …, …, 9.8868, …, …, …, …,

Figure 4.2: Histogram of the data and the p.d.f.s of the fitted GLD1 and GLD5 in (a); the e.d.f. of the data with the d.f.s of the fitted GLD1 and GLD5 in (b).

and the resulting chi-square statistic and p-value are … and …, justifying our earlier observation that GLD5 is the better of the two fits.
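Steps GLD-P-4 and GLD-P-5 used to finish fits like those in this example — turning a tabulated (λ3, λ4) into (λ2, λ1) via (4.1.11) and (4.1.12) — are pure arithmetic. A sketch in the RS parameterization Q(y) = λ1 + (y^λ3 − (1−y)^λ4)/λ2 (function names ours), checked against the exact uniform fit λ = (0.5, 2, 1, 1), for which ρ1 = 0.5 and ρ2 = 0.8 when u = 0.1:

```python
def rs_quantile(y, l1, l2, l3, l4):
    """RS parameterization of the GLD quantile function."""
    return l1 + (y**l3 - (1.0 - y)**l4) / l2

def lambdas_from_percentiles(rho1, rho2, l3, l4, u=0.1):
    """Solve (4.1.11) for lambda2, then (4.1.12) for lambda1."""
    l2 = ((1 - u)**l3 - u**l4 + (1 - u)**l4 - u**l3) / rho2
    l1 = rho1 - (0.5**l3 - 0.5**l4) / l2
    return l1, l2

# Uniform(0,1): rho1 = 0.5, rho2 = Q(0.9) - Q(0.1) = 0.8, and l3 = l4 = 1.
l1, l2 = lambdas_from_percentiles(0.5, 0.8, 1.0, 1.0)
print(l1, l2)  # ≈ 0.5 and 2.0, recovering Q(y) = y
```

The same two lines, fed with the ρ̂1, ρ̂2 of a data set and the (λ̂3, λ̂4) from Appendix B, complete Algorithm GLD-P.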

Chapter 5

GLD Approximations to Some Well-Known Distributions

5.1 GLD Approximations by the Method of Moments

We have seen the large variety of shapes that the GLD(λ1, λ2, λ3, λ4) p.d.f. can attain. For the GLD(λ1, λ2, λ3, λ4) to be useful for fitting distributions to data, it should be able to provide good fits to many of the distributions the data may come from. In this section we see that the GLD(λ1, λ2, λ3, λ4) fits many of the most important distributions well.

5.1.1 The Normal Distribution

The normal distribution N(µ, σ²), with mean µ and variance σ² (σ > 0), has p.d.f.

f(x) = [1/(σ√(2π))] exp[−(x − µ)²/(2σ²)],  −∞ < x < ∞.

Since all normal distributions can be obtained by a location and scale adjustment of N(0, 1), we consider a GLD(λ1, λ2, λ3, λ4) fit to N(0, 1), for which

α1 = 0, α2 = 1, α3 = 0, α4 = 3.

Appendix A suggests (λ3, λ4) = (0.13, 0.13) as a starting point for the numerical search, which yields the fit

GLD(0, …, …, …).

Figure 5.1: N(0, 1) and the fitted GLD p.d.f.s.

Figure 5.1 shows a plot of N(0, 1) and the GLD approximation; as can be seen, the GLD provides a very good fit, the two distributions being almost indistinguishable except near x = 0, where the GLD rises slightly higher. If we denote the p.d.f. of N(0, 1) by f̂(x) and that of the GLD by f(x), we get

Max |f(x) − f̂(x)| = …,

so we can say that the GLD p.d.f. is accurate to within this amount. Next we shall look at the c.d.f. of N(0, 1) and the corresponding GLD. Unfortunately, we cannot evaluate the c.d.f. of

the normal distribution exactly; however, there are a number of good approximations. We have implemented an approximation which returns the c.d.f. of N(0, 1), F̂(x), to a minimum accuracy of …. The graph obtained is shown in Figure 5.2.

Figure 5.2: The c.d.f.s of N(0, 1) and the fitted GLD.

From the graph we can see that the GLD c.d.f. provides a very good fit to the c.d.f. of N(0, 1); in fact, the two curves are so close together that they can hardly be told apart. We also found

Max |F(x) − F̂(x)| = …,

which shows that the GLD provides an even better fit to the c.d.f. than to the p.d.f. Together these results show that the GLD provides a very good fit to the normal distribution.

5.1.2 The Uniform Distribution

The p.d.f. of the uniform distribution on the interval [a, b], U(a, b), is

f(x) = 1/(b − a) if a < x < b, and 0 otherwise,

where

µ = (a + b)/2, σ² = (b − a)²/12, α3 = 0, α4 = 9/5.

Looking at U(0, 1), for which µ = 0.5 and σ² = 1/12, we get GLD parameter values of λ1 = 0.5, λ2 = 2, λ3 = 1, λ4 = 1. In this case the GLD is a perfect fit, since these values of λ1, λ2, λ3, λ4 in (1.2.2) yield Q(u) = u, and thus F(x) = x and f(x) = 1, which are the c.d.f. and p.d.f. of U(0, 1). Figure 5.3 shows the resulting graph for the c.d.f., which is exact.

Figure 5.3: The c.d.f.s of U(0, 1) and the fitted GLD.

5.1.3 The Exponential Distribution

The p.d.f. of the exponential distribution is

f(x) = (1/β) exp(−x/β) for x > 0, and 0 otherwise,

with

µ = β, σ² = β², α3 = 2, α4 = 9.

The c.d.f. is defined as F(x) = 1 − exp(−x/β). We can see that the values of α3 and α4 remain unchanged whatever the value of β; therefore, we can use the values of λ3 and λ4 obtained for, say, β = 1 for all other exponential distributions. Using β = 1, we have µ = 1 and σ² = 1, which gives

λ1 = 0.155, λ2 = …, λ3 = 6.784, λ4 = ….

Figure 5.4: The exponential p.d.f. and the fitted GLD.

We can see from Figure 5.4 that the GLD is slightly higher initially but still provides a good fit; this is illustrated by

Max |f(x) − f̂(x)| = ….

If we now look at how the c.d.f.s compare, we find

Max |F(x) − F̂(x)| = …,

and from Figure 5.5 we see again that the GLD provides a good approximation.

Figure 5.5: The exponential c.d.f. and the fitted GLD.

In the next section we approximate the above three distributions with the percentile method.

5.2 GLD Approximations to Some Well-Known Distributions by the Method of Percentiles

In this section we use the percentile-based method described in Chapter 4 to fit GLD distributions to some of the important distributions. In most cases we will be able to obtain several GLD fits to a given distribution; of course, this does not mean that all the fits will be good ones. We will discover that, generally, the percentile method produces three fits, of which one is clearly superior to the others. In our first example we fit the N(0, 1) distribution, obtain three fits, and give all the details associated with each fit.

68 62 with respective supports [ 3.38, 1.66], [ 1.83, 1.83], and [ 4.67, 4.67]. GLD 1 seems to be asymmetric (even though it has ρ 3 = 1 and hence, as measured by the left-right tail-weight is ρ 3 -symmetric) because ; therefore, may not be a suitable fit for N (0, 1), depending on why N (0, 1) is chosen in a particular application. GLD 2 is symmetric but its support is much too confined to be a suitable fit for N (0, 1). GLD 1, GLD 2, GLD 3, and the N (0, 1) p.d.f.s are shown in Figure 5.7 and the GLD s is marked by (1), (2) Figure 5.6: The GLD1, GLD2, and GLD3 fits to N(0,1) marked by (1), (2), and (3), respectively; the N(0,1) and GLD3 p.d.f.s cannot distinguished. and (3). In the figure above N(0,1) p.d.f. cannot be seen as a distinct curve because it coincides (visually) with GLD 3. The support of the moment-based GLD fit of N (0, 1) obtained was [-5.06, 5.06] however, this may not be a problem in many applications (see discussion at the beginning of Chapter 2) because the N(0,1) tail probability outside of the GLD 3 support is approximately We complete our

69 63 first check for the N (0, 1) fits by noting that Sup ˆf 1 (x) f(x) = Sup ˆf 1 (x) f(x) = Sup ˆf 1 (x) f(x) = where ˆf i (x) are the GLD i p.d.f.s and f(x) is the p.d.f. of N(0,1). There are perceptible differences between the graphs of the GLD 1 and GLD 2 d.f.s and the d.f. of N(0, 1). However, the graphs of the N(0,1) and GLD 3 d.f.s appear to be identical and to complete our second check, we note that Sup ˆF 1 (x) F (x) = Sup ˆF 1 (x) F (x) = Sup ˆF 1 (x) F (x) = where ˆF i (x) are the GLD i estimated commutative distribution function F (x) and F(x) is the c.d.f. of computed from the normal distribution N(0,1). The result is, as indicated in the introductory paragraph of this section, we obtained several fits, with one of the fits, GLD 3, clearly superior to the others.

70 The Uniform Distribution The values of ρ 1, ρ 2, ρ 3, ρ 4 for the uniform distribution on the interval (a, b) are ρ 1 = 1 2 (a + b), ρ 2 = 4 5 (a + b), ρ 3 = 1, ρ 4 = 5 8 We can see from Figure 4.1 that it is possible to find fits from Tables B-l, B-2, B-3, and B-4. Computations associated with the four entries, when a = 0 and b = 1. Obtained from Tables B-l through B-4, yield the following fits, respectively. GLD 1 ( , , , ), GLD 2 (0.5000, , , ), GLD 3 (0.5000, , , ), GLD 4 ( , , , ). We observe that GLD 1 lacks symmetry, as did one of the fits to N (0, 1). For either GLD 2 or GLD 3, the substitution of λ 1, λ 2, λ 3, λ 4 into Q(y), the inverse distribution function that defines the GLD(λ 1, λ 2, λ 3, λ 4 ) (see (1.2.1)), yields Q(y) = y This not only establishes that GLD 2 and GLD 3 are the same, but that this common fit is a perfect one that matches the inverse distribution function of the uniform distribution on (0,1). See the latter part for a discussion of how to obtain a GLD fit to the general uniform distribution on the interval (a, b) from a fit of the uniform distribution on (0,1) The Exponential Distribution The d.f. of the exponential distribution is F (x) = x 0 f(t)dt = 1 exp x β, for x > 0

and 0 for x < 0, where f(t) is the p.d.f. of the exponential distribution. The percentile function, Q(x), of this distribution is

Q(x) = −β ln(1 − x).

Since for 0 < p < 1, π_p, the (100p)-th percentile, is characterized by π_p = Q(p), we can easily compute π0.1, π0.25, π0.5, π0.75, and π0.9 to obtain ρ1, ρ2, ρ3, ρ4 for the exponential distribution with parameter β. These are

ρ1 = β ln 2, ρ2 = 2β ln 3, ρ3 = (ln 9)/(ln 5) − 1 = 0.36521, ρ4 = 1/2.

Since for all values of β, (ρ3, ρ4), rounded to two decimals, is (0.37, 0.50), Figure 6.8 indicates the possibility of solutions from Tables B-1 and B-2. When β = 3, these lead, through the use of Algorithm GLD-P, to the two fits

GLD1(1.0498, , , ), GLD2(5.0180, , , ),

with respective supports [−6.95, 9.05] and [−0.066, 10.10]. Figure 5.7 shows the p.d.f.s of GLD1, GLD2, and the exponential distribution with β = 3.

Figure 5.7: The p.d.f.s of the exponential distribution (with β = 3), GLD1 marked by (1), and GLD2 marked by (2).
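These closed forms can be verified numerically from the percentile function Q(x) = −β ln(1 − x); the sketch below recomputes ρ1, ρ2, ρ3, ρ4 for β = 3 and prints each next to its closed form.

```python
import math

def exp_quantile(p, beta):
    """Percentile function Q(p) = -beta ln(1 - p) of the exponential."""
    return -beta * math.log(1.0 - p)

beta = 3.0
pi = {p: exp_quantile(p, beta) for p in (0.1, 0.25, 0.5, 0.75, 0.9)}

rho1 = pi[0.5]                                   # median
rho2 = pi[0.9] - pi[0.1]                         # inter-decile range
rho3 = (pi[0.5] - pi[0.1]) / (pi[0.9] - pi[0.5])
rho4 = (pi[0.75] - pi[0.25]) / rho2

# Closed forms derived in the text:
print(rho1, beta * math.log(2.0))                # beta ln 2
print(rho2, 2.0 * beta * math.log(3.0))          # 2 beta ln 3
print(rho3, math.log(9.0) / math.log(5.0) - 1.0)
print(rho4, 0.5)
```

Note that ρ3 and ρ4 do not depend on β, which is why a single table lookup serves every exponential distribution.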

Application Example

The data show the dose of cytoxan (CTX), which is used to treat various types of cancer in the oncology department of the European Gaza Hospital. The dose is calculated by multiplying the patient's body surface area by the prescribed dose of the drug. The total dose is measured in mg. The data give the prescribed dose for 40 patients.

Solution. We compute α̂1, α̂2, α̂3, α̂4 for the (CTX) dose:

α̂1 = 864, α̂2 = , α̂3 = 0.081, α̂4 = .

With these α̂1, α̂2, α̂3, α̂4, we cannot fit the data by Algorithm GLD-M, since (α̂3, α̂4) falls outside the tables given by Karian and Dudewicz (1999). So we compute ρ̂1, ρ̂2, ρ̂3, ρ̂4 by Algorithm GLD-P:

ρ̂1 = 865, ρ̂2 = 765, ρ̂3 = 0.275, ρ̂4 = 0.41.

From the location of (0.275, 0.41) in Figure 5.1 we should expect two solutions to arise from the entries of Tables B-1 and B-5 of Appendix B. These solutions, extracted from

the tables, have, respectively,

(λ3, λ4) = (4.1503, ) and (0.842, ),

and then, by using

ρ1 = Q(1/2) = λ1 + ((1/2)^λ3 − (1/2)^λ4)/λ2,

ρ2 = Q(1 − u) − Q(u) = ((1 − u)^λ3 − u^λ4 + (1 − u)^λ4 − u^λ3)/λ2,

we get

GLD1( , , , ) with support [ , ],

and

GLD5( , , , ) with support [ , ].

GLD5, in spite of its support, is the better of the two fits.
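The percentile-method pipeline used in this example can be sketched end to end: compute the sample statistics ρ̂1, ρ̂2, ρ̂3, ρ̂4 from the data, then, for a (λ3, λ4) pair read from the tables, solve the two displayed equations for λ2 and then λ1 (with u = 0.1). The 40-patient dose data and the exact table entries are not reproduced in this excerpt, so the sample below is a synthetic stand-in and the (λ3, λ4) pair is hypothetical.

```python
import numpy as np

def percentile_stats(x):
    """Sample analogues of rho1..rho4, the inputs to Algorithm GLD-P."""
    p10, p25, p50, p75, p90 = np.percentile(x, [10, 25, 50, 75, 90])
    rho1 = p50                        # median
    rho2 = p90 - p10                  # inter-decile range
    rho3 = (p50 - p10) / (p90 - p50)  # left spread over right spread
    rho4 = (p75 - p25) / rho2         # interquartile range over rho2
    return rho1, rho2, rho3, rho4

def lambda12_from_percentiles(rho1, rho2, l3, l4, u=0.1):
    """Solve the two displayed equations for lambda2, then lambda1."""
    l2 = ((1 - u) ** l3 - u ** l4 + (1 - u) ** l4 - u ** l3) / rho2
    l1 = rho1 - (0.5 ** l3 - 0.5 ** l4) / l2
    return l1, l2

# Hypothetical stand-in for the 40-patient dose sample.
rng = np.random.default_rng(1)
doses = rng.normal(865.0, 300.0, size=40)
rho1, rho2, rho3, rho4 = percentile_stats(doses)

# (lambda3, lambda4) would be read from the tables via (rho3, rho4);
# the pair below is illustrative only.
l3, l4 = 4.1503, 0.15
l1, l2 = lambda12_from_percentiles(rho1, rho2, l3, l4)
print(l1, l2)
```

By construction, the resulting Q(y) reproduces the sample median at y = 1/2 and the inter-decile range between y = 0.1 and y = 0.9.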

Appendices

Table A-1 for GLD Fits: Method of Moments

For specified α3 and α4, Table A-1 gives the λ1(0, 1), λ2(0, 1), λ3, and λ4 for GLD fits. In order to give a sufficient number of significant digits, superscripts are used to designate factors of 10. Thus, an entry of the form a^(−s) designates a × 10^(−s). For example, for (α3, α4) close to (0.15, 4.1), the table gives

(λ1(0, 1), λ2(0, 1), λ3, λ4) = ( , , , ).

The proper interpretation of these table entries is

(λ1(0, 1), λ2(0, 1), λ3, λ4) = ( , , , ).

With few exceptions, Table A-1 provides values of λ1, λ2, λ3, λ4 for which

Max |αi − α̂i| < 10^(−5).

The exceptions occur when very small changes in λ3 or λ4 cause large variations in α3 and α4, a situation that arises when λ3 or λ4 gets close to 0. When λ3 < 10^(−2) or λ4 < 10^(−2), we generally have

Max |αi − α̂i| <

In the rare instances where λ3 < 10^(−4) or λ4 < 10^(−4), we can only be assured of

Max |αi − α̂i| < 10^(−2).

The entries of this part of Table A-1 are from "The Extended Generalized Lambda Distribution (EGLD) System for Fitting Distributions to Data with Moments, II: Tables" by E.J. Dudewicz and Z.A. Karian, American Journal of Mathematical and Management Sciences, V. 16, 3 and 4 (1996), pp. , copyright 1996 by American Sciences Press, Inc., 20 Cross Road, Syracuse, New York. Reprinted with permission.
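The accuracy criterion Max |αi − α̂i| can be checked directly, since the GLD's moment quantities αi depend only on integrals of powers of Q(y) over (0, 1). The sketch below does this with a midpoint rule, again assuming the widely quoted moment fit to N(0, 1) for illustration, whose αi should reproduce (0, 1, 0, 3) to within table accuracy.

```python
import numpy as np

def gld_alphas(lam, n=200001):
    """First four moment quantities of a GLD (mean alpha1, variance
    alpha2, skewness alpha3, kurtosis alpha4), computed by a
    midpoint-rule integration of powers of Q(y) over (0, 1)."""
    l1, l2, l3, l4 = lam
    y = (np.arange(n) + 0.5) / n
    q = l1 + (y ** l3 - (1.0 - y) ** l4) / l2
    a1 = q.mean()
    c = q - a1
    a2 = (c ** 2).mean()
    a3 = (c ** 3).mean() / a2 ** 1.5
    a4 = (c ** 4).mean() / a2 ** 2
    return a1, a2, a3, a4

# Assumed illustration: the widely quoted moment fit to N(0, 1).
a1, a2, a3, a4 = gld_alphas((0.0, 0.1975, 0.1349, 0.1349))
print(a1, a2, a3, a4)
```

Because λ3 = λ4 here, the quantile function is antisymmetric about y = 1/2, so α1 and α3 vanish up to rounding; α2 and α4 come out close to 1 and 3, the moments of N(0, 1).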


More information

2008 Winton. Statistical Testing of RNGs

2008 Winton. Statistical Testing of RNGs 1 Statistical Testing of RNGs Criteria for Randomness For a sequence of numbers to be considered a sequence of randomly acquired numbers, it must have two basic statistical properties: Uniformly distributed

More information

Review of Statistics 101

Review of Statistics 101 Review of Statistics 101 We review some important themes from the course 1. Introduction Statistics- Set of methods for collecting/analyzing data (the art and science of learning from data). Provides methods

More information

Basic Statistical Analysis

Basic Statistical Analysis indexerrt.qxd 8/21/2002 9:47 AM Page 1 Corrected index pages for Sprinthall Basic Statistical Analysis Seventh Edition indexerrt.qxd 8/21/2002 9:47 AM Page 656 Index Abscissa, 24 AB-STAT, vii ADD-OR rule,

More information

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)? ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we

More information

Week 12-13: Discrete Probability

Week 12-13: Discrete Probability Week 12-13: Discrete Probability November 21, 2018 1 Probability Space There are many problems about chances or possibilities, called probability in mathematics. When we roll two dice there are possible

More information

Joint Probability Distributions and Random Samples (Devore Chapter Five)

Joint Probability Distributions and Random Samples (Devore Chapter Five) Joint Probability Distributions and Random Samples (Devore Chapter Five) 1016-345-01: Probability and Statistics for Engineers Spring 2013 Contents 1 Joint Probability Distributions 2 1.1 Two Discrete

More information

BTRY 4090: Spring 2009 Theory of Statistics

BTRY 4090: Spring 2009 Theory of Statistics BTRY 4090: Spring 2009 Theory of Statistics Guozhang Wang September 25, 2010 1 Review of Probability We begin with a real example of using probability to solve computationally intensive (or infeasible)

More information

Chi-square goodness-of-fit test for vague data

Chi-square goodness-of-fit test for vague data Chi-square goodness-of-fit test for vague data Przemys law Grzegorzewski Systems Research Institute Polish Academy of Sciences Newelska 6, 01-447 Warsaw, Poland and Faculty of Math. and Inform. Sci., Warsaw

More information

Asymptotic distribution of the sample average value-at-risk

Asymptotic distribution of the sample average value-at-risk Asymptotic distribution of the sample average value-at-risk Stoyan V. Stoyanov Svetlozar T. Rachev September 3, 7 Abstract In this paper, we prove a result for the asymptotic distribution of the sample

More information

ORDER STATISTICS, QUANTILES, AND SAMPLE QUANTILES

ORDER STATISTICS, QUANTILES, AND SAMPLE QUANTILES ORDER STATISTICS, QUANTILES, AND SAMPLE QUANTILES 1. Order statistics Let X 1,...,X n be n real-valued observations. One can always arrangetheminordertogettheorder statisticsx (1) X (2) X (n). SinceX (k)

More information

Curriculum Map for Mathematics SL (DP1)

Curriculum Map for Mathematics SL (DP1) Unit Title (Time frame) Topic 1 Algebra (8 teaching hours or 2 weeks) Curriculum Map for Mathematics SL (DP1) Standards IB Objectives Knowledge/Content Skills Assessments Key resources Aero_Std_1: Make

More information

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring

STAT 6385 Survey of Nonparametric Statistics. Order Statistics, EDF and Censoring STAT 6385 Survey of Nonparametric Statistics Order Statistics, EDF and Censoring Quantile Function A quantile (or a percentile) of a distribution is that value of X such that a specific percentage of the

More information

Section 4.6 Simple Linear Regression

Section 4.6 Simple Linear Regression Section 4.6 Simple Linear Regression Objectives ˆ Basic philosophy of SLR and the regression assumptions ˆ Point & interval estimation of the model parameters, and how to make predictions ˆ Point and interval

More information

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation

Chapter Four. Numerical Descriptive Techniques. Range, Standard Deviation, Variance, Coefficient of Variation Chapter Four Numerical Descriptive Techniques 4.1 Numerical Descriptive Techniques Measures of Central Location Mean, Median, Mode Measures of Variability Range, Standard Deviation, Variance, Coefficient

More information

AP Statistics Cumulative AP Exam Study Guide

AP Statistics Cumulative AP Exam Study Guide AP Statistics Cumulative AP Eam Study Guide Chapters & 3 - Graphs Statistics the science of collecting, analyzing, and drawing conclusions from data. Descriptive methods of organizing and summarizing statistics

More information

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation

Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation Univariate Normal Distribution; GLM with the Univariate Normal; Least Squares Estimation PRE 905: Multivariate Analysis Spring 2014 Lecture 4 Today s Class The building blocks: The basics of mathematical

More information