A Study on Outliers in Multi-response Experiments. Sankalpa Ojha. Master of Science AGRICULTURAL STATISTICS


A Study on Outliers in Multi-response Experiments
by Sankalpa Ojha
Thesis submitted to the Faculty of Post-Graduate School, Indian Agricultural Research Institute, New Delhi, in partial fulfillment of the requirements for the degree of Master of Science in AGRICULTURAL STATISTICS
Indian Agricultural Statistics Research Institute, Indian Agricultural Research Institute, New Delhi

Approved by:
Chairman: Dr. L. M. Bhar
Members: Dr. V. K. Gupta, Dr. Rajender Parsad, Mr. S. B. Lal

This thesis is dedicated to my revered parents Gopal Ojha and Jyoti Ojha

Dr. L. M. Bhar, Senior Scientist, I.A.S.R.I., Library Avenue, Pusa, New Delhi-110 012

CERTIFICATE

This is to certify that the thesis "A Study on Outliers in Multi-response Experiments", submitted in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE in AGRICULTURAL STATISTICS of the Faculty of Post-Graduate School, Indian Agricultural Research Institute, New Delhi, is a record of bona fide research work carried out by SANKALPA OJHA under my guidance and supervision. No part of the thesis has been submitted for any other degree or diploma. All assistance and help received during the course of this investigation have been duly acknowledged by him.

August 03, 2009, New Delhi
(L. M. Bhar) Chairman, Advisory Committee

Acknowledgements

With reverence I express my deep sense of gratitude to Dr. Lalmohon Bhar, Senior Scientist, Indian Agricultural Statistics Research Institute, New Delhi and Chairman of my Advisory Committee, for his initiative, benevolence, endurance, constructive criticism and constant monitoring during the period of my investigation and also in the preparation of this thesis. I consider myself fortunate in having the privilege of being guided by him. I am equally indebted to Dr. V. K. Gupta, National Professor (ICAR), IASRI, New Delhi and Co-chairman of my Advisory Committee, for his counsel, valuable suggestions and readiness to help in accomplishing my research work. I am extremely thankful to Dr. Rajender Parsad, National Fellow (ICAR) and Head, Division of Design of Experiments, IASRI, New Delhi and member of my Advisory Committee, for his useful suggestions during my investigation. I am also highly grateful to Mr. Sashi Bhusan Lal, Scientist, Division of Computer Application, IASRI, for his great help during my thesis work.

I am also indebted to Dr. V. K. Bhatia, Professor of Agricultural Statistics and Director, IASRI, for his constant help and encouragement throughout the investigation. I avail myself of this opportunity to express my profound gratitude to him for providing the necessary research facilities for completion of this work.

I want to express my respect and gratitude to Baba, Ma, Didi, Dada, Jija and Boudi for the sacrifices they made for me. These were the inspiration and morale boost that gave me sufficient energy to complete this thesis in time. Mr. Biman Bhattacharya, Mrs. Bhabani and chotda Sudeb made many sacrifices to educate me to this level and to make me what I am today. My affectionate thanks are always due to them for their love, affection, support and constant showering of blessings on me. I am thankful to my nieces and nephews, especially Bikshan, Pritam, Biswayan, Bristi and Drirata.

I do not have words to express the help and support given by my seniors, especially Madanda, Kaustavda, Ranjitda, Meherulda, Bisalda, Divyenduda, Sandipda, Eldhosir, Dwijeshsir, Nitisir, Sanmitsir and Sukantsir, and by all my juniors, especially Sandipan, Rupam, Hiranmay, Sandip, Debasis, Arijit, Monoranjan, Mrityunjay and Monojit. I am thankful to my classmates Arpan, Ankur, Pravin and Mohansir for their friendly approach and moral support throughout my entire M.Sc. tenure. I am also thankful to all my seniors, juniors and friends in IARI, especially in Sukhatme Hostel, for their moral support and inspiration.

I take this opportunity to appreciate the help rendered by the staff of TAC, IASRI and the staff of CAS Lab. Special thanks are also due to Sondhi ji, Sanjeev ji, Thakur ji and Raj Madam. My earnest thanks are due to Joshi sir, Behera sir, Amit sir, Basuda and other staff of the National Fellow Research Unit for their help and constant encouragement from time to time in preparing this manuscript.

I extend my thanks to the Dean and Director, IARI, New Delhi and the staff of the PG School for their helpful attitude and cooperation throughout the period of study. Last but not least, I am thankful to ICAR for the financial assistance provided to me in the form of a Junior Research Fellowship during the tenure of my study.

August 03, 2009, IASRI, New Delhi-110012
(Sankalpa Ojha)

CONTENTS

1 Introduction
  1.1 Introduction
  1.2 Outliers
  1.3 Outliers in Linear Models
  1.4 The General Setup for Multi-response Experiment
  1.5 Outliers in Multi-response Experiments
  1.6 Motivation and Objectives
  1.7 Scope of the Thesis
2 Review of Literature
  2.1 Introduction
  2.2 Outliers
  2.3 Outliers in Multi-variate Regression Models
  2.4 Outliers in Multi-response Experiments
3 Cook-statistic for Multiple Outlier Vectors
  3.1 Introduction
  3.2 Cook-statistic for Multi-variate Regression Models
  3.3 Cook-statistic for the Detection of Outlier Observation Vectors in Multi-Response Experiments
  3.4 Illustration
  3.5 Discussion
4 Cook-statistic for Subsets of Outlier Vectors
  4.1 Introduction
  4.2 Development of Cook-statistic
  4.3 Some Particular Cases
  4.4 Example
  4.5 Discussion
Summary
Abstract
References
Appendix I
Appendix II

Chapter 1
Introduction

1.1 Introduction

Statistical models, like all mathematical models for real phenomena, are chosen as tools to describe certain aspects of the observed phenomena. The scientists in the National Agricultural Research System (NARS) conduct a large number of experiments for their research and consequently generate a huge amount of information in the form of data collected through experimentation. This information is converted into knowledge by statistically processing the data using sophisticated statistical tools. This knowledge helps in identifying the most promising agricultural technologies for making recommendations to the farmers. Hence, designing of experiments and analysis of experimental data form an integral component of improving the quality of agricultural research.

A model is, at best, correct only in an approximate sense. Besides the appropriateness of the postulated model, standard statistical techniques are based on the assumption that the data items have been sampled independently from the same distribution. However, contravention of the idealized iid situation is probably more common than not in reality. Statistical theories and tools fall short when the data contain points generated from various unknown distributions. Such data points (generally called outlying observations or outliers) are a definite cause of concern to the data analyst. The presence of outlier(s) is often an indication of weakness in the model, the data or both. A vast literature on this subject indicates its great importance in statistics. Examination of outlier(s) allows a more appropriate model to be formulated, or it enables us to assess any danger that may arise from basing inferences on the normality assumption. The data from designed experiments are generally analyzed without giving any importance to outlying observations. Outliers in block designs for uni-response experiments have been studied considerably.
In many experimental situations, data on more than one response variable are recorded from the same experimental unit through application of the same treatment. Such experiments are known as multi-response experiments. For example, in a varietal trial, data on several plant characteristics such as plant height, plant population, days to maturity, number of primary branches per plant, thousand seed weight, yield and disease resistance characteristics are observed. The presence of outlier(s) in the data generated from multi-response experiments may cause departures from the assumptions underlying parameter estimation and testing of hypotheses concerning the parameters. Therefore, before analysis of multi-response data, detection of outlier(s) is required.

In Section 1.2, a brief description of outliers is given. The problems of outliers in linear models are introduced in Section 1.3. The general setup for multi-response experiments is introduced in Section 1.4. In Section 1.5, the occurrence of outliers in multi-response experiments is described. Motivation and objectives of the present study are discussed in Section 1.6. The Chapter is concluded with the scope of the present investigation in Section 1.7.

1.2 Outliers

From the time when human beings started exploiting and employing the information in collected data as an aid to understanding the world they live in, there has been concern over unrepresentative or outlying observations in the data set. An outlier (or a set of outliers) in a set of data is defined to be an observation (or subset of observations) that appears to be inconsistent with the remainder of the data. Outliers are very common in every field involving data collection; they arise from heavy-tailed distributions or are simply bad data points due to error. When outliers are present in the data, the analysis of such data may lead to erroneous inference. To be clearer, consider the following example.
Example 1.1: An experiment with seven chemical treatments was conducted in a randomized complete block (RCB) design with three replications at the Regional Agricultural Research Station, Nandyal, Andhra Pradesh, with a view to evaluating the effect of Mepiquat Chloride on the yield of mustard crop [net plot size: 5.60m.40m]. The table below shows the data on yield in kilograms per plot for the different treatments:

Table 1.1: Yield of mustard in kg/plot (classified by Replications and Treatments)

Analysis of variance is presented in Table 1.2. It is observed that the treatment effects are not significant at the 5% level of significance.

Table 1.2: ANOVA (with original data)
Columns: Source, DF, SS, MS, F-value, Significance Level
Rows: Treatment, Replication, Error, Total
Note: Here DF means degrees of freedom, SS means sum of squares, MS means mean square, F-value means calculated F-value and significance level means the probability at which the null hypothesis is rejected.

The analysis of variance was followed by a residual analysis of the data. Standardized residuals are presented in Table 1.3.

Table 1.3: Standardized Residuals
Columns: Observation No., Replication, Treatment, Std. Residual

It is observed from the table that the observations at serial numbers 8 and 0 stand out because of their high standardized residuals. These two observations seem to be influential. We carry out the analysis again after removing these two observations. The results of this analysis are presented in Table 1.4. The dramatic effect of removing these two observations is worth noticing: the treatment effects now become significant at the 5% level of significance. Removal of any other observation or pair of observations does not affect the analysis. These two observations, therefore, are definitely influential.

Table 1.4: ANOVA (after removing two observations)
Columns: Source of variation, DF, SS, MS, F-value, Significance Level
Rows: Treatment, Replication, Error, Total
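The first two steps of the analysis in Example 1.1 (analysis of variance for a randomized complete block design, followed by standardized residuals) can be sketched as follows. This is only an illustrative sketch: the yields are hypothetical values with one planted shift, not the mustard data of the example, and the function name `rcb_anova` is an assumption introduced here. The re-analysis after deleting observations is not covered, since it needs an analysis for unbalanced data.

```python
import math

# Hypothetical yields: rows are treatments, columns are replications (blocks).
# These are made-up values; cell (2, 1) carries a planted shift of +3.0.
Y = [
    [10.1, 10.4, 11.0],
    [10.9, 11.6, 12.1],
    [12.1, 15.4, 12.9],
    [13.0, 13.6, 13.9],
]

def rcb_anova(Y):
    """ANOVA sums of squares and standardized residuals for a complete RCB layout."""
    t, b = len(Y), len(Y[0])
    grand = sum(map(sum, Y)) / (t * b)
    trt_means = [sum(row) / b for row in Y]
    blk_means = [sum(Y[i][j] for i in range(t)) / t for j in range(b)]

    ss_trt = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_tot = sum((y - grand) ** 2 for row in Y for y in row)

    # Residual of cell (i, j) under the additive treatment + block model.
    resid = [[Y[i][j] - trt_means[i] - blk_means[j] + grand for j in range(b)]
             for i in range(t)]
    ss_err = sum(r * r for row in resid for r in row)

    df_trt, df_err = t - 1, (t - 1) * (b - 1)
    mse = ss_err / df_err
    f_trt = (ss_trt / df_trt) / mse

    # In a complete block design every cell has the same leverage
    # h = (t + b - 1) / (t * b), so residuals are scaled by a common factor.
    h = (t + b - 1) / (t * b)
    std_resid = [[r / math.sqrt(mse * (1 - h)) for r in row] for row in resid]
    return {"ss_trt": ss_trt, "ss_blk": ss_blk, "ss_err": ss_err,
            "ss_tot": ss_tot, "f_trt": f_trt, "std_resid": std_resid}

result = rcb_anova(Y)
cells = [(i, j) for i in range(len(Y)) for j in range(len(Y[0]))]
worst = max(cells, key=lambda ij: abs(result["std_resid"][ij[0]][ij[1]]))
print("F for treatments:", round(result["f_trt"], 3))
print("cell with largest |standardized residual|:", worst)
```

Because the leverage is constant in a complete block design, ranking cells by standardized residual is the same as ranking them by raw residual; the scaling matters only when residuals are compared with a reference value.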

The present example clearly shows how the presence of outliers affects the inference drawn.

1.2.1 What is an Outlier?

No observation can be guaranteed to be a totally dependable manifestation of the phenomenon under study. The probable reliability of an observation is reflected by its relationship to other observations that were obtained under similar conditions. Observations that, in the opinion of the investigator, stand apart from the bulk of the data have been called outliers, extreme observations, discordant observations, rogue values, contaminants, surprising values, mavericks or dirty data. An outlier is an observation that appears to deviate markedly from the other members of the sample in which it occurs. Daniel (1960) defines an outlier as an observation whose value is not in the pattern of values produced by the rest of the data. A more comprehensive set of definitions is due to Beckman and Cook (1983):

Discordant observation: Any observation that appears surprising or discrepant to the investigator.
Contaminant: Any observation that is not a realization from the target distribution.
Outlier: A collective term referring to either a contaminant or a discordant observation.
Influential cases: An outlier need not be influential, in the sense that the result of an analysis may remain essentially unchanged when an outlying observation is removed. It is useful to regard an influential observation as a special type of outlier.

1.2.2 Approaches of Studying Outliers

Following Barnett and Lewis (1984), approaches to outliers are divided into two broad categories:
(i) To identify the outlier(s) for further study. This forms the detection part.
(ii) To accommodate the possibility of outlier(s) by suitable modifications of the model and/or the method of analysis. The robust methods of estimation or analysis, which were created to modify the least squares procedure so that outliers do not have much influence, fall under this category.
For a detailed discussion of robust methods of analysis in the presence of outliers, one may refer to the books by Huber (1981) and Tiku et al. (1967).

Sometimes outlying data points may be legitimately occurring extreme observations. Such data often contain valuable information that improves estimation efficiency by its presence. Even in this beneficial situation, it is constructive to isolate extreme points and to determine the extent to which the parameter estimates depend on these desirable data. Consider the following example:

Example 1.2 (Beckman and Cook, 1983): The Russians once inadvertently dropped one of their satellites in central Canada. The joint USA and Canadian effort to locate the debris involved monitoring changes in the level of radiation while flying over probable locations. Levels of radiation that were high relative to the background were taken as indicators of an alternative phenomenon, satellite debris. The basic statistical methodology consisted of applying outlier detection schemes on a large scale. There was no interest in estimating the average level of background radiation, except to provide a baseline for judging individual readings as outliers.

The aforementioned examples clearly show that the study of outliers is very important from different angles of interest. A vast literature on this subject indicates its great importance in statistics.

1.3 Outliers in Linear Models

Based on two different concepts of outliers as proposed by Dixon (1950), outliers in linear models can be dealt with in the following two ways:

Mean-shift model: The studentized residuals are often used to identify outliers in analyses based on linear models. The observation with the largest absolute studentized residual is usually given special attention and is taken as the observation most likely to be a contaminant. The justification behind this procedure is that the basic normal theory model is valid except that the expectation of at most one unknown response is shifted. That is, if the $i$-th observation is an outlier, its mean is shifted from $\mu_i$ to $\mu_i + c$, where $c$ is some non-zero quantity. In effect, a single parameter is devoted to a single observation. This model has dominated the literature on linear models.

Variance-inflation model: In contrast to the mean-shift model, the variance-inflation model for a single outlier is based on the assumption that the basic model is valid except that the variance of an unknown response is larger than the rest.

The book by Cook and Weisberg (1982) gives an excellent study of influential observations in the mean-shift model. The analysis of the variance-inflation model tends to be more complicated, but Cook and Weisberg (1980) show that the observation most likely to have an inflated variance need not correspond to the observation with the largest ordinary residual. In other words, an observation which is an outlier in the mean-shift model need not be so in the variance-inflation model. However, if the observation having an inflated variance corresponds to the largest studentized residual, then the mean-shift model and the variance-inflation model agree with respect to the most outlying observation.

1.4 The General Setup (Gauss-Markoff) for Multi-response Experiments

Rao (1973) explained the Gauss-Markoff setup for the multivariate model. The least squares theory under the Gauss-Markoff setup is applicable to situations where a single measurement is taken on an experimental unit. In general, let $p$ response variables be observed on each of $n$ independent experimental units. Without giving any consideration to treatments and/or blocks, a brief outline of the data collected is given as:

\[
\begin{array}{c|cccc}
\text{Experimental unit} & \multicolumn{4}{c}{\text{Response variables}} \\
 & (1) & (2) & \cdots & (p) \\ \hline
1 & y_{11} & y_{12} & \cdots & y_{1p} \\
2 & y_{21} & y_{22} & \cdots & y_{2p} \\
\vdots & \vdots & \vdots & & \vdots \\
n & y_{n1} & y_{n2} & \cdots & y_{np}
\end{array}
\]

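Let $\mathbf{y}_s$ denote the $s$-th column vector, representing the $n$ observations pertaining to the $s$-th response. The $p$ response variables are modeled as

\[
E(\mathbf{y}_s) = \mathbf{X}\boldsymbol{\beta}_s, \quad s = 1, 2, \ldots, p,
\qquad
\operatorname{Cov}(\mathbf{y}_s, \mathbf{y}_{s'}) = \sigma_{ss'}\mathbf{I}_n, \quad s, s' = 1, 2, \ldots, p.
\]

The above model is Gauss-Markoff $(\mathbf{y}_s, \mathbf{X}\boldsymbol{\beta}_s, \sigma_{ss}\mathbf{I}_n)$ for each $\mathbf{y}_s$, with a specific unknown parameter vector $\boldsymbol{\beta}_s$ for each $s$; the $n \times m$ design matrix $\mathbf{X}$ is the same for all $s$. Here $\sigma_{ss}$ denotes the variance of the $s$-th response variable and $\sigma_{ss'}$ is the covariance between the $s$-th and $s'$-th response variables, $s \neq s' = 1, 2, \ldots, p$.

Obtain the $np$-component vector variable $\mathbf{y}$ by writing the column vectors $\mathbf{y}_s$ $(s = 1, 2, \ldots, p)$ one below the other, and let $\boldsymbol{\beta}$ be the $mp$-component vector defined in the same way:

\[
\mathbf{y} = \begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_p \end{pmatrix},
\qquad
\boldsymbol{\beta} = \begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \\ \vdots \\ \boldsymbol{\beta}_p \end{pmatrix}.
\]

One can easily see that

\[
E(\mathbf{y}) = (\mathbf{I}_p \otimes \mathbf{X})\boldsymbol{\beta},
\qquad
D(\mathbf{y}) = \boldsymbol{\Sigma} \otimes \mathbf{I}_n,
\qquad
\boldsymbol{\Sigma}_{p \times p} = ((\sigma_{ss'})),
\]

where $\otimes$ denotes the Kronecker product of matrices and $\boldsymbol{\Sigma} \otimes \mathbf{I}_n$ is the covariance matrix of the $np$-component vector variable $\mathbf{y}$. We can write the above multivariate linear model as the triplet

\[
\left(\mathbf{y},\; (\mathbf{I}_p \otimes \mathbf{X})\boldsymbol{\beta},\; \boldsymbol{\Sigma} \otimes \mathbf{I}_n\right),
\]

which can be elaborated as

\[
\begin{pmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \\ \vdots \\ \mathbf{y}_p \end{pmatrix}
=
\begin{pmatrix}
\mathbf{X} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{X} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}
\end{pmatrix}
\begin{pmatrix} \boldsymbol{\beta}_1 \\ \boldsymbol{\beta}_2 \\ \vdots \\ \boldsymbol{\beta}_p \end{pmatrix}
+
\begin{pmatrix} \boldsymbol{\varepsilon}_1 \\ \boldsymbol{\varepsilon}_2 \\ \vdots \\ \boldsymbol{\varepsilon}_p \end{pmatrix},
\]

where

\[
D(\mathbf{y}) =
\begin{pmatrix}
\sigma_{11}\mathbf{I}_n & \sigma_{12}\mathbf{I}_n & \cdots & \sigma_{1p}\mathbf{I}_n \\
\sigma_{21}\mathbf{I}_n & \sigma_{22}\mathbf{I}_n & \cdots & \sigma_{2p}\mathbf{I}_n \\
\vdots & \vdots & & \vdots \\
\sigma_{p1}\mathbf{I}_n & \sigma_{p2}\mathbf{I}_n & \cdots & \sigma_{pp}\mathbf{I}_n
\end{pmatrix}
= \boldsymbol{\Sigma}_{p \times p} \otimes \mathbf{I}_n.
\]

The model $(\mathbf{y}, (\mathbf{I}_p \otimes \mathbf{X})\boldsymbol{\beta}, \boldsymbol{\Sigma} \otimes \mathbf{I}_n)$ is of a form similar to the univariate Gauss-Markoff model. One can then apply the methods of estimation in linear models developed for a single response variable.

1.5 Outliers in Multi-response Experiments

The study of outliers in block designs for uni-response experiments has been considered by Bhar (1997) and Bhar and Gupta (2001). They developed some statistics for detecting outliers in experimental data from block designs, using Cook's distance for the problem of inferring on the complete set of linearly independent orthonormalized treatment contrasts. Most of the literature available on detection of outlier(s) in experimental data is for single response situations. Not much work has been done on detection or handling of outlier(s) in multi-response experiments. Barrett and Ling (1992) made an attempt to generalize some of the univariate measures of influence to the multivariate regression setting and then showed that the generalized measures are special cases of two general classes of influence measures. Rocke and Woodruff (1996) have given new insights into the problem of identification of outliers in multivariate data. Gnanadesikan and Lee (1970), Gnanadesikan and Kettenring (1972) and Kang and Bates (1990) investigated the problems of outlier(s) in multi-response data, mostly in regression analysis situations.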
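The stacked form of Section 1.4 can be made concrete with a small numerical sketch. The code below builds $\mathbf{I}_p \otimes \mathbf{X}$ and $\boldsymbol{\Sigma} \otimes \mathbf{I}_n$ for $n = 3$ units, $m = 2$ parameters and $p = 2$ responses. The design matrix, the dispersion matrix and the helper names `kron` and `identity` are hypothetical and chosen only for illustration; they are not taken from the thesis or its SAS/IML programs.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a * b for a in row_a for b in row_b]
        for row_a in A
        for row_b in B
    ]

def identity(k):
    """k x k identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]

# Hypothetical n x m design matrix (n = 3, m = 2), common to all responses.
X = [[1.0, 0.0],
     [1.0, 1.0],
     [1.0, 2.0]]

# Hypothetical p x p dispersion matrix of the responses (p = 2).
Sigma = [[2.0, 0.5],
         [0.5, 1.0]]

n, p = len(X), len(Sigma)
big_X = kron(identity(p), X)      # (np) x (mp), block diagonal with X repeated
D_y = kron(Sigma, identity(n))    # (np) x (np), blocks sigma_{ss'} * I_n

for row in big_X:
    print(row)
for row in D_y:
    print(row)
```

The printed `big_X` shows the block-diagonal pattern of the elaborated model, and `D_y` shows that two observations are correlated only when they come from the same experimental unit.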

In the data generated from designed experiments, the problem of outlier(s) has been studied mainly for single response situations. Very little seems to have been done on detection and handling of outlier(s) in multi-response experiments. Nandi (2007) developed a test statistic for detection of a single outlier observation vector in multi-response experiments under a mean-shift model, on the lines of the Cook-statistic for single response experiments. However, there are many other situations where more than one outlier observation vector may be present, or where any arbitrary number of outliers may occur. Keeping in view these possibilities, the present thesis aims at developing some statistics for detecting outliers in multi-response experiments in various situations.

1.6 Motivation and Objectives

It cannot be guaranteed that an observation is a totally dependable manifestation of the phenomenon being studied. From the very beginning, when people started exploiting and employing the information in collected data as a tool to understand the world they live in, there has been concern over an unrepresentative or outlying observation (or subset of observations) that appears to be inconsistent with the remaining observations in the data set. Outliers are very common in all fields where collection of data is involved. Generally outliers arise from heavy-tailed distributions or are simply bad data points due to error. When outliers are present in the data, the whole setup of the data is disturbed. As a result, analysis of the experimental data may be erroneous.

The analysis of data from designed experiments is valid only when assumptions like normality and homogeneity of error variances and additivity of the experimental effects hold. Departures from these assumptions may take place in the presence of disturbances like outlier(s). The data from designed experiments are generally analyzed without giving any importance to these assumptions, taking the ideal setup for granted.

When outliers are present in the data, the whole setup of the experiment is disturbed and misleading inferences may result. Therefore, it is important to detect and handle the outlier(s) efficiently. Outliers are likely to appear in multi-response experiments. If an experimental plot is heavily infested with pests, disease and/or weeds, then all the responses observed from that plot may be outliers. Outliers may also be due to heavy irrigation by mistake on some experimental plot(s) or to mistakes in recording or transcription of observations.

In multi-response experiments, to take advantage of the correlation structure among the response variables, multivariate analysis of variance (MANOVA) of the data is performed for testing the equality of treatment effects. The inferences drawn from MANOVA may be misleading if outliers are present in the data. Therefore, before analysis of multi-response data, detection of outlier(s) is required. Test statistics available in the literature for detecting outlier(s) in multivariate regression cannot be applied directly to multi-response experimental settings for the following reasons:
(i) The design matrix of a multi-response experiment is not of full column rank, as it is in multivariate regression.
(ii) In multi-response experiments, interest is in a subset of parameters (linear functions of treatment effects) rather than the whole vector of parameters.

Very little work seems to have been done on detection of outlier(s) in data from multi-response experiments. Nandi (2007) attempted to develop a test statistic for detection of an outlier observation vector from multi-response data. This vector is obtained by taking a particular observation from each response vector. In practice, however, this may not be the only way in which outliers occur. Outliers may be present in more than one such observation vector, and within an observation vector outliers may not be present in all the responses. Also, the responses that contribute outliers in one outlier observation vector may differ from the responses that contribute outliers in another outlier observation vector. Therefore, in the present investigation an attempt has been made to develop a test statistic for detection of outliers in multi-response data generated through a block design.

In view of the above discussion, the following objectives are formulated for the present investigation:
(i) To develop test statistics for the detection of outliers in block designs for multi-response experiments.
(ii) To apply the above statistics to data obtained from multi-response experiments.

1.7 Scope of the Present Investigation

A vast literature on outliers clearly indicates their importance in statistics. The concept was developed primarily for a univariate sample, but it started attracting people working in other fields of statistics soon after Dixon (1950) proposed a clear-cut concept of outliers. Srikantan (1961) and Ferguson (1961) explicitly defined the mean-shift outlier model using Dixon's concept, which later became a strong basis for studying outliers in regression models. Of the two broad methods of dealing with outliers, identification is the more important, because diagnostics based on identification help in critical assessment at various stages of model development. In the present thesis this method of dealing with outliers in multi-response experiments has been considered.

The Cook-statistic has been developed for detecting outlier vectors. Nandi (2007) considered only one outlier vector, i.e., a vector made up of a particular observation from each response. In the present study we have considered more than one such outlier vector. In general, the Cook-statistic has been developed for any number t of outlier vectors. The situation where not all observations from all responses are outliers has also been considered; this gives incomplete vectors of outliers. For developing the Cook-statistic, the mean-shift outlier model is considered.

In Chapter 1, a brief introduction emphasizing the problem of outliers is presented. The problem of outliers has been illustrated with some examples. The situation of multi-response experiments has also been explained.

In Chapter 2, a review of related work in the field of outliers in general and in the field of multi-response experiments is presented. The field analogous to multi-response experiments is the multi-variate regression model. Some procedures for detecting outliers in multi-variate regression models are also available in the literature. These procedures are also reviewed in that Chapter.

Chapter 3 and Chapter 4 deal with the main results of the present study. In Chapter 3, the Cook-statistic is developed for more than one outlier vector. A general expression is first obtained. Then some particular cases are discussed. The developed statistic is illustrated with real experimental data. In Chapter 4, the Cook-statistic is derived for the situation where not all responses in a particular observation vector are outliers. Here also some particular cases are discussed and an example is presented. For applying the developed Cook-statistic, programs have been written in SAS/IML. The codes of these programs are provided in the Appendices. Finally, the thesis is concluded with a summary and a list of references.

Chapter 2
Review of Literature

2.1 Introduction

In the present chapter, the work on outliers in general and on outliers in multi-response experiments in particular is reviewed. The literature on the study of outliers is very vast and has much in common with almost every related area, such as robust estimation and data analysis, each of which is important in its own right. For an excellent review, reference may be made to Beckman and Cook (1983) and Hadi and Simonoff (1993). For a useful survey of the literature, the books by Atkinson and Riani (2000), Barnett and Lewis (1994) and Rousseeuw and Leroy (1987) may be referred to. In Section 2.2, we survey the work done on outliers in general and in designed experiments in particular. Some work in the field of outliers in multivariate regression models is available in the literature. This is reviewed in Section 2.3. In Section 2.4, work related to multi-response experiments is presented.

2.2 Outliers

2.2.1 Outliers in Linear Regression Models

The presence of outliers in the data is among the most serious ailments of a linear statistical model, and this fact lies behind the work in the areas of robust estimation and outlier testing. The existence of the problem of doubtful values or outliers has been recognized for a very long time. Historically, objective methods for dealing with outliers were employed only after the outliers were identified through normal inspection of the data. Chauvenet (1863) developed a test for a single doubtful observation. An observation is to be rejected if it lies outside the lower and upper $1/(4n)$ points of the null distribution. The most popular models for the study of outliers are perhaps the models proposed by Dixon (1950), known as Dixon's location shift and scale shift models.
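Chauvenet's rejection rule mentioned above can be sketched in a few lines. The sketch assumes a normal null distribution with the mean and standard deviation replaced by their sample estimates, which is one common reading of the rule rather than something stated in the text; the data and the function name `chauvenet_flags` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def chauvenet_flags(data):
    """Return the critical z value and the indices of observations lying
    outside the lower and upper 1/(4n) points of a fitted normal distribution."""
    n = len(data)
    # Upper 1/(4n) point of the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - 1 / (4 * n))
    m, s = mean(data), stdev(data)   # sample estimates (assumption)
    flagged = [i for i, x in enumerate(data) if abs(x - m) / s > z_crit]
    return z_crit, flagged

# Hypothetical sample of n = 10 readings with one doubtful value at the end.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 14.0]
z_crit, flagged = chauvenet_flags(sample)
print("critical z:", round(z_crit, 4))
print("rejected indices:", flagged)
```

With $n = 10$ the two tails together have probability $1/(2n) = 0.05$, so the rule coincides here with a two-sided 5% cut-off on the standardized deviation.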
Dixon's model for a single outlier can be described as follows. A location shift model is one in which, out of $n$ observations, $(n-1)$ observations come from $N(\mu, \sigma^2)$ and the remaining one comes from $N(\mu + \lambda, \sigma^2)$. Likewise, a scale shift model is one in which $(n-1)$ observations come from $N(\mu, \sigma^2)$ and the remaining one comes from $N(\mu, \lambda\sigma^2)$, where $\lambda$ is some nonzero scalar quantity (positive in the scale shift model). Ferguson (1961) considered various invariant tests for outlier rejection based on sample skewness and kurtosis. There are many more studies on detection of outliers in normal sample data.

The study of outliers in linear models gained special attention soon after Ferguson (1961) and Srikantan (1961) explicitly defined the mean-shift and variance-inflation models using Dixon's concept.

Mean-shift model: The justification behind this procedure is that the basic normal theory model is valid except that the expectation of at most one unknown response is shifted. That is, if the $i$-th observation is an outlier, its mean is shifted from $\mu_i$ to $\mu_i + c$, where $c$ is some non-zero quantity.

Variance-inflation model: In contrast to the mean-shift model, the variance-inflation model for a single outlier is based on the assumption that the variance of an unknown response is larger than the rest.

The well-known Cook-statistic (Cook, 1977), $Q_k$-statistic (Gentleman and Wilk, 1975) and AP-statistic (Andrews and Pregibon, 1978) are based on the mean-shift model. For an excellent study of outlying observations in the mean-shift model, reference may be made to Cook and Weisberg (1982).

Cook (1977) proposed a new measure, based on confidence ellipsoids, for judging the contribution of each data point to the determination of the least squares estimator of the parameter vector in the full rank linear regression model

\[
\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \qquad (2.2.1)
\]

where $\mathbf{y}$ is an $n \times 1$ vector of observations, $\mathbf{X}$ is an $n \times p$ full-rank matrix of known constants, $\boldsymbol{\beta}$ is a $p \times 1$ vector of unknown parameters and $\mathbf{e}$ is an $n \times 1$ vector of random errors such that $E(\mathbf{e}) = \mathbf{0}$ and $V(\mathbf{e}) = \sigma^2\mathbf{I}$, $\sigma^2 > 0$. The estimate of $\boldsymbol{\beta}$ and the ordinary residual vector based on the model (2.2.1) are

\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\quad\text{and}\quad
\mathbf{r} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}},
\]

respectively. Corresponding to the $i$-th residual $r_i$, the studentized residual is

\[
t_i = \frac{r_i}{s\sqrt{v_{ii}}},
\qquad
s^2 = \frac{\mathbf{r}'\mathbf{r}}{n - p},
\]

where $s^2$ is the residual mean square and $v_{ii}$ is the $i$-th diagonal element of the matrix $\mathbf{V} = \mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$. The residuals are often used to identify outliers in data from linear models. More precisely, the absolute studentized residuals are given special attention, and the observation with the largest absolute studentized residual is taken to be the one most likely to be an outlier.

If $\hat{\boldsymbol{\beta}}_{(i)}$ is the least squares estimator of $\boldsymbol{\beta}$ with the $i$-th point deleted, then the suggested measure of the critical nature of each data point is defined to be

\[
D_i = \frac{(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})'\,\mathbf{X}'\mathbf{X}\,(\hat{\boldsymbol{\beta}}_{(i)} - \hat{\boldsymbol{\beta}})}{p\,s^2}
= \frac{t_i^2}{p}\cdot\frac{w_{ii}}{1 - w_{ii}},
\qquad i = 1, 2, \ldots, n,
\]

where $w_{ii}$ is the $i$-th diagonal element of the matrix $\mathbf{W} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$, known as the hat matrix, so that $v_{ii} = 1 - w_{ii}$. This is known as the Cook-statistic. It provides a measure of distance between $\hat{\boldsymbol{\beta}}$ and $\hat{\boldsymbol{\beta}}_{(i)}$ in terms of descriptive levels of significance. $D_i$ can be compared to the percentage points of an F-distribution with $p$ and $n - p$ degrees of freedom.

This measure was developed under the implicit assumption that $\boldsymbol{\beta}$ is the parameter of interest. This may not always be the case. If the interest is in $q$ linearly independent combinations of $\boldsymbol{\beta}$, then it would be more reasonable to measure the influence each data point has on the determination of the least squares estimates of these combinations of interest. Let $\mathbf{P}\boldsymbol{\beta}$ denote the parametric combinations of interest, where $\mathbf{P}$ is a $q \times p$ matrix with rank $q$.
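The two expressions for $D_i$ given above (the deletion form and the closed form in terms of $t_i$ and $w_{ii}$) can be checked numerically. The sketch below does this for a simple linear regression, so that $p = 2$ and no matrix library is needed; the data are hypothetical, with a shift of 4 planted in the last response, and the function names are assumptions introduced for illustration only.

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    return ybar - b1 * xbar, b1

def residual_mean_square(x, y, b0, b1, p=2):
    return sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (len(x) - p)

def cook_closed_form(x, y):
    """D_i = (t_i^2 / p) * w_ii / (1 - w_ii), with w_ii the hat diagonal."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    s2 = residual_mean_square(x, y, b0, b1)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    out = []
    for xi, yi in zip(x, y):
        w = 1 / n + (xi - xbar) ** 2 / sxx      # hat diagonal for a straight line
        t2 = (yi - b0 - b1 * xi) ** 2 / (s2 * (1 - w))
        out.append(t2 / p * w / (1 - w))
    return out

def cook_by_deletion(x, y):
    """D_i from refitting without point i: d' X'X d / (p s^2)."""
    n, p = len(x), 2
    b0, b1 = fit_line(x, y)
    s2 = residual_mean_square(x, y, b0, b1)
    sx, sx2 = sum(x), sum(xi * xi for xi in x)  # entries of X'X besides n
    out = []
    for i in range(n):
        c0, c1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        d0, d1 = c0 - b0, c1 - b1
        out.append((n * d0 * d0 + 2 * sx * d0 * d1 + sx2 * d1 * d1) / (p * s2))
    return out

# Hypothetical data: roughly y = 2 + 0.5 x, last response shifted by +4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.55, 2.95, 3.5, 4.05, 4.45, 5.0, 5.55, 10.0]

d_closed = cook_closed_form(x, y)
d_delete = cook_by_deletion(x, y)
most_influential = max(range(len(x)), key=lambda i: d_closed[i])
print([round(d, 4) for d in d_closed])
print("most influential point (0-based index):", most_influential)
```

The closed form is the practical one, since it needs only a single fit; the deletion version is included to show that both expressions give the same numbers.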
A generalized measure of the importance of the i-th point is defined to be

$$D_i = \frac{(P\hat\beta_{(i)} - P\hat\beta)'\,[P(X'X)^{-1}P']^{-1}\,(P\hat\beta_{(i)} - P\hat\beta)}{q\hat\sigma^2}.$$

There are a number of other statistics available in the literature. Hocking and Pendleton (1983) discussed the relative merits of some commonly used influence diagnostics. Polasek (1984), de Gruttola et al. (1987) and Martin (1992) developed methods to study the influence of outliers in regression when errors are correlated. Puterman (1988) discussed the influence of outlying observations when errors follow a first order autoregression. Schall and Dunne (1988) gave a comprehensive general discussion of much of the theory of outliers and influence in the normal general linear model $(y, X\beta, \sigma^2 V)$ with arbitrary known variance and covariance structure.

If the data set contains more than one outlier or influential observation, which is likely to be the case in most data sets, the problem of identifying such observations becomes more difficult. This is due to masking and swamping effects. Masking occurs when an outlying subset goes undetected because of the presence of another, usually adjacent, subset. Swamping occurs when good observations are incorrectly identified as outliers because of the presence of another, usually remote, subset. As a result, consecutive application of a single-outlier test is problematic. Recognizing this, many researchers proposed test statistics for the simultaneous detection of outliers. The first proposed test was the optimal block test of Murphy (1951). Rosner (1975) developed test statistics for the identification of more than one outlier that are free from masking and swamping effects. John and Draper (1978) proposed a two-stage test for the presence of two or one outlier in a two-way table by using the $Q_k$-statistic of Gentleman and Wilk (1975). Draper and John (1980) extended this work to testing for three or fewer outliers in a two-way table. They also briefly discussed an aspect of the design of experiments in the general regression situation when it is feared that outliers may occur. In an attempt to avoid the problem of swamping, Cook and Beckman (1980) proposed the use of M-estimators for the detection of multiple outliers.
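The masking effect described in this section can be illustrated with a very small univariate example. The sketch below uses made-up data and a purely illustrative cutoff of 2.5 on the standardized deviation from the mean; the cutoff is an assumption chosen for the illustration, not a calibrated critical value from any of the tests cited here.

```python
from math import fsum, sqrt


def z_scores(sample):
    """Absolute deviations from the mean in units of the sample s.d."""
    n = len(sample)
    mean = fsum(sample) / n
    sd = sqrt(fsum((v - mean) ** 2 for v in sample) / (n - 1))
    return [abs(v - mean) / sd for v in sample]


clean = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.9, 10.1]  # made-up data
CUTOFF = 2.5  # illustrative threshold, not a calibrated critical value

one_outlier = clean + [20.0]
two_outliers = clean + [20.0, 21.0]

z_one = max(z_scores(one_outlier))   # largest standardized deviation, one outlier
z_two = max(z_scores(two_outliers))  # largest standardized deviation, two outliers

flagged_one = z_one > CUTOFF
flagged_two = z_two > CUTOFF
```

With a single discordant value the largest standardized deviation exceeds the cutoff, whereas with two adjacent discordant values the mean and standard deviation are both inflated and neither value is flagged. This is why repeated application of a single-outlier rule can fail.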

Other test procedures for the detection of multiple outliers that are free from masking and swamping effects are due to Davies and Gather (1993), Hadi and Simonoff (1993), and Hadi (1994). Atkinson (1994) gave a robust method for the detection of multiple outliers based on a forward search method and least median of squares (LMS). Chatterjee and Hadi (1986) studied influential observations, high leverage points, and outliers in linear regression. They described the interrelationships that exist among the proposed measures of outliers. An examination of these relationships led them to conclude that only three of these measures, along with some graphical displays, can provide an analyst with a complete picture of outliers and of points which excessively influence the fitted regression equation. Peña and Yohai (1995) proposed a new method to identify influential subsets in linear regression problems that is free from masking. The procedure uses the eigenstructure of an influence matrix, which is defined as the matrix of uncentred covariances of the effect on the whole data set of deleting each observation, normalized to include the univariate Cook statistics on the diagonal. It is shown that the eigenstructure of the influence matrix is useful for identifying influential subsets.

2.2 Outliers in Designed Experiments

Box and Draper (1975) were the first to study outliers in experimental designs. They considered the robustness of experimental designs in the presence of outlier(s). Box and Draper suggested that, in order to make a designed experiment insensitive to the presence of outlier(s), the variance of the overall discrepancy in the predicted responses should be minimized. Ghosh (1983) used the information contained in a set of observations and Cook's distance (Cook, 1977) for detecting influential observations under full rank models. Ghosh (1989) gave another measure for identifying the influential set of observations.
The criterion is based on the sum of the variances of the predicted values of the set of unavailable observations, and he showed that this criterion is equivalent to Cook's distance. Most of these criteria are based on full rank models and can usefully be employed in fitting response surfaces. However, they cannot be applied to linear models for which the design matrix is deficient in rank. Gopalan and Dey (1976) developed a criterion of robustness along lines similar to those of Box and Draper (1975) for experimental situations where the design matrix is not of full rank. Instead of taking the overall discrepancy in the variance of the estimated responses, they considered the discrepancy in the estimation of error variance and identified designs for one-way elimination of heterogeneity settings that minimize the variance of the bias or discrepancy in the estimation of error variance. Singh et al. (1987) extended the results of Gopalan and Dey (1976) to find robust designs for two-way elimination of heterogeneity settings. Bhar (1997) and Bhar and Gupta (2001) investigated the problem of outlier(s) in experimental data for block designs using Cook's distance for inference on the complete set of linearly independent orthonormalized treatment contrasts. However, they showed that this statistic fails when all the outliers belong to a particular block. Sarker (00) and Sarker et al. (2003) extended these results to experimental situations where the interest of the experimenter is only in a subset of all possible elementary treatment contrasts rather than the complete set. They obtained the Cook-distance (single response) for the set of treatment contrasts of interest. This statistic is used to identify the outlying observation. Sarker et al. (2005) proposed a test statistic for the detection of a single outlier in block designs for diallel crosses. They established a correspondence between two existing criteria of robustness against a single outlier, i.e. minimization of the average Cook-statistic and minimization of the variance of the discrepancy or bias in the estimation of error variance. It has been shown that a proper binary balanced block design for diallel crosses is robust against the presence of a single outlier. Block designs for diallel crosses in which every line appears an equal number of times in each block are also found to be robust against the presence of a single outlier. In most of these studies dealing with the problem of outlier(s) in block designs, the mean-shift model has been considered for studying the effect of an outlier on the estimation of parameters in a block design for single response situations. Bhar and Gupta (2003) studied outliers in block designs under variance-inflation models.

2.3 Outliers in Multivariate Regression Models

Many methods of detecting outliers in multivariate data have been proposed in the literature, almost all of them applicable to a single sample (Barnett and Lewis, 1994; Rocke and Woodruff, 1996; Penny and Jolliffe, 2001). Gnanadesikan and Lee (1970), Gnanadesikan and Kettenring (1972) and Kang and Bates (1990) investigated the problems of outlier(s) in multi-response data, mostly in regression analysis situations. The normal distribution, perhaps following data transformation, has a central place in the analysis of multivariate data. Mahalanobis distances provide the standard test for outliers in such data. However, it is well known that the estimates of the mean and covariance matrix found by using all the data are extremely sensitive to the presence of outliers. When there are many outliers the parameter estimates may be so distorted that the outliers are masked and the Mahalanobis distances fail to reveal any outliers, or indicate as outlying observations that are not in fact so. Accordingly, several researchers have suggested the use of robust parameter estimates in the calculation of the distances. Rousseeuw and van Zomeren (1990) used minimum volume ellipsoid estimators of both parameters in the calculation of the Mahalanobis distances. More recent work such as Pison et al. (2002) or Hardin and Rocke (2005) uses the minimum covariance determinant (MCD) estimator, whereas Atkinson and Riani (2006) employed the series of robust estimators provided by the forward search to explore the structure of the Mahalanobis distances. Cook and Hawkins (1990) suggested that the procedure of Rousseeuw and van Zomeren (1990) may find outliers everywhere. The implication is that the size of the outlier test may be very much larger than the nominal 5% or 1%. Riani et al. (2009) introduced a new outlier test using the forward search and compared its size and power with the best existing tests. Many of these methods are designed to test whether individual observations are outlying. They showed that their procedure has superior power as well as good size. They found theoretical boundaries for the forward search procedure that allow for simultaneous inference.

Caroni (1998) extended the application of Wilks' well-known test statistic for a single outlier to the case of samples from several subpopulations with a common covariance matrix, but she assumed that the number of populations was known. Thus, her results are not applicable to the situation in which there is an unknown number of groups. A method given by Wang et al. (1997) for detecting multiple outliers from a mixture distribution required the existence of a set of points whose group membership was known. Recently, Hardin and Rocke (2004) proposed another method for the detection of outliers in multivariate regression models. Glascock (1992) and Beier and Mommsen (1994) proposed a method which is able to detect any number of outliers, not just a pre-specified number. Another procedure for doing this in a single group, using Wilks' statistic, was given by Caroni and Prescott (1992). Caroni and Billor (2007) proposed a method based on a computationally efficient robust method of outlier detection in a single group, given the name BACON by Billor et al. (2000). They extended the BACON algorithm to grouped multivariate data.
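The sensitivity of classical Mahalanobis distances to a cluster of outliers, and the effect of replacing the mean and covariance matrix by estimates computed from a clean core of the data, can be sketched as follows. The sketch is bivariate and uses only the Python standard library. The function `crude_robust_estimates` is a hypothetical and deliberately simple stand-in (mean and covariance of the points nearest the coordinate-wise median); it is not the MVE, the MCD or BACON, and the data are made up.

```python
from math import fsum, hypot
from statistics import median


def mean_cov(points):
    """Sample mean and 2x2 covariance (divisor n - 1) of bivariate points."""
    n = len(points)
    mx = fsum(p[0] for p in points) / n
    my = fsum(p[1] for p in points) / n
    sxx = fsum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = fsum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = fsum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def sq_distances(points, center, cov):
    """Squared distances (x - c)' S^{-1} (x - c) via the explicit 2x2 inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    out = []
    for px, py in points:
        dx, dy = px - center[0], py - center[1]
        out.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return out


def crude_robust_estimates(points, h):
    """Mean and covariance of the h points nearest the coordinate-wise median.

    A simple illustrative stand-in for a robust estimator; NOT the MVE or MCD.
    """
    mx = median(p[0] for p in points)
    my = median(p[1] for p in points)
    nearest = sorted(points, key=lambda p: hypot(p[0] - mx, p[1] - my))[:h]
    return mean_cov(nearest)


# Made-up data: nine regular points on a grid and three clustered outliers.
grid = [(float(i), float(j)) for i in (-1, 0, 1) for j in (-1, 0, 1)]
outliers = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
data = grid + outliers

classical = sq_distances(data, *mean_cov(data))
robust = sq_distances(data, *crude_robust_estimates(data, h=9))
```

Because the three outliers pull the mean towards themselves and inflate the covariance matrix, their classical distances stay small; with a divisor of $n-1$ no classical squared distance can exceed $(n-1)^2/n$. Measured from the clean core, the same three points lie far from the bulk of the data.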
Wilks' statistic, widely used for detecting multivariate outliers, is equivalent to using the Mahalanobis distances of the $n$ sample points $x_i$ from the sample mean $\bar{x}$,

$$d_i(\bar{x}, S) = (x_i - \bar{x})' S^{-1} (x_i - \bar{x}),$$

where $S$ is the covariance matrix of the entire sample or, in the case of the multiple outlier version (Caroni and Prescott, 1992), of a sample that has been reduced by omitting points that are even further from the mean than the one currently being tested. Points sufficiently distant from the mean (in relation to percentage points of an F distribution) are declared to be outliers. An obvious danger is that of masking: if there are actually more outliers than the number being tested for, then the covariance matrix will be inflated by these extra outliers, thus reducing the distances $d_i$ and making it less likely that outliers will be declared. Furthermore, it is possible that distortion of the covariance matrix by outliers may lead to the candidate outliers being tested in the wrong order in the sequential version of the procedure. These considerations make it desirable to construct a robust version of the multivariate outlier detection procedure, as can be done in other multivariate analyses such as principal components analysis (Campbell, 1980; Caroni, 2000; Hubert et al., 2005). The Mahalanobis distance is thus replaced by a robust distance

$$RD_i = (x_i - T(x))' (C(x))^{-1} (x_i - T(x)),$$

where $T(x)$ and $C(x)$ are robust estimators of the center and the covariance matrix, respectively. Various estimators have been proposed in the literature, such as those based on the minimum volume ellipsoid (MVE) (Rousseeuw and van Zomeren, 1990) and the minimum covariance determinant (Rousseeuw and van Driessen, 1999). These estimators have the desirable properties of high breakdown points and affine invariance. However, their implementation posed considerable practical problems, since the amount of computation required to be sure of reaching the solution was, in many cases, infeasible. The BACON multiple outlier detection method is based on the methods of Hadi (1994), which aimed for efficient computation of the MVE. One version of BACON is nearly affine equivariant and has a high breakdown point.

Rocke and Woodruff (1996) have given new insights into the problem of identifying outliers in multivariate data. They explained that the difficulty of detecting multivariate outliers increases with the dimension of the data. A procedure for multivariate outlier detection is discussed in two steps, viz. (i) estimate location and shape, and (ii) scale the shape so that it can be used to suggest which points are far enough from the location estimate to be considered possible outliers. The above method of detecting outliers is known as affine equivariant (Rocke and Woodruff, 1996).

Gao et al. (2005) proposed a new procedure named Maximum-Eigen Difference (MED) for identifying outliers in multivariate data sets. This procedure uses differences based on the maximal eigenvalue and eigenvector of the sample variance-covariance matrix of the data set to detect multivariate outliers. Examples show that it works well for detecting multivariate outliers. They showed that MED works better than the Mahalanobis distance (MD) and is comparable with the robust distance (RD). However, there also exist some examples where MED cannot detect outliers correctly. It is therefore recommended to use MED in combination with other procedures of outlier identification. The advantage of MED is that it is easy to calculate, since many efficient algorithms and software packages can compute the eigenvalues and eigenvectors of a matrix, while the computation of robust estimators is quite difficult in many cases.

Another method of outlier identification via forward search was proposed by Chambers et al. (2004). The basic idea is simple. To avoid the well-known masking and swamping problems that can occur when there are multiple outliers in a data set (see Barnett and Lewis (1994) and Rousseeuw and Leroy (1987)), the algorithm starts from an initial subset of observations of size $m < n$ that is chosen to be outlier free. Here $n$ denotes the number of observations in the complete data set. In the regression context, a model for the variable of interest is estimated from this initial clean subset. Fitted values generated by this model are then used to generate $n$ distances to the actual sample data values. The next step in the algorithm redefines the clean subset to contain those observations corresponding to the $m+1$ smallest of these distances, and the procedure is repeated. The algorithm stops when the distances to all sample observations outside the clean subset are too large or when this subset contains all $n$ sample units. Definition of the initial clean subset is important for implementing the forward search procedure.
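The forward search steps described above can be sketched as follows for simple linear regression. This is a minimal illustration under stated assumptions, not the procedure of Chambers et al. (2004): the distances are taken to be absolute residuals, "too large" is judged against a fixed, hypothetical threshold `max_distance`, the initial clean subset is supplied by the user, and the data are made up.

```python
from math import fsum


def fit_line(x, y, idx):
    """Least squares intercept and slope using only the observations in idx."""
    m = len(idx)
    xbar = fsum(x[i] for i in idx) / m
    ybar = fsum(y[i] for i in idx) / m
    sxx = fsum((x[i] - xbar) ** 2 for i in idx)
    sxy = fsum((x[i] - xbar) * (y[i] - ybar) for i in idx)
    b = sxy / sxx
    return ybar - b * xbar, b


def forward_search(x, y, initial, max_distance):
    """Grow a clean subset one unit at a time; return (clean, suspected outliers)."""
    n = len(x)
    subset = list(initial)
    while len(subset) < n:
        a, b = fit_line(x, y, subset)
        # Distances from the fit on the current clean subset to all n observations.
        dist = [abs(y[i] - (a + b * x[i])) for i in range(n)]
        inside = set(subset)
        # Stop when every observation outside the clean subset is too far away.
        if min(dist[i] for i in range(n) if i not in inside) > max_distance:
            break
        # Otherwise the clean subset becomes the m + 1 closest observations.
        subset = sorted(range(n), key=dist.__getitem__)[:len(subset) + 1]
    clean = sorted(subset)
    members = set(clean)
    return clean, [i for i in range(n) if i not in members]


x = [float(i) for i in range(10)]
# Made-up responses near y = 1 + 2x; the last two are planted outliers.
y = [1.1, 2.9, 5.1, 6.9, 9.1, 10.9, 13.1, 14.9, 5.0, 6.0]
clean, suspects = forward_search(x, y, initial=[0, 1, 2], max_distance=1.0)
```

Note that the clean subset is redefined at every step rather than merely extended, so an observation admitted early can leave it later. In this example the search stops once the eight regular observations form the clean subset, leaving the two planted outliers outside.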
Since the residuals from the estimated fit based on the initial clean subset define subsequent clean subsets, it is important that the parameter estimates defining this estimated fit are unaffected by possible outliers in the initial subset. This can be achieved by selecting observations to enter this subset only after they have been thoroughly checked. Alternatively, they used an outlier robust estimation procedure applied to the entire sample to define a set of robust residuals, with the initial subset then corresponding to the m observations with

IDENTIFYING MULTIPLE OUTLIERS IN LINEAR REGRESSION : ROBUST FIT AND CLUSTERING APPROACH

IDENTIFYING MULTIPLE OUTLIERS IN LINEAR REGRESSION : ROBUST FIT AND CLUSTERING APPROACH SESSION X : THEORY OF DEFORMATION ANALYSIS II IDENTIFYING MULTIPLE OUTLIERS IN LINEAR REGRESSION : ROBUST FIT AND CLUSTERING APPROACH Robiah Adnan 2 Halim Setan 3 Mohd Nor Mohamad Faculty of Science, Universiti

More information

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

More information

OUTLIERS IN DESIGNED EXPERIMENTS

OUTLIERS IN DESIGNED EXPERIMENTS OUTLIERS IN DESIGNED EXPERIMENTS Lalmohan Bhar I.A.S.R.I., Library Avenue, New Delhi- 0 0 lmbhar@iasri.res.in. Introduction From the time when man started exploiting and employing the information in the

More information

MULTIVARIATE ANALYSIS OF VARIANCE

MULTIVARIATE ANALYSIS OF VARIANCE MULTIVARIATE ANALYSIS OF VARIANCE RAJENDER PARSAD AND L.M. BHAR Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 0 0 lmb@iasri.res.in. Introduction In many agricultural experiments,

More information

Regression Analysis for Data Containing Outliers and High Leverage Points

Regression Analysis for Data Containing Outliers and High Leverage Points Alabama Journal of Mathematics 39 (2015) ISSN 2373-0404 Regression Analysis for Data Containing Outliers and High Leverage Points Asim Kumer Dey Department of Mathematics Lamar University Md. Amir Hossain

More information

Identification of Multivariate Outliers: A Performance Study

Identification of Multivariate Outliers: A Performance Study AUSTRIAN JOURNAL OF STATISTICS Volume 34 (2005), Number 2, 127 138 Identification of Multivariate Outliers: A Performance Study Peter Filzmoser Vienna University of Technology, Austria Abstract: Three

More information

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1

, (1) e i = ˆσ 1 h ii. c 2016, Jeffrey S. Simonoff 1 Regression diagnostics As is true of all statistical methodologies, linear regression analysis can be a very effective way to model data, as along as the assumptions being made are true. For the regression

More information

COVARIANCE ANALYSIS. Rajender Parsad and V.K. Gupta I.A.S.R.I., Library Avenue, New Delhi

COVARIANCE ANALYSIS. Rajender Parsad and V.K. Gupta I.A.S.R.I., Library Avenue, New Delhi COVARIANCE ANALYSIS Rajender Parsad and V.K. Gupta I.A.S.R.I., Library Avenue, New Delhi - 110 012 1. Introduction It is well known that in designed experiments the ability to detect existing differences

More information

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI **

A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI *, R.A. IPINYOMI ** ANALELE ŞTIINłIFICE ALE UNIVERSITĂłII ALEXANDRU IOAN CUZA DIN IAŞI Tomul LVI ŞtiinŃe Economice 9 A ROBUST METHOD OF ESTIMATING COVARIANCE MATRIX IN MULTIVARIATE DATA ANALYSIS G.M. OYEYEMI, R.A. IPINYOMI

More information

Chapter 1 Statistical Inference

Chapter 1 Statistical Inference Chapter 1 Statistical Inference causal inference To infer causality, you need a randomized experiment (or a huge observational study and lots of outside information). inference to populations Generalizations

More information

The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models

The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models Int. J. Contemp. Math. Sciences, Vol. 2, 2007, no. 7, 297-307 The Masking and Swamping Effects Using the Planted Mean-Shift Outliers Models Jung-Tsung Chiang Department of Business Administration Ling

More information

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES

REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES REGRESSION DIAGNOSTICS AND REMEDIAL MEASURES Lalmohan Bhar I.A.S.R.I., Library Avenue, Pusa, New Delhi 110 01 lmbhar@iasri.res.in 1. Introduction Regression analysis is a statistical methodology that utilizes

More information

Improved Feasible Solution Algorithms for. High Breakdown Estimation. Douglas M. Hawkins. David J. Olive. Department of Applied Statistics

Improved Feasible Solution Algorithms for. High Breakdown Estimation. Douglas M. Hawkins. David J. Olive. Department of Applied Statistics Improved Feasible Solution Algorithms for High Breakdown Estimation Douglas M. Hawkins David J. Olive Department of Applied Statistics University of Minnesota St Paul, MN 55108 Abstract High breakdown

More information

Detection of outliers in multivariate data:

Detection of outliers in multivariate data: 1 Detection of outliers in multivariate data: a method based on clustering and robust estimators Carla M. Santos-Pereira 1 and Ana M. Pires 2 1 Universidade Portucalense Infante D. Henrique, Oporto, Portugal

More information

Supplementary Material for Wang and Serfling paper

Supplementary Material for Wang and Serfling paper Supplementary Material for Wang and Serfling paper March 6, 2017 1 Simulation study Here we provide a simulation study to compare empirically the masking and swamping robustness of our selected outlyingness

More information

Computational Connections Between Robust Multivariate Analysis and Clustering

Computational Connections Between Robust Multivariate Analysis and Clustering 1 Computational Connections Between Robust Multivariate Analysis and Clustering David M. Rocke 1 and David L. Woodruff 2 1 Department of Applied Science, University of California at Davis, Davis, CA 95616,

More information

SAMPLING IN FIELD EXPERIMENTS

SAMPLING IN FIELD EXPERIMENTS SAMPLING IN FIELD EXPERIMENTS Rajender Parsad I.A.S.R.I., Library Avenue, New Delhi-0 0 rajender@iasri.res.in In field experiments, the plot size for experimentation is selected for achieving a prescribed

More information

A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE

A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE Sankhyā : The Indian Journal of Statistics 2000, Volume 62, Series A, Pt. 1, pp. 144 149 A CONNECTION BETWEEN LOCAL AND DELETION INFLUENCE By M. MERCEDES SUÁREZ RANCEL and MIGUEL A. GONZÁLEZ SIERRA University

More information

Robust Regression Diagnostics. Regression Analysis

Robust Regression Diagnostics. Regression Analysis Robust Regression Diagnostics 1.1 A Graduate Course Presented at the Faculty of Economics and Political Sciences, Cairo University Professor Ali S. Hadi The American University in Cairo and Cornell University

More information

Detection of Influential Observation in Linear Regression. R. Dennis Cook. Technometrics, Vol. 19, No. 1. (Feb., 1977), pp

Detection of Influential Observation in Linear Regression. R. Dennis Cook. Technometrics, Vol. 19, No. 1. (Feb., 1977), pp Detection of Influential Observation in Linear Regression R. Dennis Cook Technometrics, Vol. 19, No. 1. (Feb., 1977), pp. 15-18. Stable URL: http://links.jstor.org/sici?sici=0040-1706%28197702%2919%3a1%3c15%3adoioil%3e2.0.co%3b2-8

More information

Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points

Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Detecting outliers and/or leverage points: a robust two-stage procedure with bootstrap cut-off points Ettore Marubini (1), Annalisa Orenti (1) Background: Identification and assessment of outliers, have

More information

Outlier Detection via Feature Selection Algorithms in

Outlier Detection via Feature Selection Algorithms in Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS032) p.4638 Outlier Detection via Feature Selection Algorithms in Covariance Estimation Menjoge, Rajiv S. M.I.T.,

More information

Monitoring Random Start Forward Searches for Multivariate Data

Monitoring Random Start Forward Searches for Multivariate Data Monitoring Random Start Forward Searches for Multivariate Data Anthony C. Atkinson 1, Marco Riani 2, and Andrea Cerioli 2 1 Department of Statistics, London School of Economics London WC2A 2AE, UK, a.c.atkinson@lse.ac.uk

More information

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX

ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX STATISTICS IN MEDICINE Statist. Med. 17, 2685 2695 (1998) ON THE CALCULATION OF A ROBUST S-ESTIMATOR OF A COVARIANCE MATRIX N. A. CAMPBELL *, H. P. LOPUHAA AND P. J. ROUSSEEUW CSIRO Mathematical and Information

More information

Heteroskedasticity-Consistent Covariance Matrix Estimators in Small Samples with High Leverage Points

Heteroskedasticity-Consistent Covariance Matrix Estimators in Small Samples with High Leverage Points Theoretical Economics Letters, 2016, 6, 658-677 Published Online August 2016 in SciRes. http://www.scirp.org/journal/tel http://dx.doi.org/10.4236/tel.2016.64071 Heteroskedasticity-Consistent Covariance

More information

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY

ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY ROBUST ESTIMATION OF A CORRELATION COEFFICIENT: AN ATTEMPT OF SURVEY G.L. Shevlyakov, P.O. Smirnov St. Petersburg State Polytechnic University St.Petersburg, RUSSIA E-mail: Georgy.Shevlyakov@gmail.com

More information

Outlier detection and variable selection via difference based regression model and penalized regression

Outlier detection and variable selection via difference based regression model and penalized regression Journal of the Korean Data & Information Science Society 2018, 29(3), 815 825 http://dx.doi.org/10.7465/jkdi.2018.29.3.815 한국데이터정보과학회지 Outlier detection and variable selection via difference based regression

More information

Detecting and Assessing Data Outliers and Leverage Points

Detecting and Assessing Data Outliers and Leverage Points Chapter 9 Detecting and Assessing Data Outliers and Leverage Points Section 9.1 Background Background Because OLS estimators arise due to the minimization of the sum of squared errors, large residuals

More information

Re-weighted Robust Control Charts for Individual Observations

Re-weighted Robust Control Charts for Individual Observations Universiti Tunku Abdul Rahman, Kuala Lumpur, Malaysia 426 Re-weighted Robust Control Charts for Individual Observations Mandana Mohammadi 1, Habshah Midi 1,2 and Jayanthi Arasan 1,2 1 Laboratory of Applied

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE

IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE IMPROVING THE SMALL-SAMPLE EFFICIENCY OF A ROBUST CORRELATION MATRIX: A NOTE Eric Blankmeyer Department of Finance and Economics McCoy College of Business Administration Texas State University San Marcos

More information

Detecting outliers in weighted univariate survey data

Detecting outliers in weighted univariate survey data Detecting outliers in weighted univariate survey data Anna Pauliina Sandqvist October 27, 21 Preliminary Version Abstract Outliers and influential observations are a frequent concern in all kind of statistics,

More information

AN ELECTRO-THERMAL APPROACH TO ACTIVE THERMOGRAPHY RAJESH GUPTA

AN ELECTRO-THERMAL APPROACH TO ACTIVE THERMOGRAPHY RAJESH GUPTA AN ELECTRO-THERMAL APPROACH TO ACTIVE THERMOGRAPHY by RAJESH GUPTA Centre for Applied Research in Electronics Submitted in fulfillment of the requirements of the degree of DOCTOR OF PHILOSOPHY to the INDIAN

More information

A Modified M-estimator for the Detection of Outliers

A Modified M-estimator for the Detection of Outliers A Modified M-estimator for the Detection of Outliers Asad Ali Department of Statistics, University of Peshawar NWFP, Pakistan Email: asad_yousafzay@yahoo.com Muhammad F. Qadir Department of Statistics,

More information

with the usual assumptions about the error term. The two values of X 1 X 2 0 1

with the usual assumptions about the error term. The two values of X 1 X 2 0 1 Sample questions 1. A researcher is investigating the effects of two factors, X 1 and X 2, each at 2 levels, on a response variable Y. A balanced two-factor factorial design is used with 1 replicate. The

More information

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST

TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Econometrics Working Paper EWP0402 ISSN 1485-6441 Department of Economics TESTING FOR NORMALITY IN THE LINEAR REGRESSION MODEL: AN EMPIRICAL LIKELIHOOD RATIO TEST Lauren Bin Dong & David E. A. Giles Department

More information

Statistical Distribution Assumptions of General Linear Models

Statistical Distribution Assumptions of General Linear Models Statistical Distribution Assumptions of General Linear Models Applied Multilevel Models for Cross Sectional Data Lecture 4 ICPSR Summer Workshop University of Colorado Boulder Lecture 4: Statistical Distributions

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice (Version 1.1)
