A clustering approach to detect multiple outliers in linear functional relationship model for circular data

Journal of Applied Statistics

Nurkhairany Amyra Mokhtar, Yong Zulina Zubairi & Abdul Ghapor Hussin

To cite this article: Nurkhairany Amyra Mokhtar, Yong Zulina Zubairi & Abdul Ghapor Hussin (2017): A clustering approach to detect multiple outliers in linear functional relationship model for circular data, Journal of Applied Statistics

Published online: 29 Jun

Nurkhairany Amyra Mokhtar (a), Yong Zulina Zubairi (b) and Abdul Ghapor Hussin (a)

(a) Faculty of Defence Sciences and Technology, National Defence University of Malaysia, Kuala Lumpur, Malaysia; (b) Centre for Foundation Studies in Science, University of Malaya, Kuala Lumpur, Malaysia

ABSTRACT

Outlier detection has been used extensively in data analysis to detect anomalous observations in data. It has important applications in fraud detection and robust analysis, among others. In this paper, we propose a method for detecting multiple outliers in the linear functional relationship model for circular variables. Using the residual values of the Caires and Wyatt model, we apply the hierarchical clustering approach. With the use of a tree diagram, we illustrate the detection of outliers graphically. A Monte Carlo simulation study is carried out to verify the accuracy of the proposed method. Low probabilities of masking and swamping indicate the validity of the proposed approach. Illustrations on two sets of real data are also given to show its practical applicability.

ARTICLE HISTORY

Received 8 June 2016; Accepted 29 May 2017

KEYWORDS

Linear functional relationship model; clustering; outliers detection; wind data; circular variables

1. Introduction

An outlying observation, or outlier, is defined as one that appears to deviate markedly from other members of the sample in which it occurs. The outlier may be the result of gross deviation from the prescribed experimental procedure or an error in calculating or recording the numerical value. If reasons are found for aberrant observations, then one should act accordingly and perhaps also scrutinise the other observations [9]. Published works on the identification of outliers are plentiful for linear data; some examples are Grubbs [9], Rahman et al. [19], Sebert et al. [23] and Adnan et al. [3].
However, for circular data, the approach is not straightforward because the data wrap around by nature. For the circular regression model, the Difference Mean Circular Error (DMCE) statistic was proposed to detect outliers [1]. It has been shown that the method is applicable to the Downs and Mardia [7] circular regression model [20]. Also, the COVRATIO statistic has been proposed to identify outliers in a general linear regression model [2]. Recently, a clustering method for the Downs and Mardia circular-circular regression model has been proposed [22]. It is worthwhile to note that the aforementioned methods are for regression models, which consider an error term for the response variable y only. When both variables are observed with error, we can describe the relationship using the linear functional relationship model.

CONTACT: Yong Zulina Zubairi, yzulina@um.edu.my, Centre for Foundation Studies in Science, University of Malaya, Kuala Lumpur 50603, Malaysia

© 2017 Informa UK Limited, trading as Taylor & Francis Group

The functional relationship model is part of the general class of error-in-variables models (EIVM), in which the underlying variables are deterministic or fixed [10]. Other models in EIVM are the structural relationship model and the ultrastructural relationship model. In the structural relationship model the variables are random, while the ultrastructural relationship model is a synthesis of the linear functional and structural relationship models.

With regard to outlier detection in the linear functional relationship model, Hussin et al. [12] used the COVRATIO statistic based on a row deletion approach for the Caires and Wyatt model [5]. Later, in 2014, another statistic based on the Functional Difference Mean Circular Error Statistic (FDMCE) for the Caires and Wyatt model was proposed [24]. However, these two methods detect outliers based on statistical measures only. Thus, the motivation of our study is to propose an alternative graphical approach, using a dendrogram, to detect outliers in the linear functional relationship model.

In this paper, Section 2 describes the materials and methods used in this study: circular data, the linear functional relationship model, the single linkage hierarchical agglomerative clustering technique, the power of performance measures and a simulation study of the proposed method. Section 3 shows the results of the simulation study and the applicability of the method to real data sets, while Section 4 summarises the proposed method.

2. Materials and methods

2.1. Circular data

Data that occur around a circle are known as circular data and are usually measured in degrees in the interval [0°, 360°) or in radians in the interval [0, 2π). Circular data can be of the vectorial or axial type. Vectorial data have directed segments where both angle and direction are associated with a point. Axial data refer to the angular position of random lines in which neither end can be identified as the starting point.
Examples of axial and circular data can be found in ecological and environmental studies, such as the direction of the earth's magnetic pole [4], wind direction [16] and angles of knee flexion measured to assess the recovery of orthopaedic patients [13].

The most popular circular distribution is the von Mises distribution, with probability density function

$$g(\theta; \mu, \kappa) = \frac{1}{2\pi I_0(\kappa)}\, e^{\kappa \cos(\theta - \mu)}, \tag{1}$$

where $I_0(\kappa)$ is the modified Bessel function of the first kind and order zero, which can be defined by

$$I_0(\kappa) = \frac{1}{2\pi} \int_0^{2\pi} e^{\kappa \cos\theta}\, d\theta, \tag{2}$$

and where $\mu$ is the mean direction and $\kappa$ is the concentration parameter. The concentration parameter $\kappa$ influences the von Mises distribution $VM(\mu, \kappa)$ inversely to the way $\sigma^2$ influences the Normal distribution $N(\mu, \sigma^2)$. Thus, a concentrated von Mises distribution has a large concentration parameter, and a dispersed von Mises distribution has a small concentration parameter [5].
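As a minimal sketch of Equations (1) and (2), the density can be evaluated with the Python standard library alone. The function names `bessel_i0`, `von_mises_pdf` and `integrate_pdf` are illustrative and not from the paper, and $I_0(\kappa)$ is computed here from its power series $\sum_k (\kappa/2)^{2k}/(k!)^2$ rather than from the integral in (2), which is an implementation choice made for this example.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I_0 via its power series (assumed adequate for moderate kappa)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        if k > 0:
            # term_k = term_{k-1} * (kappa/2)^2 / k^2
            term *= (kappa / 2.0) ** 2 / (k * k)
        total += term
    return total


def von_mises_pdf(theta, mu, kappa):
    """Density (1) of VM(mu, kappa) evaluated at angle theta (radians)."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))


def integrate_pdf(mu, kappa, steps=2000):
    """Midpoint-rule check that the density integrates to about 1 over [0, 2*pi)."""
    width = 2.0 * math.pi / steps
    return sum(von_mises_pdf((i + 0.5) * width, mu, kappa) for i in range(steps)) * width
```

Evaluating `von_mises_pdf(mu, mu, kappa)` for increasing `kappa` shows the peak at the mean direction growing, which is the "large concentration parameter means concentrated distribution" behaviour described above.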

2.2. Linear functional relationship model for circular variables

As mentioned earlier, the functional relationship model is part of the general class of EIVM, in which the underlying variables are deterministic or fixed [18]. In the ordinary linear regression model, for any pair of observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, it is assumed that the x values are observed without error and only the y variable is observed with error; in this case, x is known as the explanatory variable and y as the response variable. However, in EIVM, both x and y are observed with error. In addition, in EIVM there is no distinction between explanatory and response variables.

It is worthwhile to note that data that fit the functional relationship model are different from functional data. Functional data analysis deals with data that are in the form of functions, images and shapes, or more general objects [6]. Such data are no longer a set of discrete time point values but a continuous curve, termed functional data. The statistical methodology for analysing functional data is called functional data analysis [21].

This paper focuses on circular data that have a linear functional relationship. The functional relationship model for circular data was first introduced in 1997 and is known as the unreplicated linear functional relationship model [11]. When β = 1, the model is known as the Caires and Wyatt model, in connection with the desired symmetry of the functional relationship model [5]. The functional relationship model for this study can be expressed as

$$Y = X + \alpha \pmod{2\pi}, \tag{3}$$

with $x_i = X_i + \delta_i$ and $y_i = Y_i + \varepsilon_i$, where $i = 1, 2, \ldots, n$, for some rotation parameter $\alpha$. The random errors $\delta_i$ and $\varepsilon_i$ are assumed to be independently distributed with von Mises distributions, $\delta_i \sim VM(0, \kappa)$ and $\varepsilon_i \sim VM(0, \nu)$, respectively.
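To make the structure of model (3) concrete, the following sketch generates observed pairs $(x_i, y_i)$ from given true values $X_i$. It is an illustration only: the function name `simulate_lfrm`, its arguments and the seed handling are hypothetical, and the von Mises errors are drawn with the standard library routine `random.vonmisesvariate`.

```python
import math
import random

TWO_PI = 2.0 * math.pi


def simulate_lfrm(true_x, alpha, kappa, nu, seed=0):
    """Draw (x_i, y_i) from Y = X + alpha (mod 2*pi) with von Mises errors on both variables.

    true_x : true angles X_i in [0, 2*pi)
    alpha  : rotation parameter
    kappa  : concentration of the error delta_i on x
    nu     : concentration of the error epsilon_i on y
    """
    rng = random.Random(seed)
    pairs = []
    for big_x in true_x:
        big_y = (big_x + alpha) % TWO_PI        # Equation (3)
        delta = rng.vonmisesvariate(0.0, kappa)  # delta_i ~ VM(0, kappa), returned in [0, 2*pi)
        eps = rng.vonmisesvariate(0.0, nu)       # epsilon_i ~ VM(0, nu)
        x = (big_x + delta) % TWO_PI             # x_i = X_i + delta_i
        y = (big_y + eps) % TWO_PI               # y_i = Y_i + epsilon_i
        pairs.append((x, y))
    return pairs
```

With large `kappa` and `nu`, the circular mean of the differences $y_i - x_i$ sits close to `alpha`, which reflects that both variables carry error while the underlying relationship is a pure rotation.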
The parameters may be estimated by maximum likelihood estimation, which involves an iterative technique [17].

2.3. The single linkage of hierarchical agglomerative clustering technique in detecting outliers

Outlier detection is important in many data analyses to ensure the robustness of the estimation. Clustering algorithms are often used to detect outliers in linear data. One common method is hierarchical agglomerative clustering, where we start by defining each data point to be a cluster and then combine existing clusters using linkages. In the single linkage method, groups are formed from the individual objects by merging nearest neighbours, where the term nearest neighbour connotes the smallest distance or largest similarity [14].

Initially, we find the minimum distance in $D = \{d_{ik}\}$ and merge the corresponding objects, say U and V, to get the cluster (UV). Next, we find the distance between (UV) and any other cluster W, which is

$$d_{(UV)W} = \min\{d_{UW}, d_{VW}\}. \tag{4}$$

The quantity $d_{UW}$ is the distance between the nearest neighbours of clusters U and W, and $d_{VW}$ is the distance between the nearest neighbours of clusters V and W.

The results of single linkage clustering can be displayed graphically in the form of a tree diagram, or dendrogram. The branches in the tree represent clusters, and they merge at nodes whose positions along a distance (or similarity) axis indicate the level at which the fusions occur.

In this paper, a clustering-based procedure is developed for the predicted and residual values of the data to detect outliers for the Caires and Wyatt linear functional relationship model $Y = X + \alpha \pmod{2\pi}$ on the range [0, 2π) radians. In view of the wrapped-around nature of angles for circular data, we define the measure of similarity as

$$d_{ij} = \sum_{k=1}^{p} \left(\pi - \left|\pi - \left|\theta_{ik} - \theta_{jk}\right|\right|\right) \bmod 2\pi, \tag{5}$$

where $d_{ij}$ is the distance between observations i and j, p is the number of variables and $\theta_{ik}$ is the value of the kth variable for the ith observation. Note that mod 2π is applied to $d_{ij}$ because the data here are measured on the range [0, 2π).

Estimating the number of target clusters and deciding the optimal cut level of a dendrogram can be quite challenging [15]. A fixed probability interval can be calculated around the mean direction $\mu$, and the circular standard deviation of the von Mises distribution can be interpreted by the method proposed by Fisher [8]. An interval of the form $(\mu \pm \theta_P)$ contains a specified percentage of the distribution. For example, the values $\theta_{68.3}$ and $\theta_{95.4}$ are such that $\mu \pm \theta_{68.3}$ and $\mu \pm \theta_{95.4}$ are intervals containing exactly 68.3% and 95.4% of the distribution, respectively [8]. Some multipliers $\xi_P$ given by Fisher for a selection of possible P-values are:

$$P = 75\%,\quad \xi_{75} = 1.12 \quad (\kappa:\ 0.35,\ \rho:\ 0.17), \tag{6}$$

$$P = 90\%,\quad \xi_{90} = 1.69 \quad (\kappa:\ 0.65,\ \rho:\ 0.31), \tag{7}$$

$$P = 95\%,\quad \xi_{95} = 2.06 \quad (\kappa:\ 0.8,\ \rho:\ 0.37). \tag{8}$$

In this study, we consider a 95% confidence interval.
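The circular distance (5) and the single linkage update (4) can be sketched together as follows. This is a deliberately naive illustration, not the authors' implementation: the function names are hypothetical, each observation is a tuple of p angles in [0, 2π), and the mod 2π is applied term by term, which is an assumption about how (5) is meant to be read (each term already lies in [0, π], so it changes nothing for angles in range).

```python
import math


def circular_distance(a, b):
    """Distance (5) between two observations given as tuples of angles in [0, 2*pi)."""
    total = 0.0
    for theta_i, theta_j in zip(a, b):
        # pi - |pi - |difference|| is the length of the shorter arc between the two angles.
        arc = math.pi - abs(math.pi - abs(theta_i - theta_j))
        total += arc % (2.0 * math.pi)  # no effect here, since 0 <= arc <= pi
    return total


def single_linkage_merges(points):
    """Return the N - 1 merges as (height, sorted member indices), in merge order."""
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Equation (4): cluster distance is the nearest-neighbour distance.
                d = min(
                    circular_distance(points[i], points[j])
                    for i in clusters[a]
                    for j in clusters[b]
                )
                if best is None or d < best[0]:
                    best = (d, a, b)
        height, a, b = best
        merged = sorted(clusters[a] + clusters[b])
        merges.append((height, merged))
        clusters[a] = merged
        del clusters[b]  # b > a, so index a is unaffected
    return merges
```

For example, the angles 0.1 and 6.2 radians are only about 0.18 radians apart on the circle, so they merge early, whereas a linear distance would place them far apart. The heights returned here are the quantities that a dendrogram would display on its distance axis.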
With the 95% level adopted here, the stopping rule for the dendrogram is the cut height (9), determined by $\bar{h}$ and $s_h$, where $\bar{h}$ is the average of the heights for all $N - 1$ clusters, $s_h = \sqrt{-2 \log \bar{R}_h}$ is the circular standard deviation of the heights and $\bar{R}_h = \sqrt{C^2 + S^2}/n$ is the mean resultant length of the heights, with C and S the sums of the cosines and sines of the heights [22]. This means that a cluster group that exceeds the stopping rule given in (9) is considered to contain outliers.

2.4. Power of performance

The power of performance can be measured by two misclassification errors, namely masking and swamping. Masking is the inability of a detection method to correctly classify a true outlier; that is, the outlier is falsely detected as an inlier [23]. On the other hand, swamping occurs when a detection method classifies an inlier as an outlier. The power of performance of the clustering technique is evaluated using three measures, namely the probability of success, the probability of masking error and the probability of swamping error.
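Returning to the stopping rule, the circular standard deviation of the merge heights and a cut height built from it can be sketched as below. The form "mean height plus a multiplier times $s_h$" and the default multiplier of 2.06 (Fisher's 95% value) are assumptions made for this illustration, as are the function names; the sketch is not a statement of the exact rule in (9).

```python
import math


def circular_sd(heights):
    """s_h = sqrt(-2 log R_bar), with R_bar the mean resultant length of the heights."""
    n = len(heights)
    c = sum(math.cos(h) for h in heights)
    s = sum(math.sin(h) for h in heights)
    r_bar = min(1.0, math.hypot(c, s) / n)  # clamp guards against rounding just above 1
    if r_bar == 0.0:
        return math.inf  # heights spread uniformly: dispersion is unbounded
    return math.sqrt(-2.0 * math.log(r_bar))


def cut_height(heights, multiplier=2.06):
    """Assumed cut height: mean height plus multiplier times s_h (multiplier is hypothetical)."""
    h_bar = sum(heights) / len(heights)
    return h_bar + multiplier * circular_sd(heights)


def merges_above_cut(heights, multiplier=2.06):
    """Indices of merges whose height exceeds the cut height."""
    cut = cut_height(heights, multiplier)
    return [i for i, h in enumerate(heights) if h > cut]
```

In use, the heights would be the $N - 1$ merge heights of the single linkage dendrogram; observations that only join the main cluster through a merge above the cut are the candidates for outliers.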

The probability of success is given by

$$pout = \frac{\text{success}}{s}, \tag{10}$$

where success is the number of data sets in which all of the planted outliers are successfully identified and s is the number of simulations. The probability of masking error is

$$pmask = \frac{\text{failure}}{(out)\, s}, \tag{11}$$

where failure is the number of outliers in all data sets that are detected as inliers. The probability of swamping error is

$$pswamp = \frac{\text{false}}{(n - out)\, s}, \tag{12}$$

where false is the number of inliers in all data sets that are detected as outliers and out is the number of planted outliers.

2.5. Simulation study

A simulation study was carried out using the SPlus statistical software to study the power of performance of the clustering method. The number of simulations, s, is set to 5000 for each category. The values of X are generated from the von Mises distribution $VM(\pi/4, 3)$, and the rotation parameter is $\alpha = \pi/4$, with different values of the concentration parameter of the error term, namely κ = 5, 10, 15 and 20. For each value of κ, the sample sizes are n = 30, 50, 100 and 130. We also fix the number of outliers to be three, at three random positions $d_1$, $d_2$ and $d_3$. The three observations of the Y variable at positions $d_t$, where t = 1, 2, 3, are set to be outliers by the contamination given in (13) [2]:

$$Y^{*}_{d_t} = Y_{d_t} + \omega\pi \pmod{2\pi}, \tag{13}$$

where $Y^{*}_{d_t}$ is the value after the contamination and $\omega$ is the degree of contamination, in the range from 0 to 1. The scope of this study covers only the case in which the Y variable is contaminated, and not the X variable.

The parameters of the generated data are estimated for the Caires and Wyatt model. Then, the predicted values $\hat{y}$ and the residual values $\hat{e} = \hat{y} - y$ are obtained and clustered by the single linkage hierarchical clustering method described earlier.

3. Results and discussions

3.1. Simulation results

Table 1 shows the probability of success (pout) of the clustering technique.
For fixed n, the probability of success increases as the concentration parameter and the level of contamination increase. The highest concentration parameter and the highest level of contamination result in the highest probability of success, with a value very close to 1.
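The contamination scheme (13) and the three performance measures (10) to (12) can be sketched as follows. The function names and the representation of each simulated data set as a pair of index sets (planted outliers, detected outliers) are assumptions made for this example; the detection step itself is left out.

```python
import math


def contaminate(y, positions, omega):
    """Apply (13): shift the Y values at the given positions by omega * pi, modulo 2*pi."""
    shifted = list(y)
    for d in positions:
        shifted[d] = (shifted[d] + omega * math.pi) % (2.0 * math.pi)
    return shifted


def performance(results, n, out):
    """Compute pout, pmask and pswamp from per-data-set (planted, detected) index sets.

    results : list of (planted, detected) pairs of sets, one pair per simulated data set
    n       : sample size of each data set
    out     : number of planted outliers in each data set
    """
    s = len(results)
    # success: data sets in which every planted outlier was detected
    success = sum(1 for planted, detected in results if planted <= detected)
    # failure: planted outliers that were classified as inliers (masking)
    failure = sum(len(planted - detected) for planted, detected in results)
    # false: inliers that were classified as outliers (swamping)
    false = sum(len(detected - planted) for planted, detected in results)
    return {
        "pout": success / s,
        "pmask": failure / (out * s),
        "pswamp": false / ((n - out) * s),
    }
```

Note that, under the definition in (10), a data set still counts as a success when an extra inlier is flagged, provided all planted outliers are found; the extra flag is charged to the swamping measure instead.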

Table 1. Probability of success of the clustering technique. Columns: n, ω, κ = 5, κ = 10, κ = 15, κ = 20.

To compare the probability of success for different values of κ, we plot the results for a particular n, say n = 100, as given in Figure 1. We note that the bigger the κ value, the faster the probability of success approaches 1 as the level of contamination increases.

As mentioned earlier, another measure of performance is the masking error. Masking skews the mean and covariance estimates, so that the distance between the outliers and the mean is small. Table 2 shows the probability of masking of the clustering technique. In this simulation study, we note that the probability of masking error decreases when both the level of contamination and the concentration parameter increase. The highest concentration parameter and the highest level of contamination result in the lowest probability of masking error, with a value very near to 0.

To illustrate the effect of the level of contamination for different κ values, we plot the results for n = 100 as shown in Figure 2. It is shown that the bigger the κ value, the faster the decrease in the probability of masking as we increase the level of contamination.

Another measure is the swamping effect. Swamping occurs when outlying observations skew the mean and covariance estimates towards themselves and away from the non-outlying observations, thus making the distance of the non-outlying observations to the mean large. This makes them look like outliers. Table 3 shows the probability of swamping of the clustering technique. The probability of swamping error decreases when the level of contamination and the concentration parameter increase. We note that the highest probability of masking error remains small. Also, for n = 100, it is observed that the larger the κ value, the faster the decrease in the probability of swamping as we increase the level of contamination.
Figure 3 is plotted to illustrate the trend of the swamping error of this clustering method.

Figure 1. Plot of probability of success (pout) versus the level of contamination for n = 100 (horizontal axis: level of contamination; vertical axis: probability of success; curves for κ = 5, 10, 15 and 20).

Table 2. Probability of masking of the clustering technique. Columns: n, ω, κ = 5, κ = 10, κ = 15, κ = 20.

Figure 2. Plot of probability of masking (pmask) versus the level of contamination for n = 100 (horizontal axis: level of contamination; vertical axis: probability of masking; curves for κ = 5, 10, 15 and 20).

Table 3. Probability of swamping of the clustering technique. Columns: n, ω, κ = 5, κ = 10, κ = 15, κ = 20.

Figure 3. Plot of probability of swamping (pswamp) versus the level of contamination for n = 100 (horizontal axis: level of contamination; vertical axis: probability of swamping; curves for κ = 5, 10, 15 and 20).

3.2. Application to real data

Previous researchers in circular statistics, such as Hussin et al. [12] and Shamsudheen [24], have used wind direction data taken from the Humberside Coast, UK, developed by the UK Rutherford and Appleton Laboratories, to illustrate the presence of outliers; the sample size is 129. Variable x is measured using an HF radar system, which uses pulse radar and operates at a frequency in the MHz range. Variable y is measured using an anchored wave buoy. They have established observations 38 and 111 as the outliers of the data set, and it has been established that the data can be modelled using the Caires and Wyatt model [17]. Thus, we illustrate the presence of the outliers in the same data set using the new clustering method for the Caires and Wyatt linear functional relationship model.

Figure 4 shows the tree diagram, or dendrogram, of the wind direction data of the Humberside Coast, together with the cut height obtained from Equation (9) using the mean height $\bar{h}$ and the standard deviation of the heights $s_h$. It can be seen that there are three groups of observations, where Group 1 contains most of the observations, Group 2 contains observation 111 and Group 3 contains observation 38. Therefore, from this tree diagram, we conclude that Group 1 contains the inliers and that observations 111 and 38 are the outliers, with 95% confidence.

Figure 4. The plot of the tree diagram with the cut-off height for wind direction data of Humberside Coast, UK.

Another real data set used to illustrate the applicability of this method is wind direction data collected at Bayan Lepas Airport, Penang, Malaysia, located at 16.3 m above ground level, with n = 62. The variable x in this data set is the wind direction at a pressure of 850 hPa and a height of 5000 m, while the variable y is the wind direction at a pressure of 1000 hPa and a height of 300 m. Figure 5 shows the dendrogram of the data with its cut height. The outliers detected are observations 47, 12 and 57, with 95% confidence.

Figure 5. The plot of the tree diagram with the cut-off height for wind direction data of Bayan Lepas, Malaysia.

4. Summary

In this paper, we consider the single linkage hierarchical clustering method for the detection of multiple outliers in circular data on the range [0, 2π) radians. The dendrogram produced has a cut-off at the height given by the stopping rule, computed from $\bar{h}$ and $s_h$, where $\bar{h}$ is the average of the heights for all $N - 1$ clusters, $s_h = \sqrt{-2 \log \bar{R}_h}$ is the circular standard deviation of the heights and $\bar{R}_h$ is the mean resultant length of the heights. Potential outliers are classified at the 95% confidence level when the data exceed the stopping rule. Results from the simulation study support the validity of the hierarchical clustering technique, with low probabilities of swamping and masking. The method is illustrated using the real data sets of wind direction from the Humberside Coast, UK, and Bayan Lepas Airport, Malaysia. Moreover, our method correctly identifies the outliers established by other studies.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

We would like to thank the National Defence University of Malaysia and the University of Malaya [research grant BKS] for supporting this work.

References

[1] A.H. Abuzaid, A.G. Hussin, and I.B. Mohamed, Detection of outliers in simple circular regression models using the mean circular error statistic, J. Stat. Comput. Simul. 83 (2013).
[2] A.H. Abuzaid, I.B. Mohamed, A.G. Hussin, and A. Rambli, COVRATIO statistic for simple circular regression model, Chiang Mai J. Sci. 38 (2011).
[3] R. Adnan, M.N. Mohamad, and H. Setan, Multiple outliers detection procedures in linear regression, Matematika 19 (2003).
[4] E. Batschelet, Circular Statistics in Biology, Academic Press, London.
[5] S. Caires and L.R. Wyatt, A linear functional relationship model for circular data with an application to the assessment of ocean wave measurement, J. Agric. Biol. Environ. Stat. 8 (2003).
[6] J.C. Davis, Statistics and Data Analysis in Geology, 3rd ed., Wiley India.
[7] T.D. Downs and K.V. Mardia, Circular regression, Biometrika 89 (2002).
[8] N.I. Fisher, Problems with the current definitions of the standard deviation of wind direction, J. Clim. Appl. Meteorol. 26 (1987).
[9] F.E. Grubbs, Procedures for detecting outlying observations in samples, Technometrics 11 (1969).
[10] S.F. Hassan, A.G. Hussin, and Y.Z. Zubairi, Estimation of functional relationship model for circular variables and its application in measurement problem, Chiang Mai J. Sci. 37 (2010).
[11] A.G. Hussin, Pseudo-replication in functional relationship with environmental application, Ph.D. thesis, University of Sheffield, England.
[12] A.G. Hussin, A. Abuzaid, F. Zulkifli, and I. Mohamed, Asymptotic covariance and detection of influential observation in a linear functional relationship model for circular data with application to the measurements of wind directions, Sci. Asia 36 (2010).
[13] S.R. Jammalamadaka and A. SenGupta, Topics in Circular Statistics, World Scientific Publishing.
[14] R.A. Johnson and D.W. Wichern, Applied Multivariate Statistical Analysis, Pearson, New Jersey.
[15] Y. Jung, H. Park, D. Du, and B.L. Drake, A decision criterion for the optimal number of clusters in hierarchical clustering, J. Global Optim. 25 (2003).
[16] K.V. Mardia and P.E. Jupp, Directional Statistics, John Wiley & Sons, West Sussex.
[17] N.A. Mokhtar, Y.Z. Zubairi, and A.G. Hussin, A simple linear functional relationship model for circular variables and its application, Proceedings of the 9th International Conference on Renewable Energy Sources (RES 15), Kuala Lumpur, Malaysia, 2015.
[18] N.A. Mokhtar, Y.Z. Zubairi, and A.G. Hussin, Parameter estimation of simultaneous functional relationship model for circular variables assuming equal error variances, Pakistan J. Statist. 31 (2015).
[19] S.M.A.K. Rahman, M.M. Sathik, and K.S. Kannan, Multiple linear regression models in outlier detection, Int. J. Res. Comput. Sci. 2 (2012).
[20] A. Rambli, I. Mohamed, and A.H.M. Abuzaid, Identification of influential observations in circular regression model, Proceedings of the Regional Conference on Statistical Sciences (RCSS 10), 2010.
[21] J.O. Ramsay and B.W. Silverman, Functional Data Analysis, Springer, New York, 1997.
[22] S.Z. Satari, Parameter estimation and outlier detection for some types of circular model, Ph.D. thesis, University of Malaya, Malaysia.
[23] D.M. Sebert, D.C. Montgomery, and D.A. Rollier, A clustering algorithm for identifying multiple outliers in linear regression, Comput. Statist. Data Anal. 27 (1998).
[24] M.I. Shamsudheen, Bootstrapping and outlier detection problems in linear functional relationship model for circular data, Master thesis, Universiti Pertahanan Nasional Malaysia, Malaysia, 2014.


Bootstrap Procedures for Testing Homogeneity Hypotheses Journal of Statistical Theory and Applications Volume 11, Number 2, 2012, pp. 183-195 ISSN 1538-7887 Bootstrap Procedures for Testing Homogeneity Hypotheses Bimal Sinha 1, Arvind Shah 2, Dihua Xu 1, Jianxin

More information

Journal of Biostatistics and Epidemiology

Journal of Biostatistics and Epidemiology Journal of Biostatistics and Epidemiology Original Article Robust correlation coefficient goodness-of-fit test for the Gumbel distribution Abbas Mahdavi 1* 1 Department of Statistics, School of Mathematical

More information

Canonical Correlation Analysis of Longitudinal Data

Canonical Correlation Analysis of Longitudinal Data Biometrics Section JSM 2008 Canonical Correlation Analysis of Longitudinal Data Jayesh Srivastava Dayanand N Naik Abstract Studying the relationship between two sets of variables is an important multivariate

More information

Experimental Design and Data Analysis for Biologists

Experimental Design and Data Analysis for Biologists Experimental Design and Data Analysis for Biologists Gerry P. Quinn Monash University Michael J. Keough University of Melbourne CAMBRIDGE UNIVERSITY PRESS Contents Preface page xv I I Introduction 1 1.1

More information

Computational Connections Between Robust Multivariate Analysis and Clustering

Computational Connections Between Robust Multivariate Analysis and Clustering 1 Computational Connections Between Robust Multivariate Analysis and Clustering David M. Rocke 1 and David L. Woodruff 2 1 Department of Applied Science, University of California at Davis, Davis, CA 95616,

More information

Overview of clustering analysis. Yuehua Cui

Overview of clustering analysis. Yuehua Cui Overview of clustering analysis Yuehua Cui Email: cuiy@msu.edu http://www.stt.msu.edu/~cui A data set with clear cluster structure How would you design an algorithm for finding the three clusters in this

More information

The effect of wind direction on ozone levels - a case study

The effect of wind direction on ozone levels - a case study The effect of wind direction on ozone levels - a case study S. Rao Jammalamadaka and Ulric J. Lund University of California, Santa Barbara and California Polytechnic State University, San Luis Obispo Abstract

More information

UNIVERSITY OF TORONTO Faculty of Arts and Science

UNIVERSITY OF TORONTO Faculty of Arts and Science UNIVERSITY OF TORONTO Faculty of Arts and Science December 2013 Final Examination STA442H1F/2101HF Methods of Applied Statistics Jerry Brunner Duration - 3 hours Aids: Calculator Model(s): Any calculator

More information

Introduction to Statistical Inference

Introduction to Statistical Inference Structural Health Monitoring Using Statistical Pattern Recognition Introduction to Statistical Inference Presented by Charles R. Farrar, Ph.D., P.E. Outline Introduce statistical decision making for Structural

More information

Computation of an efficient and robust estimator in a semiparametric mixture model

Computation of an efficient and robust estimator in a semiparametric mixture model Journal of Statistical Computation and Simulation ISSN: 0094-9655 (Print) 1563-5163 (Online) Journal homepage: http://www.tandfonline.com/loi/gscs20 Computation of an efficient and robust estimator in

More information

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories.

Chapter Goals. To understand the methods for displaying and describing relationship among variables. Formulate Theories. Chapter Goals To understand the methods for displaying and describing relationship among variables. Formulate Theories Interpret Results/Make Decisions Collect Data Summarize Results Chapter 7: Is There

More information

Remedial Measures for Multiple Linear Regression Models

Remedial Measures for Multiple Linear Regression Models Remedial Measures for Multiple Linear Regression Models Yang Feng http://www.stat.columbia.edu/~yangfeng Yang Feng (Columbia University) Remedial Measures for Multiple Linear Regression Models 1 / 25 Outline

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition

Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition Applied Multivariate Statistical Analysis Richard Johnson Dean Wichern Sixth Edition Pearson Education Limited Edinburgh Gate Harlow Essex CM20 2JE England and Associated Companies throughout the world

More information

Outlier detection and variable selection via difference based regression model and penalized regression

Outlier detection and variable selection via difference based regression model and penalized regression Journal of the Korean Data & Information Science Society 2018, 29(3), 815 825 http://dx.doi.org/10.7465/jkdi.2018.29.3.815 한국데이터정보과학회지 Outlier detection and variable selection via difference based regression

More information

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata

Principles of Pattern Recognition. C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata Principles of Pattern Recognition C. A. Murthy Machine Intelligence Unit Indian Statistical Institute Kolkata e-mail: murthy@isical.ac.in Pattern Recognition Measurement Space > Feature Space >Decision

More information

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction

Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction Model Based Statistics in Biology. Part V. The Generalized Linear Model. Chapter 16 Introduction ReCap. Parts I IV. The General Linear Model Part V. The Generalized Linear Model 16 Introduction 16.1 Analysis

More information

Logistic Regression: Regression with a Binary Dependent Variable

Logistic Regression: Regression with a Binary Dependent Variable Logistic Regression: Regression with a Binary Dependent Variable LEARNING OBJECTIVES Upon completing this chapter, you should be able to do the following: State the circumstances under which logistic regression

More information

A nonparametric two-sample wald test of equality of variances

A nonparametric two-sample wald test of equality of variances University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 211 A nonparametric two-sample wald test of equality of variances David

More information

Minimax design criterion for fractional factorial designs

Minimax design criterion for fractional factorial designs Ann Inst Stat Math 205 67:673 685 DOI 0.007/s0463-04-0470-0 Minimax design criterion for fractional factorial designs Yue Yin Julie Zhou Received: 2 November 203 / Revised: 5 March 204 / Published online:

More information

Regression Diagnostics for Survey Data

Regression Diagnostics for Survey Data Regression Diagnostics for Survey Data Richard Valliant Joint Program in Survey Methodology, University of Maryland and University of Michigan USA Jianzhu Li (Westat), Dan Liao (JPSM) 1 Introduction Topics

More information

Multivariate statistical methods and data mining in particle physics

Multivariate statistical methods and data mining in particle physics Multivariate statistical methods and data mining in particle physics RHUL Physics www.pp.rhul.ac.uk/~cowan Academic Training Lectures CERN 16 19 June, 2008 1 Outline Statement of the problem Some general

More information

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course.

* Tuesday 17 January :30-16:30 (2 hours) Recored on ESSE3 General introduction to the course. Name of the course Statistical methods and data analysis Audience The course is intended for students of the first or second year of the Graduate School in Materials Engineering. The aim of the course

More information

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model

The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for Bayesian Estimation in a Finite Gaussian Mixture Model Thai Journal of Mathematics : 45 58 Special Issue: Annual Meeting in Mathematics 207 http://thaijmath.in.cmu.ac.th ISSN 686-0209 The Jackknife-Like Method for Assessing Uncertainty of Point Estimates for

More information

Multilevel Statistical Models: 3 rd edition, 2003 Contents

Multilevel Statistical Models: 3 rd edition, 2003 Contents Multilevel Statistical Models: 3 rd edition, 2003 Contents Preface Acknowledgements Notation Two and three level models. A general classification notation and diagram Glossary Chapter 1 An introduction

More information

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling

Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling Kriging models with Gaussian processes - covariance function estimation and impact of spatial sampling François Bachoc former PhD advisor: Josselin Garnier former CEA advisor: Jean-Marc Martinez Department

More information

Regression Modelling. Dr. Michael Schulzer. Centre for Clinical Epidemiology and Evaluation

Regression Modelling. Dr. Michael Schulzer. Centre for Clinical Epidemiology and Evaluation Regression Modelling Dr. Michael Schulzer Centre for Clinical Epidemiology and Evaluation REGRESSION Origins: a historical note. Sir Francis Galton: Natural inheritance (1889) Macmillan, London. Regression

More information

holding all other predictors constant

holding all other predictors constant Multiple Regression Numeric Response variable (y) p Numeric predictor variables (p < n) Model: Y = b 0 + b 1 x 1 + + b p x p + e Partial Regression Coefficients: b i effect (on the mean response) of increasing

More information

An Empirical Characteristic Function Approach to Selecting a Transformation to Normality

An Empirical Characteristic Function Approach to Selecting a Transformation to Normality Communications for Statistical Applications and Methods 014, Vol. 1, No. 3, 13 4 DOI: http://dx.doi.org/10.5351/csam.014.1.3.13 ISSN 87-7843 An Empirical Characteristic Function Approach to Selecting a

More information

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness

On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Statistics and Applications {ISSN 2452-7395 (online)} Volume 16 No. 1, 2018 (New Series), pp 289-303 On Modifications to Linking Variance Estimators in the Fay-Herriot Model that Induce Robustness Snigdhansu

More information

CHAPTER 5. Outlier Detection in Multivariate Data

CHAPTER 5. Outlier Detection in Multivariate Data CHAPTER 5 Outlier Detection in Multivariate Data 5.1 Introduction Multivariate outlier detection is the important task of statistical analysis of multivariate data. Many methods have been proposed for

More information

Useful Numerical Statistics of Some Response Surface Methodology Designs

Useful Numerical Statistics of Some Response Surface Methodology Designs Journal of Mathematics Research; Vol. 8, No. 4; August 20 ISSN 19-9795 E-ISSN 19-9809 Published by Canadian Center of Science and Education Useful Numerical Statistics of Some Response Surface Methodology

More information

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and

Discriminant Analysis with High Dimensional. von Mises-Fisher distribution and Athens Journal of Sciences December 2014 Discriminant Analysis with High Dimensional von Mises - Fisher Distributions By Mario Romanazzi This paper extends previous work in discriminant analysis with von

More information

Dr. Allen Back. Sep. 23, 2016

Dr. Allen Back. Sep. 23, 2016 Dr. Allen Back Sep. 23, 2016 Look at All the Data Graphically A Famous Example: The Challenger Tragedy Look at All the Data Graphically A Famous Example: The Challenger Tragedy Type of Data Looked at the

More information

Geometric View of Measurement Errors

Geometric View of Measurement Errors This article was downloaded by: [University of Virginia, Charlottesville], [D. E. Ramirez] On: 20 May 205, At: 06:4 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number:

More information

Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity and Outliers Problems

Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity and Outliers Problems Modern Applied Science; Vol. 9, No. ; 05 ISSN 9-844 E-ISSN 9-85 Published by Canadian Center of Science and Education Using Ridge Least Median Squares to Estimate the Parameter by Solving Multicollinearity

More information

Modelling Academic Risks of Students in a Polytechnic System With the Use of Discriminant Analysis

Modelling Academic Risks of Students in a Polytechnic System With the Use of Discriminant Analysis Progress in Applied Mathematics Vol. 6, No., 03, pp. [59-69] DOI: 0.3968/j.pam.955803060.738 ISSN 95-5X [Print] ISSN 95-58 [Online] www.cscanada.net www.cscanada.org Modelling Academic Risks of Students

More information

Machine Learning Linear Classification. Prof. Matteo Matteucci

Machine Learning Linear Classification. Prof. Matteo Matteucci Machine Learning Linear Classification Prof. Matteo Matteucci Recall from the first lecture 2 X R p Regression Y R Continuous Output X R p Y {Ω 0, Ω 1,, Ω K } Classification Discrete Output X R p Y (X)

More information

Simulating Uniform- and Triangular- Based Double Power Method Distributions

Simulating Uniform- and Triangular- Based Double Power Method Distributions Journal of Statistical and Econometric Methods, vol.6, no.1, 2017, 1-44 ISSN: 1792-6602 (print), 1792-6939 (online) Scienpress Ltd, 2017 Simulating Uniform- and Triangular- Based Double Power Method Distributions

More information

Conditions for Segmentation of Motion with Affine Fundamental Matrix

Conditions for Segmentation of Motion with Affine Fundamental Matrix Conditions for Segmentation of Motion with Affine Fundamental Matrix Shafriza Nisha Basah 1, Reza Hoseinnezhad 2, and Alireza Bab-Hadiashar 1 Faculty of Engineering and Industrial Sciences, Swinburne University

More information

arxiv: v1 [physics.ins-det] 2 Jul 2013

arxiv: v1 [physics.ins-det] 2 Jul 2013 Comparison of magnetic field uniformities for discretized and finite-sized standard cos θ, solenoidal, and spherical coils arxiv:1307.0864v1 [physics.ins-det] Jul 013 Abstract N. Nouri, B. Plaster Department

More information

Detection of outliers in multivariate data:

Detection of outliers in multivariate data: 1 Detection of outliers in multivariate data: a method based on clustering and robust estimators Carla M. Santos-Pereira 1 and Ana M. Pires 2 1 Universidade Portucalense Infante D. Henrique, Oporto, Portugal

More information

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models

Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Computationally Efficient Estimation of Multilevel High-Dimensional Latent Variable Models Tihomir Asparouhov 1, Bengt Muthen 2 Muthen & Muthen 1 UCLA 2 Abstract Multilevel analysis often leads to modeling

More information

Multivariate Analysis

Multivariate Analysis Prof. Dr. J. Franke All of Statistics 3.1 Multivariate Analysis High dimensional data X 1,..., X N, i.i.d. random vectors in R p. As a data matrix X: objects values of p features 1 X 11 X 12... X 1p 2.

More information

A CIRCULAR CIRCULAR REGRESSION MODEL

A CIRCULAR CIRCULAR REGRESSION MODEL Statistica Sinica 18(2008), 633-645 A CIRCULAR CIRCULAR REGRESSION MODEL Shogo Kato, Kunio Shimizu and Grace S. Shieh Institute of Statistical Mathematics, Keio University and Academia Sinica Abstract:

More information

Quasi-likelihood Scan Statistics for Detection of

Quasi-likelihood Scan Statistics for Detection of for Quasi-likelihood for Division of Biostatistics and Bioinformatics, National Health Research Institutes & Department of Mathematics, National Chung Cheng University 17 December 2011 1 / 25 Outline for

More information

UNCERTAINTY EVALUATION OF MILITARY TERRAIN ANALYSIS RESULTS BY SIMULATION AND VISUALIZATION

UNCERTAINTY EVALUATION OF MILITARY TERRAIN ANALYSIS RESULTS BY SIMULATION AND VISUALIZATION Geoinformatics 2004 Proc. 12th Int. Conf. on Geoinformatics Geospatial Information Research: Bridging the Pacific and Atlantic University of Gävle, Sweden, 7-9 June 2004 UNCERTAINTY EVALUATION OF MILITARY

More information

ANALYSIS AND RATIO OF LINEAR FUNCTION OF PARAMETERS IN FIXED EFFECT THREE LEVEL NESTED DESIGN

ANALYSIS AND RATIO OF LINEAR FUNCTION OF PARAMETERS IN FIXED EFFECT THREE LEVEL NESTED DESIGN ANALYSIS AND RATIO OF LINEAR FUNCTION OF PARAMETERS IN FIXED EFFECT THREE LEVEL NESTED DESIGN Mustofa Usman 1, Ibnu Malik 2, Warsono 1 and Faiz AM Elfaki 3 1 Department of Mathematics, Universitas Lampung,

More information

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances

Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances Advances in Decision Sciences Volume 211, Article ID 74858, 8 pages doi:1.1155/211/74858 Research Article A Nonparametric Two-Sample Wald Test of Equality of Variances David Allingham 1 andj.c.w.rayner

More information

TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION. Anthony Atkinson, 25th March 2014

TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION. Anthony Atkinson, 25th March 2014 TESTS FOR TRANSFORMATIONS AND ROBUST REGRESSION Anthony Atkinson, 25th March 2014 Joint work with Marco Riani, Parma Department of Statistics London School of Economics London WC2A 2AE, UK a.c.atkinson@lse.ac.uk

More information

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics

Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Monte Carlo Study on the Successive Difference Replication Method for Non-Linear Statistics Amang S. Sukasih, Mathematica Policy Research, Inc. Donsig Jang, Mathematica Policy Research, Inc. Amang S. Sukasih,

More information

A Power Analysis of Variable Deletion Within the MEWMA Control Chart Statistic

A Power Analysis of Variable Deletion Within the MEWMA Control Chart Statistic A Power Analysis of Variable Deletion Within the MEWMA Control Chart Statistic Jay R. Schaffer & Shawn VandenHul University of Northern Colorado McKee Greeley, CO 869 jay.schaffer@unco.edu gathen9@hotmail.com

More information

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris.

Joint work with Nottingham colleagues Simon Preston and Michail Tsagris. /pgf/stepx/.initial=1cm, /pgf/stepy/.initial=1cm, /pgf/step/.code=1/pgf/stepx/.expanded=- 10.95415pt,/pgf/stepy/.expanded=- 10.95415pt, /pgf/step/.value required /pgf/images/width/.estore in= /pgf/images/height/.estore

More information

Testing Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy

Testing Goodness-of-Fit for Exponential Distribution Based on Cumulative Residual Entropy This article was downloaded by: [Ferdowsi University] On: 16 April 212, At: 4:53 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 172954 Registered office: Mortimer

More information

arxiv: v2 [math.st] 20 Jun 2014

arxiv: v2 [math.st] 20 Jun 2014 A solution in small area estimation problems Andrius Čiginas and Tomas Rudys Vilnius University Institute of Mathematics and Informatics, LT-08663 Vilnius, Lithuania arxiv:1306.2814v2 [math.st] 20 Jun

More information

On construction of constrained optimum designs

On construction of constrained optimum designs On construction of constrained optimum designs Institute of Control and Computation Engineering University of Zielona Góra, Poland DEMA2008, Cambridge, 15 August 2008 Numerical algorithms to construct

More information

Leibniz Algebras Associated to Extensions of sl 2

Leibniz Algebras Associated to Extensions of sl 2 Communications in Algebra ISSN: 0092-7872 (Print) 1532-4125 (Online) Journal homepage: http://www.tandfonline.com/loi/lagb20 Leibniz Algebras Associated to Extensions of sl 2 L. M. Camacho, S. Gómez-Vidal

More information

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018

SRMR in Mplus. Tihomir Asparouhov and Bengt Muthén. May 2, 2018 SRMR in Mplus Tihomir Asparouhov and Bengt Muthén May 2, 2018 1 Introduction In this note we describe the Mplus implementation of the SRMR standardized root mean squared residual) fit index for the models

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models.

Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. 1 Transformations 1.1 Introduction Diagnostics can identify two possible areas of failure of assumptions when fitting linear models. (i) lack of Normality (ii) heterogeneity of variances It is important

More information

Y it = µ + τ i + ε it ; ε it ~ N(0, σ 2 ) (1)

Y it = µ + τ i + ε it ; ε it ~ N(0, σ 2 ) (1) DOE (Linear Model) Strategy for checking experimental model assumptions Part 1 When we discuss experiments whose data are described or analyzed by the one-way analysis of variance ANOVA model, we note

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 .0.0 5 5 1.0 7 5 X2 X2 7 1.5 1.0 0.5 3 1 2 Hierarchical clustering

More information

The Model Building Process Part I: Checking Model Assumptions Best Practice

The Model Building Process Part I: Checking Model Assumptions Best Practice The Model Building Process Part I: Checking Model Assumptions Best Practice Authored by: Sarah Burke, PhD 31 July 2017 The goal of the STAT T&E COE is to assist in developing rigorous, defensible test

More information

Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP

Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP Statistics for analyzing and modeling precipitation isotope ratios in IsoMAP The IsoMAP uses the multiple linear regression and geostatistical methods to analyze isotope data Suppose the response variable

More information

Journal of Asian Scientific Research COMBINED PARAMETERS ESTIMATION METHODS OF LINEAR REGRESSION MODEL WITH MULTICOLLINEARITY AND AUTOCORRELATION

Journal of Asian Scientific Research COMBINED PARAMETERS ESTIMATION METHODS OF LINEAR REGRESSION MODEL WITH MULTICOLLINEARITY AND AUTOCORRELATION Journal of Asian Scientific Research ISSN(e): 3-1331/ISSN(p): 6-574 journal homepage: http://www.aessweb.com/journals/5003 COMBINED PARAMETERS ESTIMATION METHODS OF LINEAR REGRESSION MODEL WITH MULTICOLLINEARITY

More information

Multiple linear regression S6

Multiple linear regression S6 Basic medical statistics for clinical and experimental research Multiple linear regression S6 Katarzyna Jóźwiak k.jozwiak@nki.nl November 15, 2017 1/42 Introduction Two main motivations for doing multiple

More information

Reconstruction of individual patient data for meta analysis via Bayesian approach

Reconstruction of individual patient data for meta analysis via Bayesian approach Reconstruction of individual patient data for meta analysis via Bayesian approach Yusuke Yamaguchi, Wataru Sakamoto and Shingo Shirahata Graduate School of Engineering Science, Osaka University Masashi

More information

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)?

2. What are the tradeoffs among different measures of error (e.g. probability of false alarm, probability of miss, etc.)? ECE 830 / CS 76 Spring 06 Instructors: R. Willett & R. Nowak Lecture 3: Likelihood ratio tests, Neyman-Pearson detectors, ROC curves, and sufficient statistics Executive summary In the last lecture we

More information

Correlation and Regression

Correlation and Regression Correlation and Regression October 25, 2017 STAT 151 Class 9 Slide 1 Outline of Topics 1 Associations 2 Scatter plot 3 Correlation 4 Regression 5 Testing and estimation 6 Goodness-of-fit STAT 151 Class

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION CHAPTER 1 INTRODUCTION 1.0 Discrete distributions in statistical analysis Discrete models play an extremely important role in probability theory and statistics for modeling count data. The use of discrete

More information

Course in Data Science

Course in Data Science Course in Data Science About the Course: In this course you will get an introduction to the main tools and ideas which are required for Data Scientist/Business Analyst/Data Analyst. The course gives an

More information

Detection of Influential Observation in Linear Regression. R. Dennis Cook. Technometrics, Vol. 19, No. 1. (Feb., 1977), pp

Detection of Influential Observation in Linear Regression. R. Dennis Cook. Technometrics, Vol. 19, No. 1. (Feb., 1977), pp Detection of Influential Observation in Linear Regression R. Dennis Cook Technometrics, Vol. 19, No. 1. (Feb., 1977), pp. 15-18. Stable URL: http://links.jstor.org/sici?sici=0040-1706%28197702%2919%3a1%3c15%3adoioil%3e2.0.co%3b2-8

More information

Practical Statistics for the Analytical Scientist Table of Contents

Practical Statistics for the Analytical Scientist Table of Contents Practical Statistics for the Analytical Scientist Table of Contents Chapter 1 Introduction - Choosing the Correct Statistics 1.1 Introduction 1.2 Choosing the Right Statistical Procedures 1.2.1 Planning

More information

Lecture 5: Clustering, Linear Regression

Lecture 5: Clustering, Linear Regression Lecture 5: Clustering, Linear Regression Reading: Chapter 10, Sections 3.1-3.2 STATS 202: Data mining and analysis October 4, 2017 1 / 22 Hierarchical clustering Most algorithms for hierarchical clustering

More information