Construction of Composite Indices in Presence of Outliers

SK Mishra, Dept. of Economics, North-Eastern Hill University, Shillong (India)

I. Introduction: Oftentimes we require constructing composite indices by a linear combination of a number of indicator variables. If we denote the indicator variables by $X = [x_1, x_2, \ldots, x_m]$, where each $x_j$ has $n$ observations (cases), and the weights assigned to those variables by $w = [w_1, w_2, \ldots, w_m]'$, then the composite index $I = Xw$ obtains a single value for each case, or $I_i = \sum_{j=1}^{m} x_{ij} w_j;\ i = 1, \ldots, n$. The weights may be determined subjectively or objectively by certain considerations extraneous to the dataset X, or alternatively they may endogenously be determined by the statistical information obtained from the dataset X itself. Endogenous weights are frequently obtained by a statistical technique called the Principal Components Analysis (PCA), which maximizes the sum of squared coefficients of (the product moment) correlation between the derived composite index and the indicator variables, X, or stated differently, $I = Xw$ such that $\sum_{j=1}^{m} r^2(I, x_j)$ is maximum. In presence of sizeable outliers in the data variables, X, we cannot expect the product moment correlation coefficients to remain unaffected. The outliers distort the mean, standard deviation and covariance structure of the indicator variables, leading to distortion in the coefficient of correlation (Hampel, 2001). It may be desirable, therefore, to devise a technique that would minimize the influence of outliers on the composite index. Our objective in this paper is to propose a new technique to construct such a composite index. We also demonstrate the effectiveness of the proposed technique by a simulation experiment.

II. The Coefficient of Correlation in the Median Family: It is well known that the median as a measure of central tendency is (normally) unaffected by the presence of outliers in the data. The median is an analogue of the (arithmetic) mean; it minimizes the sum of probability-weighted absolute deviations of data points from itself ($\min_c \sum_{i} |x_i - c|^L p_i$ for $L = 1$), while the arithmetic mean minimizes the probability-weighted sum of squared deviations of data points from itself (that implies $\min_c \left(\sum_{i} |x_i - c|^L p_i\right)^{1/L}$ for $L = 2$).
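The contrast between the two minimizers can be checked numerically. The sketch below is only an illustration under stated assumptions: the data values, the single outlier of 80 and the search grid are hypothetical and do not come from the paper, and equal probability weights are assumed.

```python
from statistics import mean, median

def abs_loss(data, c):
    """Sum of absolute deviations of the data from c (the L = 1 criterion)."""
    return sum(abs(x - c) for x in data)

def sq_loss(data, c):
    """Sum of squared deviations of the data from c (the L = 2 criterion)."""
    return sum((x - c) ** 2 for x in data)

# Hypothetical sample, and the same sample with its last value replaced by an outlier.
clean = [2.0, 3.0, 5.0, 7.0, 8.0]
dirty = clean[:-1] + [80.0]

# Brute-force search over a grid of candidate centres from 0.00 to 100.00.
grid = [k / 100 for k in range(0, 10001)]
best_abs = min(grid, key=lambda c: abs_loss(dirty, c))
best_sq = min(grid, key=lambda c: sq_loss(dirty, c))

print("minimizer of absolute deviations:", best_abs, "median:", median(dirty))
print("minimizer of squared deviations: ", best_sq, "mean:  ", mean(dirty))
print("median clean vs contaminated:", median(clean), median(dirty))
print("mean   clean vs contaminated:", mean(clean), mean(dirty))
```

In this toy sample the outlier leaves the median where it was and drags the mean well away from the bulk of the data, which is the property the median family of correlation measures relies on.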
Bradley (1985) showed that if $(u_i, v_i);\ i = 1, \ldots, n$ are $n$ pairs of values such that the variables $u$ and $v$ have the same median $= 0$ and the same mean deviation (from median), or $(1/n)\sum_{i=1}^{n} |u_i| = (1/n)\sum_{i=1}^{n} |v_i| = d \neq 0$, both of which conditions may be met by any pair of variables when suitably transformed, then the absolute correlation may be defined as $\rho(u, v) = \sum_{i=1}^{n} \left(|u_i + v_i| - |u_i - v_i|\right) \big/ \sum_{i=1}^{n} \left(|u_i| + |v_i|\right)$.

III. Construction of a Composite Index Using Bradley's Correlation: Bradley's coefficient of correlation (that belongs to the median family) is an analogue of Pearson's product moment correlation coefficient (in the family of the arithmetic mean). It appears therefore that one may construct a composite index by maximization of the sum of absolute values of Bradley's coefficient of correlation between the composite index, $I_1$, and the indicator variables (although any other measure of correlation, e.g. Shevlyakov 1997, may also be used). This is to say that we can obtain $I_1 = Xw$ such that $\sum_{j=1}^{m} |\rho(I_1, x_j)|$ is maximal. This composite index, $I_1$, will be analogous to the PCA-based index, $I_2$, that maximizes the sum of squared Pearson's coefficients of correlation between the composite index and the indicator variables, or $I_2 = Xw: \max \sum_{j=1}^{m} r^2(I_2, x_j)$.

IV. Issues Relating to Maximization: Obtaining the PCA-based composite index is simpler since it has a closed form formula. The (Pearson's) correlation matrix, $R$, is constructed from $X$ such that $R = (1/n) X'X$, where each $x_j \in X$ has zero mean and unit standard deviation. The largest eigenvalue ($\lambda_1$) and the associated eigenvector ($e_1$) of $R$ are obtained. The eigenvector is normalized so that $\|e_1\| = 1$. The normalized eigenvector is used as the weight, $w$, to obtain $I_2 = Xw$. It is possible, nevertheless, to directly obtain the composite index, $I_2$, by maximizing $\sum_{j=1}^{m} r^2(I_2, x_j): I_2 = Xw$. There is no closed form formula for obtaining $I_1 = Xw$ such that $\sum_{j=1}^{m} |\rho(I_1, x_j)|$ is maximal. Hence, one has to obtain it directly by solving the intricate maximization problem.

V. Nonlinear Optimization by Differential Evolution: The method of Differential Evolution (DE) is one of the most powerful self-organizing, evolutionary, population-based and stochastic global optimization methods. It is an outgrowth of the Genetic Algorithms. The crucial idea behind DE is a scheme for generating trial parameter vectors. Initially, a population of points ($p$ in $d$-dimensional space) is generated and evaluated (i.e. $f(p)$ is obtained) for their fitness. Then for each point ($p_i$) three different points ($p_a$, $p_b$ and $p_c$) are randomly chosen from the population.
A new point ($p_z$) is constructed from those three points by adding the weighted difference between two points ($w(p_b - p_c)$) to the third point ($p_a$). Then this new point ($p_z$) is subjected to a crossover with the current point ($p_i$) with a probability of crossover ($c_r$), yielding a candidate point, say $p_u$. This point, $p_u$, is evaluated and if found better than $p_i$ then it replaces $p_i$, else $p_i$ remains. Thus we obtain a new vector in which all points are either better than or as good as the current points. This new vector is used for the next iteration. This process makes the differential evolution scheme completely self-organizing. This method has been successfully applied for optimizing extremely nonlinear and multimodal functions (Mishra, 2007a, 2007b and 2007c).
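The construction of Sections II to V can be sketched in a few lines of Python. This is a minimal illustration, not the Fortran program mentioned in the Note: the function names (`center_scale`, `abs_corr`, `composite`, `objective`, `de_maximize`), the DE settings (population size, weight 0.5, crossover probability 0.9, iteration count) and the synthetic data are all assumptions made for the example.

```python
import random
from statistics import median

def center_scale(x):
    """Shift x to median 0 and scale it to unit mean absolute deviation."""
    med = median(x)
    dev = [xi - med for xi in x]
    d = sum(abs(t) for t in dev) / len(dev)
    return [t / d for t in dev] if d > 0 else dev

def abs_corr(x, y):
    """Absolute correlation: sum(|u+v| - |u-v|) / sum(|u| + |v|) on transformed data."""
    u, v = center_scale(x), center_scale(y)
    num = sum(abs(a + b) - abs(a - b) for a, b in zip(u, v))
    den = sum(abs(a) + abs(b) for a, b in zip(u, v))
    return num / den if den > 0 else 0.0

def composite(X, w):
    """Composite index I = Xw, with X given as a list of rows (cases)."""
    return [sum(xij * wj for xij, wj in zip(row, w)) for row in X]

def objective(X, w):
    """Sum over the indicator variables of |rho(I, x_j)|; this is maximized."""
    index = composite(X, w)
    return sum(abs(abs_corr(index, col)) for col in zip(*X))

def de_maximize(f, dim, pop_size=20, weight=0.5, cr=0.9, iters=60, seed=0):
    """Basic DE: mutant p_a + weight*(p_b - p_c), crossover with p_i, keep the better."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    fit = [f(p) for p in pop]
    for _ in range(iters):
        new_pop, new_fit = [], []
        for i in range(pop_size):
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            forced = rng.randrange(dim)  # at least one coordinate comes from the mutant
            trial = [
                pop[a][j] + weight * (pop[b][j] - pop[c][j])
                if (j == forced or rng.random() < cr) else pop[i][j]
                for j in range(dim)
            ]
            f_trial = f(trial)
            if f_trial > fit[i]:          # candidate replaces the current point
                new_pop.append(trial)
                new_fit.append(f_trial)
            else:                         # current point remains
                new_pop.append(pop[i])
                new_fit.append(fit[i])
        pop, fit = new_pop, new_fit       # the new vector is used for the next iteration
    best = max(range(pop_size), key=lambda k: fit[k])
    return pop[best], fit[best]

# Hypothetical data: three indicators driven by one latent factor, 30 cases.
data_rng = random.Random(1)
factor = [data_rng.gauss(0, 1) for _ in range(30)]
X_demo = [[t + 0.3 * data_rng.gauss(0, 1),
           -t + 0.3 * data_rng.gauss(0, 1),
           t + 0.5 * data_rng.gauss(0, 1)] for t in factor]

w_best, score = de_maximize(lambda w: objective(X_demo, w), dim=3)
norm = sum(t * t for t in w_best) ** 0.5
w_best = [t / norm for t in w_best]       # the objective does not depend on the scale of w
print("weights:", w_best, "sum of |rho|:", score)
```

Because the absolute correlation is unchanged when the index is rescaled, the weights are only identified up to scale and sign; normalizing them to unit length after the search is one convenient convention, chosen here to mirror the normalized eigenvector of Section IV.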

VI. A Simulation Experiment: We have conducted a simulation experiment to examine the effectiveness of our proposed method. We have generated a matrix, X, of six variables, each in 30 observations. The correlation matrix of these variables is given in Table-1. Using these variables, we have obtained two composite indices by direct optimization: the one ($I_{10}$) relating to the method proposed by us and the other ($I_{20}$) relating to the PCA. Both of these indices are standardized by using the relationship $[I_i - \min(I)]/[\max(I) - \min(I)];\ i = 1, \ldots, n$ so as to make the index values lie between zero and unity. These composite indices serve as reference since X does not contain outliers. It is interesting to note (see Table-1) that $I_{10}$ and $I_{20}$ are highly correlated (r = 0.998), although Bradley weights ($w_1$) and correlation coefficients ($\rho$) are uniformly smaller (in magnitude) than the Pearson weights ($w_2$) and correlation coefficients ($r$).

Next, we introduce outliers to X. Three outliers (ranging between -0 to 0) have been added to each indicator variable ($x_j;\ j = 1, \ldots, m$) at random locations. Then, using these (contaminated) variables, the two composite indices ($I_{11}$ and $I_{21}$) have been obtained. The indices have been standardized as before to lie between zero and unity. The results are presented in Table-2. All derived composite indices are presented in Table-3. The root-mean-square $RMS_1 = \sqrt{(1/n)\sum_{i=1}^{n} (I_{10,i} - I_{11,i})^2} = 0.0608$ for our proposed method vis-à-vis $RMS_2 = \sqrt{(1/n)\sum_{i=1}^{n} (I_{20,i} - I_{21,i})^2} = 0.07306$ obtained for the PCA-based index suggests that in presence of outliers our proposed method will perform better. As shown in the graph (Fig. 1), the fluctuations in the PCA-based index appear to be more than those in the index obtained by the proposed method.

Table 1: Correlation Coefficients and Weights for the Reference Indicator Variables (Without Outliers)

Variables              X1        X2        X3        X4        X5        X6      I_10      I_20
X1                1.00000    0.9      0.79774  -0.80408   0.90597  -0.8839    0.9833    0.97609
X2                0.9        1.00000  0.658    -0.7037    0.8905   -0.76986   0.998     0.9074
X3                0.79774    0.658    1.00000  -0.7699    0.6645   -0.7764    0.8477    0.84445
X4               -0.80408   -0.7037  -0.7699    1.00000  -0.874     0.6984   -0.86607  -0.8794
X5                0.90597    0.8905   0.6645   -0.874     1.00000  -0.78670   0.9443    0.93406
X6               -0.8839    -0.76986 -0.7764    0.6984   -0.78670   1.00000  -0.88785  -0.9049
I_10              0.9833     0.998    0.8477   -0.86607   0.9443   -0.88785   1.00000   0.998
I_20              0.97609    0.9074   0.84445  -0.8794    0.93406  -0.9049    0.998     1.00000
Bradley weights       0.45546    0.376    0.3684   -0.943     0.35443  -0.693
Bradley correlation   0.8974     0.7579   0.7083   -0.68475   0.783    -0.75640
Pearson weights       0.54837    0.56794  0.7076   -0.80485   0.5640   -0.58643
Pearson correlation   0.97609    0.9074   0.84445  -0.8794    0.93406  -0.9049

I_10 = Composite index by maximization of the sum of absolute Bradley's Correlation Coefficients
I_20 = Composite index by maximization of the sum of squared Pearson's Correlation Coefficients
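The standardization to the unit interval and the RMS comparison used in the experiment can be sketched as follows. The helper names `rescale` and `rms` and the two short index series are hypothetical; they stand in for a reference index and its contaminated counterpart and are not taken from Table 3.

```python
import math

def rescale(index):
    """Map index values linearly onto [0, 1]: (I_i - min I) / (max I - min I)."""
    lo, hi = min(index), max(index)
    if hi == lo:
        raise ValueError("index is constant; rescaling is undefined")
    return [(v - lo) / (hi - lo) for v in index]

def rms(reference, contaminated):
    """Root-mean-square difference between two index series of equal length."""
    if len(reference) != len(contaminated):
        raise ValueError("the two index series must have the same length")
    n = len(reference)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(reference, contaminated)) / n)

# Hypothetical raw index values without and with contamination.
reference_index = rescale([2.0, 4.0, 6.0, 10.0])
contaminated_index = rescale([2.0, 5.0, 6.0, 10.0])
print(rms(reference_index, contaminated_index))
```

A smaller RMS means the contaminated index stays closer to its outlier-free reference, which is the sense in which the two methods are compared in the text.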

Table 2: Correlation Coefficients and Weights for the Reference Indicator Variables (With three Outliers between -0 and 0)

Variables              X1        X2        X3        X4        X5        X6      I_11      I_21
X1                1.00000    0.6890   0.63464  -0.60439   0.8649   -0.74930   0.96985   0.9635
X2                0.6890     1.00000  0.53335  -0.374     0.6300   -0.4538    0.73477   0.7478
X3                0.63464    0.53335  1.00000  -0.87      0.48497  -0.45498   0.6536    0.7046
X4               -0.60439   -0.374   -0.87      1.00000  -0.6073    0.45490  -0.57758  -0.65697
X5                0.8649     0.6300   0.48497  -0.6073    1.00000  -0.60940   0.9400    0.898
X6               -0.74930   -0.4538  -0.45498   0.45490  -0.60940   1.00000  -0.7637   -0.78645
I_11              0.96985    0.73477  0.6536   -0.57758   0.9400   -0.7637    1.00000   0.9853
I_21              0.9635     0.7478   0.7046   -0.65697   0.898    -0.78645   0.9853    1.00000
Bradley weights       0.35778    0.0945   0.3863    0.0485    0.5405   -0.586
Bradley correlation   0.87477    0.6553   0.56840  -0.5093    0.80043  -0.6808
Pearson weights       0.45695    0.4839   0.557    -0.47088   0.5366   -0.4539
Pearson correlation   0.9635     0.7478   0.7047   -0.65696   0.898    -0.78645

I_11 = Composite index by maximization of the sum of absolute Bradley's Correlation Coefficients
I_21 = Composite index by maximization of the sum of squared Pearson's Correlation Coefficients

Table 3: Composite Indices with (-0, 0 range) Outliers and Without Outliers

          Without Outliers     With Outliers               Without Outliers     With Outliers
Sl. No.    I_10      I_20      I_11      I_21     Sl. No.   I_10      I_20      I_11      I_21
 1       0.00000   0.03      0.066     0.05730     16     0.045     0.08      0.00000   0.00000
 2       0.348     0.4609    0.966     0.7855      17     0.5309    0.5573    0.5343    0.57499
 3       0.88073   0.84975   0.9008    0.8784      18     0.63358   0.65675   0.6446    0.676
 4       0.68067   0.67673   0.6788    0.5797      19     0.774     0.70344   0.7556    0.739
 5       0.7654    0.78795   0.886     0.9680      20     0.65483   0.6435    0.66060   0.6780
 6       0.38436   0.37895   0.3050    0.34575     21     0.379     0.3374    0.389     0.4899
 7       0.0063    0.00000   0.07506   0.0755      22     0.6       0.633     0.7385    0.73
 8       0.3555    0.3465    0.35433   0.3673      23     0.4573    0.46566   0.4880    0.4906
 9       0.64      0.559     0.455     0.554       24     0.3696    0.9988    0.39360   0.37343
10       0.4863    0.47765   0.49373   0.50036     25     0.7854    0.7967    0.69088   0.56360
11       0.6808    0.69665   0.6697    0.7403      26     0.454     0.4897    0.45679   0.47503
12       0.3875    0.3640    0.4909    0.3784      27     0.40770   0.37683   0.5886    0.460
13       0.56575   0.5739    0.57338   0.595       28     0.9677    0.87900   0.9738    0.84678
14       0.4006    0.405     0.3900    0.406       29     0.99074   1.00000   0.85489   0.8848
15       1.00000   0.98508   1.00000   1.00000     30     0.67744   0.67370   0.69074   0.6983

References

Bradley, C. (1985) The Absolute Correlation, The Mathematical Gazette, 69(447), pp. 12-17.
Hampel, F. (2001) Robust Statistics: a Brief Introduction and Overview, ftp://ftp.stat.math.ethz.ch/research-reports/94.pdf
Mishra, S.K. (2007a) Performance of Differential Evolution Method in Least Squares Fitting of Some Typical Nonlinear Curves, Journal of Quantitative Economics, 5(1), pp. 140-177.
Mishra, S.K. (2007b) Least Squares Estimation of Joint Production Functions by the Differential Evolution Method of Global Optimization, Economics Bulletin, 3(51), pp. 1-13.
Mishra, S.K. (2007c) Construction of an Index by Maximization of the Sum of its Absolute Correlation Coefficients with the Constituent Variables, SSRN: http://ssrn.com/abstract=989088
Shevlyakov, G.L. (1997) On Robust Estimation of a Correlation Coefficient, Journal of Mathematical Sciences, 83(3), pp. 434-438.

Note: A Fortran computer program to compute composite indices using Bradley's absolute correlation and PCA by direct maximization is available on http://www1.webng.com/economics