Improved Closest Fit Techniques to Handle Missing Attribute Values
J. Comp. & Math. Sci. Vol.2 (2), (2011)

SANJAY GAUR and M S DULAWAT
Department of Mathematics and Statistics, Maharana Bhupal Campus, Mohanlal Sukhadia University, Udaipur, India
sanjay.since@gmai.com & dulawat_ms@rediffmail.com

ABSTRACT

Data preparation is a fundamental stage of data analysis for data mining. Complete, high-quality preparation of real-world data is a key prerequisite of successful data mining, which aims to discover something new from the facts already recorded in a database. Data with missing values complicate both the analysis and the application of a solution to new data. To overcome this, statistical techniques have to be employed during data preparation; with their help we can recover from the incompleteness of the data and reduce ambiguities. In this paper, we introduce two sequential methods by which missing attribute values are replaced, and we give a comparative study of the two. Both methods are based on the moving average method for numerical variables of time series data.

Keywords: Missing Values, Attribute, Data preparation, Incompleteness, Moving average, Chronological.
MSC (2010) Subject Classification: 62-07, 62N02, 62Q

1. INTRODUCTION

Missing values in a database are one of the biggest problems faced in data analysis and in data mining applications. Missing values produce imbalanced databases, and their effects are reflected in the final results. Our prime goal is to obtain the final result, on which decisions are taken, in consolidated form. In this study, two statistical methods are introduced and discussed which provide an approach to find out
a pattern for recovering or generating missing values from a real imbalanced database. The objective of this comparative study is therefore to identify the best-fitting method for recovering missing values, so that completely filled records can be selected for further applications. The study is based on bivariate analysis.

Statistical methods have proved useful in exploring estimation and prediction techniques. Buck [2] suggested a method for estimating missing values that is suitable for use with an electronic computer. Kim and Curry [8] considered the treatment of missing data in their analysis. Rubin [10] explored inference with missing data and multiple imputation for non-response in surveys. Allison [1] investigated the estimation of linear models with incomplete data. Smyth [11] and Zhang et al. [12] regarded data preparation as a fundamental stage of data analysis. Chen et al. [3] studied multiple imputation for missing ordinal data. Qin [9] considered semi-parametric optimization for missing data imputation. Gaur and Dulawat [4,5] discussed various algorithms that are useful for estimating missing values, and also gave a univariate analysis in which the mean value is used in place of missing values during data preparation. Grzymala-Busse [7] proposed replacing every missing attribute value by all possible known values, and also provided the global closest fit and concept closest fit methods for missing attribute values. The objective of the proposed study is to determine which statistical technique is significant in the handling of missing attribute values.

2. FORMULATION OF PROBLEM

The proposed methods replace missing attribute values by values generated from a moving average. They are most useful for numerical attributes and fall under chronological (time series) analysis.
In general, these methods search for a value that is close to the central tendency of the attribute and closest to the values just preceding and succeeding the missing value.

2.1 Average Fit Approach

This is one of the simplest approaches for generating a close-fit value for a missing value. We first read the complete attribute, including its missing cases,

$$X = \{x_1, x_2, \ldots, x_n\},$$

whose values fall into two groups, observed and missing. The search for missing cases in the attribute then starts. A missing case is located by its subscript $i$ in the attribute and denoted by the variable $x_i$. Having located a missing case, we record the preceding value $x_p$ and the succeeding value $x_s$ relative to the subscript $i$:

$$x_p = x_{i-1}, \qquad x_s = x_{i+1},$$

where $x_i \in X$ and $x_i = \text{NULL}$.

At the next stage, we compute the average of the two recorded values:

$$\bar{x} = \frac{x_p + x_s}{2}. \qquad (2.1.3)$$

The average given by equation (2.1.3) is treated as the estimated value for the current missing subscript:

$$x_{est} = \bar{x}.$$

The value of $x_{est}$ is computed separately for every missing subscript.

Algorithm (Average Fit Approach)

    Read X = {x_1, x_2, ..., x_n}       // Attribute with observed and missing values;
                                        // X splits into observed values and missing values
    For i = 1 to n do
        If (value(x_i) == NULL) then
            x_p = value(x_{i-1})        // Value preceding x_i
            x_s = value(x_{i+1})        // Value succeeding x_i
            x_bar = (x_p + x_s) / 2     // Average of preceding and succeeding
            x_est = x_bar               // Estimated value
            value(x_i) = x_est          // Assigning estimated value to missing value place
        i = i + 1
    Repeat until (i >= n)
    Stop

2.2 Moving Average Fitting Approach

The moving average fitting method is based on the moving average concept. It is also useful for numerical attributes: it searches for a close-fitting value that is near the true mean of the attribute and near the values just preceding and succeeding the missing value, taken together with the central tendency of the attribute. In the proposed method, we first fix the range of the moving average. We propose a range of at least 10% of the dataset in use. The preceding range is then half (50%) of the moving average range, and likewise for the succeeding range. The search for missing cases in the attribute then starts. A missing case is located by its subscript $i$ in the attribute and denoted by the variable $x_i$. Having located a missing case, we record the preceding values $(x_{p1}, x_{p2}, \ldots, x_{pm})$ and the succeeding values $(x_{s1}, x_{s2}, \ldots, x_{sm})$ relative to the subscript $i$.
The preceding values are $x_{p1} = \text{value}(x_{i-1})$, $x_{p2} = \text{value}(x_{i-2})$, up to $x_{pm} = \text{value}(x_{i-m})$; the succeeding values are $x_{s1} = \text{value}(x_{i+1})$, $x_{s2} = \text{value}(x_{i+2})$, up to $x_{sm} = \text{value}(x_{i+m})$.

At the next stage, we calculate the average of the preceding values, $\bar{x}_p$, and the average of the succeeding values, $\bar{x}_s$:

$$\bar{x}_p = \frac{x_{p1} + x_{p2} + \cdots + x_{pm}}{m}, \qquad \bar{x}_s = \frac{x_{s1} + x_{s2} + \cdots + x_{sm}}{m}.$$

The average of $\bar{x}_p$ and $\bar{x}_s$ is the moving average, which serves as the estimated value for the missing data cell:

$$\bar{x} = \frac{\bar{x}_p + \bar{x}_s}{2}, \qquad x_{est} = \bar{x}.$$

The estimated value replaces the missing value:

$$x_i = x_{est}.$$

The search for missing values continues until the last element of the attribute.

Algorithm (Moving Average Fitting Approach)

    Read X = {x_1, x_2, ..., x_n}             // Attribute with observed and missing values;
                                              // X splits into observed values and missing values
    x_t = int((count(X) * 10) / 100)          // Set the range of moving average. Here it is 10% of the dataset.
    N = x_t % 2                               // Find the remainder
    If (N == 0) then m = x_t / 2 else m = (x_t + 1) / 2
    For i = 1 to n do
        If (value(x_i) == NULL) then
            x_p1 = value(x_{i-1})             // Preceding value at x_{i-1}
            x_p2 = value(x_{i-2})             // Preceding value at x_{i-2}
            ...
            x_pm = value(x_{i-m})             // Preceding value at x_{i-m}
            x_s1 = value(x_{i+1})             // Succeeding value at x_{i+1}
            x_s2 = value(x_{i+2})             // Succeeding value at x_{i+2}
            ...
            x_sm = value(x_{i+m})             // Succeeding value at x_{i+m}
            x_bar_p = (x_p1 + x_p2 + ... + x_pm) / m
            x_bar_s = (x_s1 + x_s2 + ... + x_sm) / m
            x_bar = (x_bar_p + x_bar_s) / 2   // Average of preceding and succeeding
            x_est = x_bar                     // Estimated value
            value(x_i) = x_est                // Assigning estimated value to missing value place
        i = i + 1
    Repeat until (i >= n)
    Stop

3. DISCUSSION OF RESULTS

Table-A in the appendix shows the worldwide emission of carbon dioxide (CO2) from the consumption of Oil and Natural Gas for the years 1960 to [end year missing in source]. The mean CO2 emissions due to Oil and Natural Gas are 2262 and 879 million tonnes respectively. Table-B shows the same variables with observed and missing values: 20% of the values of each variable in Table-A were deliberately removed at random. The means calculated from the incomplete data sets are 2259 for Oil and 874 for Natural Gas, slightly lower than the corresponding means of Table-A.

The proposed Average Fit Approach (simple average) is applied to the data sets of Table-B to fill in the missing values. The values recovered or generated by this approach are shown in Table-C for both variables, highlighted by underlining. The means obtained after replacing the missing values by these closest-fit values are quite close to the actual means given in Table-A. The proposed Moving Average Fitting Approach (Table-D) gives results similar to the simple average fit, and its means are again near the original means of Table-A.

4. CONCLUSION

It is well known that no technique for handling missing attribute values is 100% efficient. The proposed moving average fit methods are useful for numerical attributes and deviate only slightly from the mean.
The method is appropriate for consolidated reports and is also well suited to fitting individual missing values, because each estimate follows the order of its preceding and succeeding values. Consequently, techniques for handling missing attribute values should be chosen case by case, according to the nature and type of the data.
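The two algorithms of Section 2 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of `None` for NULL, and the handling of edge cases are assumptions. The pseudocode does not say what to do when a neighbour is itself missing or falls outside the series, so this sketch leaves such gaps unfilled. It also forces the half-window `m` to be at least 1 for series shorter than 20 values, where the 10% rule would give 0.

```python
from typing import List, Optional

Series = List[Optional[float]]  # None plays the role of NULL


def average_fit(values: Series) -> Series:
    """Section 2.1: fill a gap with the mean of its two direct neighbours.

    The copy is updated in place, as in the pseudocode, so a preceding
    neighbour may itself be an earlier estimate. Assumption: a gap stays
    unfilled if either neighbour is missing or out of range.
    """
    out = list(values)
    for i in range(len(out)):
        if out[i] is not None or i == 0 or i == len(out) - 1:
            continue
        x_p, x_s = out[i - 1], out[i + 1]
        if x_p is not None and x_s is not None:
            out[i] = (x_p + x_s) / 2
    return out


def half_window(n: int) -> int:
    """Section 2.2 rule: range x_t is 10% of n; m is x_t / 2 rounded up."""
    x_t = (n * 10) // 100
    return x_t // 2 if x_t % 2 == 0 else (x_t + 1) // 2


def moving_average_fit(values: Series, m: Optional[int] = None) -> Series:
    """Section 2.2: average the means of the m preceding and m succeeding values.

    Assumption: a gap stays unfilled unless all 2*m neighbours exist and
    are known at the time the gap is visited.
    """
    out = list(values)
    n = len(out)
    if m is None:
        m = max(1, half_window(n))  # assumption: never let m drop to 0
    for i in range(n):
        if out[i] is not None:
            continue
        before = out[i - m:i] if i - m >= 0 else []
        after = out[i + 1:i + 1 + m]
        complete = len(before) == m and len(after) == m
        if complete and None not in before and None not in after:
            out[i] = (sum(before) / m + sum(after) / m) / 2
    return out


if __name__ == "__main__":
    print(average_fit([10.0, None, 14.0]))
    print(moving_average_fit([1.0, 2.0, None, 4.0, 5.0], m=2))
```

Skipping a gap rather than guessing keeps the sketch faithful to the formulas above; a production version would need an explicit policy for consecutive missing values and for gaps near the ends of the series.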
5. REFERENCES

1. Allison, P.D., Estimation of linear models with incomplete data, Social Methodology, San Francisco: Jossey Bass, pp (1987).
2. Buck, S.F., A method of estimation of missing values in multivariate data suitable for use with an electronic computer, J. Royal Statistical Society, Series B, Vol-2, pp (1960).
3. Chen, L., Drane, M.T., Valois, R.F., and Drane, J.W., Multiple imputation for missing ordinal data, Journal of Modern Applied Statistical Methods, Vol.-4, No. 1, pp (2005).
4. Gaur, Sanjay and Dulawat, M.S., A perception of statistical inference in data mining, International Journal of Computer Science and Communication, Vol.-1, No. 2, pp (2010).
5. Gaur, Sanjay and Dulawat, M.S., Univariate Analysis for Data Preparation in context of Missing Values, Journal of Computer and Mathematical Sciences, Vol.-1, No. 5, pp (2010).
6. Gaur, Sanjay and Dulawat, M.S., A Closest Fit Approach to Missing Attribute Values in Data Mining, Communicated to publishing, (2011).
7. Grzymala-Busse, J. W., Data with missing attribute values: Generalization of indiscernibility relation and rule induction, Transactions on Rough Sets, Lecture Notes in Computer Science Journal Subline, Springer-Verlag, Vol-1, pp (2004).
8. Kim, J. O., and Curry, J., The treatment of missing data in multivariate analysis, Social Methods and Research, Vol.-6, pp (1977).
9. Qin, Y. S., Semi-parametric optimization for missing data imputation, Applied Intelligence, Vol.-27, No. 1, pp (2007).
10. Rubin, D.B., Inference and missing data, Biometrika, 63, pp (1976).
11. Smyth, P., Data mining at the interface of computer Science and Statistics, Data mining for scientific and engineering applications, Department of Information and Computer Science, University of California, CA, Chapter-1, pp (2001).
12. Zhang, S., Zhang, C., and Young, Q., Data preparation for data mining, Applied Artificial Intelligence, Vol.-17, pp (2003).
Appendix: Global Carbon Dioxide Emissions from Fossil Fuel Burning by Fuel Type

Table-A (Original Table), Table-B (Missing Values), Table-C (Simple Average), Table-D (Moving Average). Each table has the columns Year, Oil (million tonnes) and Natural Gas (million tonnes), and ends with an Average row. [Table values and source attribution not recoverable from the extracted text.]
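The comparison described in Section 3 (remove 20% of the values at random, fill them in, compare the means) can be tried on any numeric series. The sketch below is illustrative only. The series is synthetic and is not the CO2 data. The helper names are hypothetical. The fill rule averages the nearest observed value on each side of a gap, which is a variation on the Average Fit Approach chosen here so that consecutive gaps can also be filled.

```python
import random
from statistics import mean
from typing import List, Optional


def mask_at_random(values: List[float], fraction: float, seed: int) -> List[Optional[float]]:
    """Hide a fraction of the interior values; both end points stay observed."""
    rng = random.Random(seed)
    interior = range(1, len(values) - 1)
    hidden = set(rng.sample(interior, int(len(values) * fraction)))
    return [None if i in hidden else v for i, v in enumerate(values)]


def fill_from_nearest_observed(values: List[Optional[float]]) -> List[float]:
    """Assumed variation: average the nearest observed value on each side."""
    out = list(values)
    for i, v in enumerate(values):
        if v is None:
            left = next(values[j] for j in range(i - 1, -1, -1) if values[j] is not None)
            right = next(values[j] for j in range(i + 1, len(values)) if values[j] is not None)
            out[i] = (left + right) / 2
    return out


# Synthetic upward-trending series with a small periodic wobble (hypothetical data).
original = [1000.0 + 40.0 * t + 25.0 * ((7 * t) % 5 - 2) for t in range(50)]

incomplete = mask_at_random(original, 0.20, seed=0)      # analogue of Table-B
completed = fill_from_nearest_observed(incomplete)       # analogue of Table-C

print("mean of original  :", mean(original))
print("mean of incomplete:", mean(v for v in incomplete if v is not None))
print("mean of completed :", mean(completed))
```

Keeping the end points observed guarantees that every gap has an observed value on both sides, which the fill rule relies on.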
Data Mining: Concepts and Techniques (3 rd ed.) Chapter 8 1 Chapter 8. Classification: Basic Concepts Classification: Basic Concepts Decision Tree Induction Bayes Classification Methods Rule-Based Classification
More informationBasics of Modern Missing Data Analysis
Basics of Modern Missing Data Analysis Kyle M. Lang Center for Research Methods and Data Analysis University of Kansas March 8, 2013 Topics to be Covered An introduction to the missing data problem Missing
More informationA Scientometrics Study of Rough Sets in Three Decades
A Scientometrics Study of Rough Sets in Three Decades JingTao Yao and Yan Zhang Department of Computer Science University of Regina [jtyao, zhang83y]@cs.uregina.ca Oct. 8, 2013 J. T. Yao & Y. Zhang A Scientometrics
More informationA Neural Network learning Relative Distances
A Neural Network learning Relative Distances Alfred Ultsch, Dept. of Computer Science, University of Marburg, Germany. ultsch@informatik.uni-marburg.de Data Mining and Knowledge Discovery aim at the detection
More informationLinear binary codes arising from finite groups
Linear binary codes arising from finite groups Yannick Saouter, Member, IEEE Institut Telecom - Telecom Bretagne, Technopôle Brest-Iroise - CS 83818 29238 Brest Cedex, France Email: Yannick.Saouter@telecom-bretagne.eu
More informationMultiple Imputation For Missing Ordinal Data
Journal of Modern Applied Statistical Methods Volume 4 Issue 1 Article 26 5-1-2005 Multiple Imputation For Missing Ordinal Data Ling Chen University of Arizona Marian Toma-Drane University of South Carolina
More informationPotentials of Unbalanced Complex Kinetics Observed in Market Time Series
Potentials of Unbalanced Complex Kinetics Observed in Market Time Series Misako Takayasu 1, Takayuki Mizuno 1 and Hideki Takayasu 2 1 Department of Computational Intelligence & Systems Science, Interdisciplinary
More informationON INTUITIONISTIC FUZZY SOFT TOPOLOGICAL SPACES. 1. Introduction
TWMS J. Pure Appl. Math. V.5 N.1 2014 pp.66-79 ON INTUITIONISTIC FUZZY SOFT TOPOLOGICAL SPACES SADI BAYRAMOV 1 CIGDEM GUNDUZ ARAS) 2 Abstract. In this paper we introduce some important properties of intuitionistic
More informationFeature Selection with Fuzzy Decision Reducts
Feature Selection with Fuzzy Decision Reducts Chris Cornelis 1, Germán Hurtado Martín 1,2, Richard Jensen 3, and Dominik Ślȩzak4 1 Dept. of Mathematics and Computer Science, Ghent University, Gent, Belgium
More informationA review of some semiparametric regression models with application to scoring
A review of some semiparametric regression models with application to scoring Jean-Loïc Berthet 1 and Valentin Patilea 2 1 ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France
More informationForecasting Enrollments based on Fuzzy Time Series with Higher Forecast Accuracy Rate
Forecasting Enrollments based on Fuzzy Time Series with Higher Forecast Accuracy Rate Preetika Saxena preetikasaxena06@gmail.com Kalyani Sharma kalyanisharma13@gmail.com Santhosh Easo san.easo@gmail.com
More informationMapcube and Mapview. Two Web-based Spatial Data Visualization and Mining Systems. C.T. Lu, Y. Kou, H. Wang Dept. of Computer Science Virginia Tech
Mapcube and Mapview Two Web-based Spatial Data Visualization and Mining Systems C.T. Lu, Y. Kou, H. Wang Dept. of Computer Science Virginia Tech S. Shekhar, P. Zhang, R. Liu Dept. of Computer Science University
More informationDIAGNOSIS OF BIVARIATE PROCESS VARIATION USING AN INTEGRATED MSPC-ANN SCHEME
DIAGNOSIS OF BIVARIATE PROCESS VARIATION USING AN INTEGRATED MSPC-ANN SCHEME Ibrahim Masood, Rasheed Majeed Ali, Nurul Adlihisam Mohd Solihin and Adel Muhsin Elewe Faculty of Mechanical and Manufacturing
More informationCausal Reasoning. Note. Being g is necessary for being f iff being f is sufficient for being g
145 Often need to identify the cause of a phenomenon we ve observed. Perhaps phenomenon is something we d like to reverse (why did car stop?). Perhaps phenomenon is one we d like to reproduce (how did
More informationFundamentals Of Combustion (Part 1) Dr. D.P. Mishra Department of Aerospace Engineering Indian Institute of Technology, Kanpur
Fundamentals Of Combustion (Part 1) Dr. D.P. Mishra Department of Aerospace Engineering Indian Institute of Technology, Kanpur Lecture 09 Stoichiometric calculations for air-gas mixture Let us start this
More informationGranular Computing: Granular Classifiers and Missing Values
1 Granular Computing: Granular Classifiers and Missing Values Lech Polkowski 1,2 and Piotr Artiemjew 2 Polish-Japanese Institute of Information Technology 1 Koszykowa str. 86, 02008 Warsaw, Poland; Department
More informationANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW
SSC Annual Meeting, June 2015 Proceedings of the Survey Methods Section ANALYSIS OF ORDINAL SURVEY RESPONSES WITH DON T KNOW Xichen She and Changbao Wu 1 ABSTRACT Ordinal responses are frequently involved
More informationNonparametric Density Estimation. October 1, 2018
Nonparametric Density Estimation October 1, 2018 Introduction If we can t fit a distribution to our data, then we use nonparametric density estimation. Start with a histogram. But there are problems with
More informationA Generic Multivariate Distribution for Counting Data
arxiv:1103.4866v1 [stat.ap] 24 Mar 2011 A Generic Multivariate Distribution for Counting Data Marcos Capistrán and J. Andrés Christen Centro de Investigación en Matemáticas, A. C. (CIMAT) Guanajuato, MEXICO.
More informationCENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics
CENTRAL TENDENCY (1 st Semester) Presented By Dr. Porinita Dutta Department of Statistics OUTLINES Descriptive Statistics Introduction of central tendency Classification Characteristics Different measures
More informationInterpreting Low and High Order Rules: A Granular Computing Approach
Interpreting Low and High Order Rules: A Granular Computing Approach Yiyu Yao, Bing Zhou and Yaohua Chen Department of Computer Science, University of Regina Regina, Saskatchewan, Canada S4S 0A2 E-mail:
More informationDecision Tree Learning
Decision Tree Learning Berlin Chen Department of Computer Science & Information Engineering National Taiwan Normal University References: 1. Machine Learning, Chapter 3 2. Data Mining: Concepts, Models,
More informationMultivariate Analysis of Ecological Data using CANOCO
Multivariate Analysis of Ecological Data using CANOCO JAN LEPS University of South Bohemia, and Czech Academy of Sciences, Czech Republic Universitats- uric! Lanttesbibiiothek Darmstadt Bibliothek Biologie
More informationConvergence Rate of Expectation-Maximization
Convergence Rate of Expectation-Maximiation Raunak Kumar University of British Columbia Mark Schmidt University of British Columbia Abstract raunakkumar17@outlookcom schmidtm@csubcca Expectation-maximiation
More informationLocal Feature Extraction Models from Incomplete Data in Face Recognition Based on Nonnegative Matrix Factorization
American Journal of Software Engineering and Applications 2015; 4(3): 50-55 Published online May 12, 2015 (http://www.sciencepublishinggroup.com/j/ajsea) doi: 10.11648/j.ajsea.20150403.12 ISSN: 2327-2473
More informationAbstract. Three Methods and Their Limitations. N-1 Experiments Suffice to Determine the Causal Relations Among N Variables
N-1 Experiments Suffice to Determine the Causal Relations Among N Variables Frederick Eberhardt Clark Glymour 1 Richard Scheines Carnegie Mellon University Abstract By combining experimental interventions
More informationCOMPETING RISKS WEIBULL MODEL: PARAMETER ESTIMATES AND THEIR ACCURACY
Annales Univ Sci Budapest, Sect Comp 45 2016) 45 55 COMPETING RISKS WEIBULL MODEL: PARAMETER ESTIMATES AND THEIR ACCURACY Ágnes M Kovács Budapest, Hungary) Howard M Taylor Newark, DE, USA) Communicated
More informationROUGHNESS IN MODULES BY USING THE NOTION OF REFERENCE POINTS
Iranian Journal of Fuzzy Systems Vol. 10, No. 6, (2013) pp. 109-124 109 ROUGHNESS IN MODULES BY USING THE NOTION OF REFERENCE POINTS B. DAVVAZ AND A. MALEKZADEH Abstract. A module over a ring is a general
More informationA Simple Implementation of the Stochastic Discrimination for Pattern Recognition
A Simple Implementation of the Stochastic Discrimination for Pattern Recognition Dechang Chen 1 and Xiuzhen Cheng 2 1 University of Wisconsin Green Bay, Green Bay, WI 54311, USA chend@uwgb.edu 2 University
More informationA stochastic modeling for paddy production in Tamilnadu
2017; 2(5): 14-21 ISSN: 2456-1452 Maths 2017; 2(5): 14-21 2017 Stats & Maths www.mathsjournal.com Received: 04-07-2017 Accepted: 05-08-2017 M Saranyadevi Assistant Professor (GUEST), Department of Statistics,
More informationMixture Models and EM
Mixture Models and EM Goal: Introduction to probabilistic mixture models and the expectationmaximization (EM) algorithm. Motivation: simultaneous fitting of multiple model instances unsupervised clustering
More informationMinimal Attribute Space Bias for Attribute Reduction
Minimal Attribute Space Bias for Attribute Reduction Fan Min, Xianghui Du, Hang Qiu, and Qihe Liu School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu
More information