Double Gamma Principal Components Analysis


Applied Mathematical Sciences, Vol. 12, 2018, no. 11, 523-533
HIKARI Ltd, www.m-hikari.com
https://doi.org/10.12988/ams.2018.8455

Double Gamma Principal Components Analysis

Ameerah O. Bahashwan, Zakiah I. Kalantan and Samia A. Adham

Department of Statistics, Faculty of Science, King Abdulaziz University, Jeddah, Kingdom of Saudi Arabia

Copyright 2018 Ameerah O. Bahashwan, Zakiah I. Kalantan and Samia A. Adham. This article is distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

This paper proposes Double Gamma (DGamma) Principal Components Analysis (DGamma PCA), a PCA method intended to be robust to noise. The DGamma distribution is used to model the noise. The exact form of the probability density function (pdf) of the DGamma distribution is given, graphical illustrations of the pdf are presented, and the moment generating function of the DGamma distribution is stated. Moreover, the maximum likelihood estimators (MLE) of the DGamma parameters are obtained. Finally, experimental results of DGamma PCA on simulated noisy data are demonstrated.

Keywords: Double Gamma distribution, Maximum likelihood estimation, DGamma PCA

1. Introduction

Dimension reduction is the process of projecting high-dimensional data into a much lower-dimensional space, since patterns in high-dimensional data can be hard to find. Principal Components Analysis (PCA) is a way of identifying patterns in data and expressing the data so as to highlight those patterns, which makes PCA a powerful tool for analyzing data [1]. However, PCA based on a Gaussian noise model is sensitive to noise [2]. PCA is a standard statistical tool that has been widely used in dimensionality reduction, data compression and image processing. It looks for a linear transformation that reduces a large set of variables to a smaller set retaining as much of the variance in the data as possible. The PCA method is applied in many fields, such as pattern recognition [3], image processing [4], regression [5] and data mining [6].

Historically, a number of approaches to PCA have been explored and proposed in the literature over several decades. Robust PCA methods can be categorized into two paradigms: non-probabilistic approaches and probabilistic approaches. The basic strategy of non-probabilistic methods is to remove the influence of large noise in corrupted data items, while probabilistic approaches show that PCA may indeed be derived within a density-estimation framework [7]. Noise in data reduces the quality of the information, and PCA is one of the techniques concerned with reducing the number of dimensions and extracting the most important information without much loss [8]. Many studies of PCA assume that the data are distributed according to a Gaussian distribution, and Gaussian PCA is sensitive to noise of large magnitude. To robustify PCA, a number of improvements have been proposed in which the Gaussian distribution is replaced by another one [9]. The objective of this paper is to replace Gaussian PCA by DGamma PCA: a new approach in which the DGamma distribution models the noise is studied and results are obtained.

This paper is organized as follows. Section 2 presents the DGamma distribution: its probability density function, graphical illustrations and moment generating function. Section 3 presents the maximum likelihood estimation of the parameters of the DGamma distribution. Section 4 provides a case study of DGamma PCA on simulated noisy data. Finally, conclusions are drawn in Section 5.

2. Double Gamma Distribution

The Gamma distribution (also known as the Erlang distribution, named for the Danish mathematician Agner Erlang) has received considerable attention in reliability theory [10]. The general form of the probability density function (pdf) of the DGamma distribution (also referred to as the reflected gamma distribution) is

f(x; \mu, \theta_1, \theta_2) = \frac{1}{2\theta_2\,\Gamma(\theta_1)} \left( \frac{|x-\mu|}{\theta_2} \right)^{\theta_1 - 1} e^{-|x-\mu|/\theta_2}, \qquad -\infty < x < \infty,   (1)

where \mu is the location parameter, \theta_1 and \theta_2 are the positive shape and scale parameters respectively, and \Gamma is the gamma function, \Gamma(a) = \int_0^\infty t^{a-1} e^{-t}\,dt. The form of the DGamma pdf when \mu = 0 is

f(x; \theta_1, \theta_2) = \frac{1}{2\theta_2\,\Gamma(\theta_1)} \left( \frac{|x|}{\theta_2} \right)^{\theta_1 - 1} e^{-|x|/\theta_2}.   (2)
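For concreteness, the pdf in equation (1) is simply half the ordinary gamma density evaluated at |x - μ|. A minimal R sketch (the helper name ddgamma is ours, not a base R function):

```r
# DGamma pdf of equation (1), written via the ordinary gamma density of |x - mu|;
# ddgamma is an assumed helper name, not part of base R.
ddgamma <- function(x, mu = 0, shape = 1, scale = 1) {
  0.5 * dgamma(abs(x - mu), shape = shape, scale = scale)
}

# Sanity check: the density integrates to 1.
integrate(ddgamma, -Inf, Inf, mu = 0, shape = 2, scale = 2)  # ~= 1
```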

The standard form of the DGamma distribution, obtained from equation (1) with \mu = 0 and \theta_2 = 1, is

f(x; \theta) = \frac{1}{2\Gamma(\theta)}\, |x|^{\theta - 1} e^{-|x|}.   (3)

Some shapes of the pdf of the DGamma distribution for different values of its parameters are presented in Figure 1.

Figure 1: Different shapes of the DGamma densities: a) θ1 = 0.3, θ2 = 5; b) θ1 = 1, θ2 = 4; c) θ1 = 2, θ2 = 2; d) θ1 = 6, θ2 = 2; e) θ1 = 5, θ2 = 1; f) θ1 = 10, θ2 = 10.

Plots [c], [d] and [e] clearly show bimodal densities, with a valley separating the two modes. The density in plot [f] has a bathtub shape. In plot [b], where θ1 = 1, the density reduces to the Laplace distribution (hence the non-smoothness at the origin). Finally, in plot [a], where θ1 < 1, the pdf looks like two exponential curves, one increasing and the other decreasing.

The moment generating function of the standard DGamma in equation (3) is

M(t) = \frac{1}{2(1-t)^{\theta}} + \frac{1}{2(1+t)^{\theta}}, \qquad |t| < 1.
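The closed form of M(t) can be checked numerically against the defining integral; a small sketch, with the density inlined as half a gamma density:

```r
# Numerical check of the MGF of the standard DGamma of equation (3):
# M(t) = 1/(2(1 - t)^theta) + 1/(2(1 + t)^theta) for |t| < 1.
mgf_closed <- function(t, theta) 0.5 * (1 - t)^(-theta) + 0.5 * (1 + t)^(-theta)

mgf_numeric <- function(t, theta) {
  integrand <- function(x) exp(t * x) * 0.5 * dgamma(abs(x), shape = theta)
  integrate(integrand, -Inf, Inf)$value
}

mgf_closed(0.3, theta = 2)   # 1.316266...
mgf_numeric(0.3, theta = 2)  # agrees to numerical precision
```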

3. Maximum Likelihood Estimation

Definition: Let x_1, x_2, ..., x_n represent a random sample from a density function f(x; \theta), and let L(\theta) = L(\theta; x_1, x_2, ..., x_n) be the corresponding likelihood function, given by

L(\theta) = \prod_{i=1}^{n} f(x_i; \theta).

As a general procedure for constructing estimators, the value of \theta that maximizes L(\theta) is chosen; that is, the estimate \hat{\theta} satisfies L(\hat{\theta}) \ge L(\theta) for all \theta [11].

The likelihood function for n i.i.d. observations x_1, ..., x_n from the DGamma with \mu = 0 is

L(\theta_1, \theta_2) = \left(2\theta_2\,\Gamma(\theta_1)\right)^{-n} \prod_{i=1}^{n} \left( \frac{|x_i|}{\theta_2} \right)^{\theta_1 - 1} e^{-\sum_{i=1}^{n} |x_i|/\theta_2},

giving the log-likelihood

\ell(\theta_1, \theta_2) = -n \ln\!\left(2\theta_2\,\Gamma(\theta_1)\right) + (\theta_1 - 1) \sum_{i=1}^{n} \ln\!\left( \frac{|x_i|}{\theta_2} \right) - \sum_{i=1}^{n} \frac{|x_i|}{\theta_2}.   (4)

The likelihood equations derived from (4) have no closed-form solution, so a numerical method is used to maximize the log-likelihood (4) and compute the maximum likelihood estimates of θ1 and θ2. Maximum likelihood estimation of the two parameters of the DGamma distribution is applied to random samples of different sizes n generated from the DGamma distribution. The R function nlm (from the stats package) is used to compute the ML estimates of θ1 and θ2. The confidence intervals, MSE and bias are also computed.
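A minimal sketch of one such fit, under the setup described above: a DGamma variate is a gamma magnitude with a random sign, and nlm minimizes the negative of the log-likelihood (4). The starting values and the positivity guard are our assumptions:

```r
# Simulate one DGamma(theta1 = 2.5, theta2 = 7) sample and fit it by ML.
set.seed(1)
n <- 100
x <- rgamma(n, shape = 2.5, scale = 7) * sample(c(-1, 1), n, replace = TRUE)

# Negative log-likelihood of equation (4); par = c(theta1, theta2).
negloglik <- function(par) {
  if (any(par <= 0)) return(1e10)  # keep the search in the valid region
  -sum(log(0.5) + dgamma(abs(x), shape = par[1], scale = par[2], log = TRUE))
}

fit <- nlm(negloglik, p = c(1, 1), hessian = TRUE)
fit$estimate                     # ML estimates of theta1, theta2
sqrt(diag(solve(fit$hessian)))   # asymptotic standard errors
```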

| n, R | θ1 | θ2 | ML est. of θ1 | ML est. of θ2 | 95% CI of θ1 (L, U) | 95% CI of θ2 (L, U) | MSE of θ1 | MSE of θ2 | Bias of θ1 | Bias of θ2 |
|------|-----|------|------------|------------|------------------------|--------------------------|--------------|--------------|--------------|--------------|
| 24, 50 | 5 | 2 | 7.257404 | 1.860666 | (5.235087, 9.279721) | (0.4382013, 3.2831313) | 0.2123280457 | 0.0008089117 | 0.094058503 | 0.005805571 |
| | 2.5 | 7 | 3.252194 | 6.738703 | (0.8550092, 5.6493796) | (4.645845, 8.831560) | 0.023574851 | 5.332581e-03 | 0.03134143 | 0.0149060684 |
| | 0.9 | 0.05 | 1.25774564 | 0.04234315 | (0.6838629, 1.8316283) | (0.01839678, 0.06628953) | 0.002844845 | 2.442804e-06 | 0.01088739 | 0.0003190352 |
| | 0.5 | 0.3 | 0.5824239 | 0.2891483 | (0.1835674, 0.9812804) | (0.1351743, 0.4431222) | 2.830707e-04 | 4.906676e-06 | 0.0034343285 | 0.0004521558 |
| 50, 25 | 5 | 2 | 5.668794 | 1.931147 | (3.291254, 8.046335) | (1.333724, 2.528569) | 8.945720e-03 | 9.481596e-05 | 0.0178454 | 0.002338958 |
| | 2.5 | 7 | 2.610725 | 6.858722 | (1.471981, 3.749468) | (4.599197, 9.118247) | 0.0002451999 | 0.0003991898 | 0.002214497 | 0.002825561 |
| | 0.9 | 0.05 | 0.97193670 | 0.04878351 | (0.5667539, 1.3771195) | (0.03360175, 0.06396526) | 1.034978e-04 | 2.959714e-08 | 1.438734e-03 | 2.432988e-05 |
| | 0.5 | 0.3 | 0.6023119 | 0.2667678 | (0.4466918, 0.7579320) | (0.06487944, 0.46865612) | 2.093545e-04 | 2.208761e-05 | 0.0020462381 | 0.0006646444 |
| 100, 50 | 5 | 2 | 5.147613 | 2.034591 | (3.553930, 6.741296) | (1.539917, 2.529264) | 2.178961e-04 | 1.196507e-05 | 0.0014761303 | 0.0003459056 |
| | 2.5 | 7 | 2.566057 | 7.021842 | (1.683882, 3.448233) | (5.494472, 8.549212) | 4.363565e-05 | 4.770669e-06 | 0.0006605729 | 4.981609e-04 |
| | 0.9 | 0.05 | 0.94981609 | 0.04820232 | (0.7209524, 1.1786798) | (0.03405301, 0.06235163) | 2.481643e-05 | 3.231666e-08 | 0.0002184186 | 1.797683e-05 |
| | 0.5 | 0.3 | 0.5218649 | 0.2807291 | (0.3609550, 0.6827748) | (0.1842645, 0.3771937) | 4.780741e-06 | 3.713671e-06 | 0.0002186490 | 0.0001927089 |

Table 1: ML estimates of the parameters θ1, θ2 of the DGamma distribution, with 95% confidence intervals, mean squared errors and biases.
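For reference, a hypothetical Monte Carlo sketch of how one block of Table 1 can be produced; the paper does not spell out its exact MSE, bias and confidence-interval definitions, so the ones below (error about the true value and a percentile interval) are assumptions:

```r
# Repeat the ML fit over R simulated samples and summarize (assumed definitions).
set.seed(3)
n <- 24; R <- 50
truth <- c(theta1 = 5, theta2 = 2)

fit_one <- function() {
  x <- rgamma(n, shape = truth[1], scale = truth[2]) * sample(c(-1, 1), n, replace = TRUE)
  nll <- function(p) {
    if (any(p <= 0)) return(1e10)
    -sum(log(0.5) + dgamma(abs(x), shape = p[1], scale = p[2], log = TRUE))
  }
  nlm(nll, p = c(1, 1))$estimate
}

est <- t(replicate(R, fit_one()))                      # R x 2 matrix of estimates
colMeans(est)                                          # average ML estimates
colMeans((est - matrix(truth, R, 2, byrow = TRUE))^2)  # MSE about the true values
colMeans(est) - truth                                  # bias
apply(est, 2, quantile, probs = c(0.025, 0.975))       # percentile 95% CI
```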

In Table 1, n and R denote the sample size and the number of samples, respectively. Table 1 shows that, in general, as the sample size n increases, the estimates of the two parameters θ1 and θ2 improve. In addition, the lengths of the confidence intervals of the two parameters decrease as the sample size increases. The computed MSE and bias of the two parameters also decrease as n increases. Therefore, one can conclude that the results get better as the sample size increases, which is the expected behavior of ML estimation; this simulation study confirms it on simulated data.

4. DGamma PCA of Modelling Noise

In this section, the resistance of DGamma PCA to noise is demonstrated through case studies from a simulation study. Low-rank 5 × 5 matrices B are generated from DGamma(α = 9, β = 0.5) with sample size n = 100. They are then corrupted with noise at a rate of 10% (considered here the largest proportion by which the data may be corrupted), and DGamma PCA is used to try to recover them. The cases shown below represent about 60% of the cases that appeared when applying the implementation of DGamma PCA with noise.

Case 1:

Figure 2: DGamma PCA at n = 100; [a1] scree plot of the data before noising, [a2] scree plot after 10% noising.

Importance of components [b1] (before noising):

                       Comp.1   Comp.2   Comp.3   Comp.4
Standard deviation     1.86826  0.95084  0.77348  0.0848387
Proportion of Variance 0.69808  0.18082  0.11965  0.0014395
Cumulative Proportion  0.69808  0.87890  0.99856  1.0000000

Importance of components [b2] (after noising):

                       Comp.1   Comp.2   Comp.3   Comp.4
Standard deviation     1.72136  1.02851  0.85992  0.489464
Proportion of Variance 0.59262  0.21156  0.14789  0.047915
Cumulative Proportion  0.59262  0.80419  0.95208  1.000000

Table 2: Summary of the implementation of DGamma PCA at n = 100; [b1] summary of the data before noising, [b2] summary of the data after noising.

Figure 2 displays the scree plots of simulated data from the DGamma distribution at n = 100. First, comparing [a1] with [a2] shows how the noise changes the variation captured by each component. Second, Table 2 shows how the cumulative proportion changed: components 1 and 2 in [b1] explain about 88% of the total variation before any noise is present, but after noise is added and the DGamma PCA technique is applied to recover the data, components 1 and 2 in [b2] explain about 80% of the total variation. Therefore, the result after applying DGamma PCA is considered acceptable.
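A minimal R sketch of this experiment under stated assumptions: the paper only partially specifies the data generation (we treat β as a scale parameter and attach random signs) and the noising mechanism (here, additive Gaussian spikes on 10% of the entries). Note that summary(princomp(...)) prints exactly the "Importance of components" tables shown above:

```r
# Sketch of the Section 4 experiment; the noising mechanism is an assumption.
set.seed(2)
n <- 100; p <- 5
X <- matrix(rgamma(n * p, shape = 9, scale = 0.5) *
            sample(c(-1, 1), n * p, replace = TRUE), nrow = n, ncol = p)

# Corrupt 10% of the entries with large additive noise.
idx <- sample(length(X), size = round(0.10 * length(X)))
Xnoisy <- X
Xnoisy[idx] <- Xnoisy[idx] + rnorm(length(idx), sd = 5 * sd(X))

pc_clean <- princomp(X)
pc_noisy <- princomp(Xnoisy)
summary(pc_clean)                     # "Importance of components", cf. Table 2 [b1]
summary(pc_noisy)                     # compare with Table 2 [b2]
screeplot(pc_clean, type = "lines")   # scree plot before noising (cf. Figure 2 [a1])
screeplot(pc_noisy, type = "lines")   # scree plot after noising  (cf. Figure 2 [a2])
```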

Case 2:

Figure 3: DGamma PCA at n = 100; [a1] scree plot of the data before noising, [a2] scree plot after 10% noising.

Importance of components [b1] (before noising):

                       Comp.1    Comp.2    Comp.3    Comp.4
Standard deviation     1.645290  1.364236  0.620942  0.2151970
Proportion of Variance 0.541395  0.372228  0.077113  0.0092619
Cumulative Proportion  0.541395  0.913624  0.990738  1.0000000

Importance of components [b2] (after noising):

                       Comp.1    Comp.2    Comp.3    Comp.4
Standard deviation     1.551819  1.401815  0.770416  0.1822885
Proportion of Variance 0.481628  0.393017  0.118708  0.0066458
Cumulative Proportion  0.481628  0.874645  0.993354  1.0000000

Table 3: Summary of the implementation of DGamma PCA at n = 100; [b1] summary of the data before noising, [b2] summary of the data after noising.

Figure 3 displays the scree plots of simulated data from the DGamma distribution at n = 100. Comparing [a1] with [a2] again shows how the noise changes the variation captured by each component. In addition, Table 3 shows how the cumulative proportion changed: components 1 and 2 in [b1] explain about 91% of the total variation before any noise is present, but after noise is added and the DGamma PCA technique is applied to recover the data, components 1 and 2 in [b2] explain about 87% of the total variation. Therefore, the DGamma PCA technique gives acceptable results.

5. Conclusions

In this paper, the DGamma distribution is reviewed and some of its properties are shown. In addition, maximum likelihood estimates of its two parameters are computed, and the numerical results are presented and discussed. The results show that the estimates improve as the sample size increases, which is the expected behavior of ML estimation; the simulation study confirms it on simulated data. Moreover, when the DGamma PCA technique is applied to data with 10% noising, the results are suitable. Therefore, one can conclude that the DGamma PCA technique behaves acceptably on noisy data.

References

[1] I. T. Jolliffe, Principal Component Analysis and Factor Analysis, Chapter in Principal Component Analysis, Springer, 1986, 115-128. https://doi.org/10.1007/978-1-4757-1904-8_7

[2] C. Archambeau, N. Delannay and M. Verleysen, Robust probabilistic projections, Proceedings of the 23rd International Conference on Machine Learning, ACM, 2006, 33-40. https://doi.org/10.1145/1143844.1143849

[3] Y. Wang and Y. Zhang, Facial recognition based on kernel PCA, 2010 3rd International Conference on Intelligent Networks and Intelligent Systems (ICINIS), 2010, 88-91. https://doi.org/10.1109/icinis.2010.88

[4] P. K. Pandey, Y. Singh and S. Tripathi, Image processing using principle component analysis, International Journal of Computer Applications, 15 (2011), no. 4, 37-40. https://doi.org/10.5120/1935-2582

[5] A. Wibowo and Y. Yamamoto, A note on kernel principal component regression, Computational Mathematics and Modeling, 23 (2012), no. 3, 350-367. https://doi.org/10.1007/s10598-012-9143-0

[6] K. Poorani and K. Brindha, Data Mining Based on Principal Component Analysis for Rainfall Forecasting in India, International Journal of Advanced Research in Computer Science and Software Engineering, 3 (2013), no. 9.

[7] P. Xie and E. Xing, Cauchy Principal Component Analysis, 2014. http://www.cs.cmu.edu/~pengtaox/papers/cpca.pdf

[8] L. I. Smith, A tutorial on principal components analysis, Cornell University, USA, Vol. 51, (2002), no. 52. http://www.cs.otago.ac.nz/cosc453/student_tutorials/principal_components.pdf

[9] P. Xie and E. Xing, Cauchy Principal Component Analysis, 2014. arXiv preprint arXiv:1412.6506

[10] L. J. Bain and M. Engelhardt, Introduction to Probability and Mathematical Statistics, Brooks/Cole, 1987.

[11] A. M. Mood, Introduction to the Theory of Statistics, 1950.

Received: April 19, 2018; Published: May 14, 2018