Covariance Matrix Estimation for Reinforcement Learning


Tomer Lancewicki
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville, TN 37996
tlancewi@utk.edu

Itamar Arel
Department of Electrical Engineering and Computer Science
University of Tennessee, Knoxville, TN 37996
itamar@eecs.utk.edu

Abstract

One of the goals in scaling reinforcement learning (RL) pertains to dealing with high-dimensional and continuous state-action spaces. In order to tackle this problem, recent efforts have focused on harnessing well-developed methodologies from statistical learning, estimation theory and empirical inference. A key related challenge is tuning the many parameters and efficiently addressing numerical problems, such that ultimately efficient RL algorithms could be scaled to real-world problem settings. Methods such as Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES), Policy Improvement with Path Integrals (PI²) and their variations depend heavily on the covariance matrix of the noisy data observed by the agent. It is well known that covariance matrix estimation is problematic when the number of samples is relatively small compared to the number of variables. One way to tackle this problem is through the use of shrinkage estimators, which offer a compromise between the sample covariance matrix and a well-conditioned matrix (also known as the target) with the aim of minimizing the mean-squared error (MSE). Recently, it has been shown that a Multi-Target Shrinkage Estimator (MTSE) can greatly improve on the single-target variant by utilizing several targets simultaneously. Unlike the computationally complex cross-validation (CV) procedure, shrinkage estimators provide an analytical framework, which is an attractive alternative to the CV computing procedure.

We consider the application of shrinkage estimators to a function approximation problem, using the quadratic discriminant analysis (QDA) technique, and show that a two-target shrinkage estimator yields improved performance. The approach paves the way for improved value function estimation in large-scale RL settings, offering higher efficiency and fewer hyper-parameters.

Keywords: covariance matrix estimation, path integral, classification uncertainty

The authors are with the Machine Intelligence Lab at the University of Tennessee - http://mil.engr.utk.edu

1 Introduction

Reinforcement learning (RL) applied to real-world problems inherently involves combining optimal control theory and dynamic programming methods with learning techniques from statistical estimation theory [1, 2, 3, 4]. The motivation is achieving efficient value function approximation for the non-stationary iterative learning process involved, particularly when the number of state variables exceeds 10 [5]. Recent efforts in scaling RL address continuous state and/or action spaces by optimizing parametrized policies. For example, Policy Improvement with Path Integrals (PI²) [5] combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. It has been shown in [6] that PI² is a member of a wider family of methods which share probabilistic modeling concepts, such as Covariance Matrix Adaptation - Evolutionary Strategy (CMA-ES) [7] and the Cross-Entropy Method (CEM) [8]. Path Integral Policy Improvement with Covariance Matrix Adaptation (PI²-CMA) [6] improves on the PI² method by determining the magnitude of the exploration noise automatically [6]. The PI²-SEQ scheme [9] applies PI² to sequences of motion primitives. One application of PI²-SEQ is concerned with object grasping under uncertainty [9, Sec. 5] while applying the experimental paradigm of [10]. The latter approach illustrated that, over time, humans adapt their reaching motion and grasp to the shape of the object position distribution, determined by the orientation of the main axis of its covariance matrix. Moreover, it has been shown that the PI² optimal control policy can be approximated through linear regression [11]. This connection allows the use of well-developed linear regression algorithms for learning the optimal policy. The aforementioned methods rely on accurate covariance matrix estimation of the multivariate data involved.
Unfortunately, when the number of observations n is comparable to the number of state variables p, the covariance estimation problem becomes more challenging. In such scenarios, the sample covariance matrix is not well-conditioned and is not necessarily invertible (despite the fact that those two properties are required for most applications). When n < p, the inverse cannot be computed at all [5, Sec. 2.2]. The same covariance problem arises in other related applications of RL. For example, in RL with Gaussian processes, the covariance matrix is regularized [12, Sec. 2]. However, although the regularization parameter plays a pivotal role, it is not clear how it should be set [12, Sec. 3]. Other related work [13] studies the ability to mitigate potentially overconfident classifications by assessing how qualified the system is to make a judgment on the current test datum. It is well known that for a small ratio of training observations n to observation dimensionality p, the conventional quadratic discriminant analysis (QDA) classifier performs poorly, due to highly variable class-conditional sample covariance matrices. In order to improve the classifier's performance, regularization is recommended, with the aim of providing an appropriate compromise between the bias and variance of the solution. While other regularization methods [14] define regularization coefficients by the computationally complicated cross-validation (CV) procedure, the shrinkage estimators studied in this paper provide an analytical solution, which is an attractive alternative to the CV procedure. This paper elaborates on the Multi-Target Shrinkage Estimator (MTSE) [15], which addresses the problem of covariance matrix estimation when the number of samples is relatively small compared to the number of variables. MTSE offers a compromise between the sample covariance matrix and well-conditioned matrices (also known as targets) with the aim of minimizing the mean-squared error (MSE). Section 2 presents the MTSE and examines the squared biases of two diagonal targets.
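The rank deficiency noted above is easy to demonstrate; a minimal sketch (assuming NumPy, with illustrative sizes) shows that with n < p observations the sample covariance matrix is singular:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 8                       # fewer observations than variables
X = rng.standard_normal((n, p))   # zero-mean synthetic data, one row per observation

# Sample covariance matrix; its rank is at most n < p, so it cannot be inverted
S = X.T @ X / n
rank = np.linalg.matrix_rank(S)
```

Shrinkage against a well-conditioned target restores invertibility precisely in this regime.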
In Section 3, we conduct a careful experimental study and examine the two-target and one-target shrinkage estimators, as well as the Ledoit-Wolf (LW) method [16], for different covariance matrices. We demonstrate an application to the quadratic discriminant analysis (QDA) classifier, showing that the test classification accuracy rate (TCAR) is higher when using the two-target, rather than the one-target, shrinkage regularization. The QDA classifier is a fundamental component in DeSTIN [17], a deep learning system for spatiotemporal feature extraction. The DeSTIN architecture currently assumes diagonal covariance matrices, which is one of the targets examined in this paper. In our future research we intend to utilize the results shown in this paper in order to improve the DeSTIN architecture.

2 Multi-Target Shrinkage Estimation

Let {x_i}, i = 1, ..., n, be a sample of independent identically distributed (i.i.d.) p-dimensional vectors drawn from a density having zero mean and covariance Σ = {σ_ij}. The most common estimator of Σ is the sample covariance matrix S = {s_ij}, defined as

    S = \frac{1}{n} \sum_{i=1}^{n} x_i x_i^T,    (1)

which is unbiased, i.e., E{S} = Σ. The MTSE model [15] is defined as

    \hat{\Sigma}(\gamma) = \left(1 - \sum_{i=1}^{t} \gamma_i\right) S + \sum_{i=1}^{t} \gamma_i T_i,    (2)

where t is the number of targets T_i, i = 1, ..., t, and γ = [γ_1, ..., γ_t]^T is the vector of shrinkage coefficients. Our objective is therefore to find the \hat{\Sigma}(\gamma) which minimizes the MSE loss function

    L(\gamma) = E\left\{ \left\| \hat{\Sigma}(\gamma) - \Sigma \right\|_F^2 \right\}.    (3)

The optimal shrinkage coefficient vector γ that minimizes L(γ) (3) can be found by solving a strictly convex quadratic program [15]. In this paper, we use the two diagonal targets

    T_1 = \frac{\mathrm{Tr}(S)}{p} I, \qquad T_2 = \mathrm{diag}(S).    (4)

Following the developments in [16, Sec. 2], the covariance matrix Σ can be written as Σ = VΛV^T, where V and Λ are the eigenvector and eigenvalue matrices of Σ, respectively. The eigenvalues of Σ are denoted as ζ_i, i = 1, ..., p, in increasing order, i.e., ζ_1 ≤ ζ_2 ≤ ... ≤ ζ_p, and it is well known that \sum_{i=1}^{p} ζ_i = Tr(Σ). As a result, the squared bias of T_1 with respect to Σ can be written as

    \| E\{T_1\} - \Sigma \|_F^2 = \left\| \frac{\mathrm{Tr}(\Sigma)}{p} I - V \Lambda V^T \right\|_F^2 = \sum_{i=1}^{p} (\zeta_i - \bar{\zeta})^2, \qquad \bar{\zeta} = \frac{\mathrm{Tr}(\Sigma)}{p} = \frac{1}{p} \sum_{i=1}^{p} \zeta_i,    (5)

where ζ̄ is the mean of the eigenvalues ζ_i, i = 1, ..., p. The above result shows that ||E{T_1} − Σ||_F^2 equals the dispersion of the eigenvalues around their mean. Therefore, T_1 becomes less suitable for describing Σ as the dispersion of the eigenvalues (5) increases. On the other hand, the squared bias of T_2 with respect to Σ can be written as

    \| E\{T_2\} - \Sigma \|_F^2 = \| \mathrm{diag}(\Sigma) - \Sigma \|_F^2 = \sum_{i \neq j} \sigma_{ij}^2,    (6)

which equals the sum of the squared off-diagonal entries of Σ. Therefore, T_2 becomes less suitable for describing Σ when the variables of Σ are more highly correlated.

3 Experiments

In this section, we present an extensive experimental study of the one-target and two-target shrinkage estimators. The estimators are affected by the squared bias and the variance of a target, where the latter depends on the number of data observations n. We therefore examine cases of different true covariance matrices Σ that result in different biases of T_1 and T_2, and then examine the estimators' performance as a function of n.
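A concrete sketch of the estimator (1)-(2) with the targets (4), plus a numerical check of the bias identities (5)-(6) (NumPy assumed; the shrinkage coefficients below are arbitrary placeholders, not the QP-optimized values of [15]):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 6
X = rng.standard_normal((n, p))

# Sample covariance S = (1/n) sum_i x_i x_i^T (Eq. 1), for zero-mean data
S = X.T @ X / n

# Two diagonal targets (Eq. 4) and the multi-target blend (Eq. 2);
# gamma is an illustrative placeholder, not the optimized coefficient vector
T1 = np.trace(S) / p * np.eye(p)
T2 = np.diag(np.diag(S))
gamma = np.array([0.3, 0.2])
Sigma_hat = (1.0 - gamma.sum()) * S + gamma[0] * T1 + gamma[1] * T2

# Numerical check of the bias identities (5) and (6) for a known Sigma
A = rng.standard_normal((p, p))
Sigma = A @ A.T + p * np.eye(p)            # a positive-definite "true" covariance
zeta = np.linalg.eigvalsh(Sigma)
bias_T1 = np.linalg.norm(np.trace(Sigma) / p * np.eye(p) - Sigma, 'fro') ** 2
dispersion = np.sum((zeta - zeta.mean()) ** 2)                 # Eq. (5)
bias_T2 = np.linalg.norm(np.diag(np.diag(Sigma)) - Sigma, 'fro') ** 2
off_diag = np.sum(Sigma ** 2) - np.sum(np.diag(Sigma) ** 2)    # Eq. (6)
```

Note that because both targets share the trace of S, any blend of the form (2) preserves Tr(S); only the eigenvalue spread and off-diagonal structure are shrunk.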
In order to study the effect of the squared biases, we create a covariance matrix Σ with a determinant of one, i.e., |Σ| = 1, according to two parameters. The first parameter is the condition number η, the ratio of the largest eigenvalue ζ_max to the smallest eigenvalue ζ_min of Σ, i.e., η = ζ_max / ζ_min. In the experiments, the eigenvalues of Σ, denoted as ζ_i, i = 1, 2, ..., p, are generated according to

    \zeta_i = \zeta_{\min} \left( \frac{(i-1)(\eta-1)}{p-1} + 1 \right), \qquad i = 1, \ldots, p.    (7)

Then, the eigenvalue matrix of Σ is defined as the diagonal matrix with elements ζ_i, i = 1, 2, ..., p,

    \Lambda(\eta) = \mathrm{diag}(\zeta_1, \zeta_2, \ldots, \zeta_p).    (8)

The second parameter, K, controls the rotation of Λ(η). Our approach is to select a set of orthonormal transformations, as in [18, Sec. 2.B],

    E(K) = \prod_{k=1}^{K} E_k = E_1 E_2 \cdots E_K,    (9)

where each matrix E_k is defined as E_k = \prod_{l=1}^{p-k} E_{kl} = E_{k1} E_{k2} \cdots E_{k(p-k)}. The matrix E_{kl} is an orthonormal rotation of 45° in a two-coordinate plane for the coordinates k and (p + 1 − l), i.e.,

    E_{kl} = I + \Phi(k, p + 1 - l),    (10)

where Φ(i_k, j_k) is defined as

    [\Phi]_{ij} =
    \begin{cases}
      \frac{1}{\sqrt{2}} - 1 & \text{if } i = j = i_k \text{ or } i = j = j_k \\
      \frac{1}{\sqrt{2}} & \text{if } i = i_k \text{ and } j = j_k \\
      -\frac{1}{\sqrt{2}} & \text{if } i = j_k \text{ and } j = i_k \\
      0 & \text{otherwise.}
    \end{cases}    (11)
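The eigenvalue spectrum (7)-(8) and the rotations (9)-(11) can be sketched as follows (NumPy assumed; the ±1/√2 entries follow from reading E_kl as a 45° planar rotation, and the test checks that E(K) is orthonormal, so the construction preserves the eigenvalues of Λ(η)):

```python
import numpy as np

def eigenvalues(p, eta, zeta_min=1.0):
    """Eq. (7): linearly spaced eigenvalues with condition number eta."""
    i = np.arange(1, p + 1)
    return zeta_min * ((i - 1) * (eta - 1) / (p - 1) + 1)

def rotation_E(p, K):
    """E(K) = E_1 ... E_K, products of 45-degree planar rotations (Eqs. 9-11)."""
    c = 1.0 / np.sqrt(2.0)
    E = np.eye(p)
    for k in range(1, K + 1):
        for l in range(1, p - k + 1):
            i, j = k - 1, p - l            # coordinates k and (p + 1 - l), zero-based
            Ekl = np.eye(p)
            Ekl[i, i] = Ekl[j, j] = c      # Phi adds (1/sqrt(2) - 1) on the diagonal
            Ekl[i, j], Ekl[j, i] = c, -c   # and +/- 1/sqrt(2) off the diagonal
            E = E @ Ekl
    return E

p = 10
Lam = np.diag(eigenvalues(p, eta=10.0))    # Eq. (8)
E = rotation_E(p, K=5)
```

Since each E_kl has unit determinant, E(K) Λ(η) E(K)^T keeps both the determinant and the condition number fixed while spreading mass onto the off-diagonals.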

The parameter K is an integer in the range 0 ≤ K ≤ p − 1, where K = 0 indicates no rotation and K = p − 1 indicates full rotation, such that all the coordinates rotate with respect to each other at an angle of 45°. Then, using Λ(η) (8) and E(K) (9), the covariance matrix is created as

    \Sigma(\eta, K) = E(K) \Lambda(\eta) E^T(K).    (12)

By employing the covariance matrix (12), the biases of T_1 and T_2 can be controlled independently for η > 1. The squared bias ||E{T_1} − Σ||_F^2 is affected only by η, and increases as η does, with ||E{T_1} − Σ||_F^2 = 0 for η = 1. The squared bias ||E{T_2} − Σ||_F^2 is affected only by K, and increases as K does, with ||E{T_2} − Σ||_F^2 = 0 for K = 0. It should be noted that if η = 1 then K has no impact, while if η is near 1, K has only a minor impact. The shrinkage estimators used in the study are of the one-target variety with T_1 and T_2; in the figures that appear in this section, these estimators are denoted as T1 and T2, respectively. The LW estimator [16] is of the one-target shrinkage variety with T_1, uses a biased shrinkage coefficient estimator, and is denoted as LW. Finally, the two-target shrinkage estimator appears in the figures as TT. We show that the two-target estimator can improve classification results compared with the one-target estimators when using the quadratic discriminant analysis (QDA) method. The purpose of QDA is to assign observations to one of several groups g = 1, ..., G with p-variate normal distributions

    f_g(x) = (2\pi)^{-p/2} \left| \Sigma_g \right|^{-1/2} \exp\left( -0.5 (x - m_g)^T \Sigma_g^{-1} (x - m_g) \right),    (13)

where m_g and Σ_g are the population mean vector and covariance matrix of group g. An observation x is assigned to a class ĝ according to

    d_{\hat{g}}(x) = \min_{1 \le g \le G} d_g(x),    (14)

with

    d_g(x) = (x - m_g)^T \Sigma_g^{-1} (x - m_g) + \ln \left| \Sigma_g \right| - 2 \ln \pi_g,    (15)

where π_g is the unconditional prior probability of observing a member of group g. In our experiments, we classify two groups (G = 2), with observations generated from a normal distribution with zero mean and π_1 = π_2.
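For zero-mean groups with equal priors, the rule (14)-(15) reduces to comparing quadratic discriminants; a minimal sketch (NumPy assumed; the example covariances are illustrative, not those of Section 3):

```python
import numpy as np

def qda_score(x, m, Sigma):
    """Discriminant d_g(x) of Eq. (15); the -2 ln(pi_g) term cancels for equal priors."""
    d = x - m
    sign, logdet = np.linalg.slogdet(Sigma)
    return d @ np.linalg.solve(Sigma, d) + logdet

def qda_classify(x, means, covs):
    """Assign x to the group with the minimal discriminant (Eq. 14)."""
    return int(np.argmin([qda_score(x, m, S) for m, S in zip(means, covs)]))

# Two zero-mean groups: identity covariance vs. a strongly anisotropic one
p = 4
means = [np.zeros(p), np.zeros(p)]
covs = [np.eye(p), np.diag([9.0, 1.0, 1.0, 1.0 / 9.0])]

# A large first coordinate is far more plausible under the anisotropic covariance
x = np.array([4.0, 0.0, 0.0, 0.0])
label = qda_classify(x, means, covs)
```

In the experiments, the Σ_g in (15) are replaced by their shrinkage estimates, which is exactly where a well-conditioned, invertible estimator matters.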
The covariance matrix of the first group is the identity matrix, Σ_1 = I, while that of the second group is Σ_2(η, K) = Σ(η, K) (12), generated on the basis of the construction above. The goal is to study the effectiveness of the shrinkage estimators when using QDA, by assigning observations to one of these two groups based on the classification rule (14). We run our experiments for n = 2, 3, ..., 30. For each n, twenty sets of data of size n are produced.

Figure 1: QDA for (a) Σ_2(η, 0) = Λ(η) with η = 10 and (b) an unrestricted Σ_2(10, K) with K = 5.

For each experiment we summarize the average test classification accuracy rate (TCAR), with standard deviations (the bars in the figure), over the twenty replications for each n. For each group, 10^5 test observations were generated in order to examine the efficiency of the classifier. We provide the best TCAR, calculated by using (14) when the covariance matrices are known, denoted in the figures as Bayes. We also compare the results with a regularization [19, Sec. 6] in which the zero eigenvalues are replaced with a small number just large enough to permit numerically stable inversion. This has the effect of producing a classification rule based on Euclidean distance in the zero-variance subspace. We denote this procedure as the zero-variance regularization (ZVR). In all experiments, the TCAR of the two-target estimator is higher than that of the one-target variety. The LW estimator is inferior to its unbiased version when dealing with a small number of observations, and converges to its unbiased version as the number of observations increases. Fig. 1(a) presents the result

when the covariance matrix is diagonal, i.e., Σ_2(η, 0) = Λ(η) with η = 10, so that T_2 is unbiased while T_1 is biased. The target T_1 provides a higher TCAR than T_2 for small numbers of observations, after which T_2 provides a better TCAR. In Fig. 1(b), the covariance matrix is unrestricted, i.e., Σ_2(10, K) with K = 5, and both targets T_1 and T_2 are biased. The squared bias of T_1 is not affected by K, whereas the higher the value of K, the higher the squared bias of T_2, and therefore T_2 loses its advantage over T_1.

In conclusion, it has been shown that the Multi-Target Shrinkage Estimator (MTSE) [15] can greatly improve on the single-target variant in the sense of mean-squared error (MSE) by utilizing several targets simultaneously. We considered the application of shrinkage estimators in the context of a function approximation problem, using the quadratic discriminant analysis (QDA) technique, and showed that a two-target shrinkage estimator yields improved performance. This was done through a careful experimental study which examined the squared biases of the two diagonal targets. Unlike the computationally complex cross-validation (CV) procedure, the shrinkage estimators provide an analytical solution, an attractive alternative to the CV computing procedure commonly used in QDA. The approach paves the way for improved value function estimation in large-scale RL settings, offering higher efficiency and fewer hyper-parameters.

References

[1] P. Dayan and G. E. Hinton, "Using expectation-maximization for reinforcement learning," Neural Computation, vol. 9, no. 2, pp. 271-278, 1997.
[2] M. Ghavamzadeh and Y. Engel, "Bayesian actor-critic algorithms," in Proceedings of the 24th International Conference on Machine Learning. ACM, 2007, pp. 297-304.
[3] M. Toussaint and A. Storkey, "Probabilistic inference for solving discrete and continuous state Markov decision processes," in Proceedings of the 23rd International Conference on Machine Learning. ACM, 2006, pp. 945-952.
[4] N. Vlassis, M. Toussaint, G. Kontes, and S. Piperidis, "Learning model-free robot control by a Monte Carlo EM algorithm," Autonomous Robots, vol. 27, no. 2, pp. 123-130, 2009.
[5] E. Theodorou, J. Buchli, and S. Schaal, "A generalized path integral control approach to reinforcement learning," Journal of Machine Learning Research, vol. 11, pp. 3137-3181, Dec. 2010.
[6] F. Stulp and O. Sigaud, "Path integral policy improvement with covariance matrix adaptation," in Proceedings of the 29th International Conference on Machine Learning (ICML), 2012.
[7] N. Hansen and A. Ostermeier, "Completely derandomized self-adaptation in evolution strategies," Evolutionary Computation, vol. 9, no. 2, pp. 159-195, June 2001.
[8] S. Mannor, R. Y. Rubinstein, and Y. Gat, "The cross entropy method for fast policy search," in ICML, 2003, pp. 512-519.
[9] F. Stulp, E. Theodorou, and S. Schaal, "Reinforcement learning with sequences of motion primitives for robust manipulation," IEEE Transactions on Robotics, vol. 28, no. 6, pp. 1360-1370, Dec. 2012.
[10] V. N. Christopoulos and P. R. Schrater, "Grasping objects with environmentally induced position uncertainty," PLoS Computational Biology, vol. 5, no. 10, 2009.
[11] F. Farshidian and J. Buchli, "Path integral stochastic optimal control for reinforcement learning," in The 1st Multidisciplinary Conference on Reinforcement Learning and Decision Making (RLDM 2013), 2013.
[12] G. Chowdhary, M. Liu, R. Grande, T. Walsh, J. How, and L. Carin, "Off-policy reinforcement learning with Gaussian processes," IEEE/CAA Journal of Automatica Sinica, vol. 1, no. 3, pp. 227-238, 2014.
[13] H. Grimmett, R. Paul, R. Triebel, and I. Posner, "Knowing when we don't know: Introspective classification for mission-critical decision making," in 2013 IEEE International Conference on Robotics and Automation (ICRA), May 2013, pp. 4531-4538.
[14] P. J. Bickel and E. Levina, "Regularized estimation of large covariance matrices," The Annals of Statistics, vol. 36, no. 1, pp. 199-227, 2008.
[15] T. Lancewicki and M. Aladjem, "Multi-target shrinkage estimation for covariance matrices," IEEE Transactions on Signal Processing, vol. 62, no. 24, pp. 6380-6390, Dec. 2014.
[16] O. Ledoit and M. Wolf, "A well-conditioned estimator for large-dimensional covariance matrices," Journal of Multivariate Analysis, vol. 88, no. 2, pp. 365-411, 2004.
[17] S. Young, J. Lu, J. Holleman, and I. Arel, "On the impact of approximate computation in an analog DeSTIN architecture," IEEE Transactions on Neural Networks and Learning Systems, vol. 25, no. 5, pp. 934-946, May 2014.
[18] G. Cao, L. Bachega, and C. Bouman, "The sparse matrix transform for covariance estimation and analysis of high dimensional signals," IEEE Transactions on Image Processing, vol. 20, no. 3, pp. 625-640, 2011.
[19] J. H. Friedman, "Regularized discriminant analysis," Journal of the American Statistical Association, vol. 84, no. 405, pp. 165-175, 1989.