HPC and High-end Data Science - PDF Free Download

HPC and High-end Data Science for the Power Grid Alex Pothen August 3, 2018

Outline High-end Data Science 1 3 Data Anonymization Contingency Analysis 2 4 Parallel Oscillation Monitoring 2 / 22

PMUs in the U.S. Phasor Measurement Units in the North American Grid as of March 2015 [1] 3 / 22 [1] NASPI website: https://www.naspi.org/documents.

Emerging Data Logistics 4 / 22

Challenges in High-end Data Analysis Computing is a major consumer of electricity, recent estimates about 10%. Data movement within a processor consumes more energy than arithmetic. Data movement from the instrument to the curation and archival sites, visualizations, researcher s work spaces, etc. represents a significant use of energy. Hence energy is an overarching challenge for sustainable High-end Data Analysis (HDA). Solutions include: Reduce computational costs by matching platforms to each stage of the scientific method Reduce data movement costs by collocation, compression and caching Reuse data through sharing, metadata, and catalogs 5 / 22

Challenges in High-end Data Analysis Data reduction is a fundamental pattern in the convergence of HPC and HDA. need new algorithms for loss-less and lossy compression of scientific data, and need algorithms that can work with the compressed data. New algorithms and software for HPC, data science, analytics, Machine learning, etc. are needed here. A major challenge is the convergence in the software ecosystem for HPC and HDA. 6 / 22

Challenges in High-end Data Analysis High volume data needs high performance computing Streaming data needs on-line algorithms Integration of heterogeneous data e.g.., consumption forecasting in smart grids: smart meters, utility databases, weather data, microgrid sensor networks, etc. Anonymization of consumer data 7 / 22

Outline High-end Data Science 1 3 Data Anonymization Contingency Analysis 2 4 Parallel Oscillation Monitoring 8 / 22

Contingency Analysis N x contingency analysis on power flow evaluates the stability of a power system by simulating the failures of x out of N transmission lines or generators (components). 2 7 8 9 3 gen-2 5 load 6 gen-3 load gen-1 load 4 1 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9 9 / 22

Contingency Analysis Currently each N x contingency analysis is performed independently, but since the number of cases increases exponentially with x, this is computationally impractical. (For example, if N = 3120 and x = 5, there are in total 2.46 10 15 cases.) Heuristics are needed for choosing the cases to be analyzed. To be able to analyze more cases, we have to speed up the solution time for each case. We propose new algorithms for the problem by observing that only a principal submatrix of the system is changed when a component is removed upon failure. 10 / 22

Augmented Formulation The augmented matrix system can be expressed as: n{ A AH H m{ H A H ÂH x 1 b 0 x 2 = H b. m{ H 0 0 x 3 0 A is the original matrix; H is a submatrix of the identity matrix; Â = A HEH T updated matrix; m n; b is the new right-hand-side of the updated rows. 11 / 22

Runtimes Power Grid Contingency Analysis of 777,646 Nodes 10 1 time (second) 10 0 10 1 PARDISO LUSOL CHOLMOD augmented (Direct) augmented (GMRES) 10 2 1 2 3 4 5 6 7 8 9 1011121314151617181920 x (number of components removed) 12 / 22

Outline High-end Data Science 1 3 Data Anonymization Contingency Analysis 2 4 Parallel Oscillation Monitoring 13 / 22

Adaptive Anonymity Users Features F 1 F 2 F 3 F 4 F 5 F 6 U1 1 0 1 0 1 0 U2 1 1 1 1 1 0 U3 0 1 0 1 0 1 U4 0 0 0 0 0 1 U5 1 1 0 0 0 0 U6 1 1 0 0 0 1 14 / 22

Adaptive Anonymity Users Features F 1 F 2 F 3 F 4 F 5 F 6 U1 1 0 1 0 1 0 U2 1 1 1 1 1 0 U3 0 1 0 1 0 1 U4 0 0 0 0 0 1 U5 1 1 0 0 0 0 U6 1 1 0 0 0 1 Users Features F 1 F 2 F 3 F 4 F 5 F 6 U1 1 1 1 0 U2 1 1 1 0 U3 0 0 0 1 U4 0 0 0 1 U5 1 1 0 0 0 U6 1 1 0 0 0 15 / 22

Adaptive Anonymity with Approximation Algorithms Prob. Inst. Feat. b EdgeCover Time Util. Census1990 158K 68 9m 90% PokerHands 500K 95 2h 17m 84% CMS 1M 512 10h 33m 81% Earlier Algorithms Belief propagation: used exact b-matching algorithm, could solve problems with few thousand instances only. The approximate edge cover approach is two to three orders of magnitude faster on those smaller problems. 16 / 22

Adaptive Anonymity Computation on Cori 17 / 22

Outline High-end Data Science 1 3 Data Anonymization Contingency Analysis 2 4 Parallel Oscillation Monitoring 18 / 22

PMUs in the U.S. Stochastic Subspace Identification Covariance State Space Model xk 1 Axk wk, yk Cxk vk Past & Future Output Matrices y y1 y 0 J 1 y1 y2 y J Yp, y y y I I I J 1 2 y y y I I 1 I J 1 yi 1 yi 2 y I J Yf, y y y I I I J 1 2 2 2 2 Covariance Matrix T H Y Y O G, I f P I 2 I 1 T T, G Exkyk, O C CA CA CA Singular Value Decomposition (SVD) T H O G USV, O U S I I 1/2 1 1, Fast SVD Approaches State & Output Matrices Mode Identification A OI OI, C OI 1: l,:, A c = 1 log A, C T c = C, S Mode Freq. and Damping: eigenvalues of Aˆ c, s ModeShape: ˆ m C. i c i 19 / 22

PMUs in the U.S. Case 1 102 Voltage Phase Angles Computational Time Comparison Mac. No. Full SVD (sec) Strat. 1 sec (speedup) Randomized SVD h = 20, q = 1 Strat. 2 sec (speedup) Strat. 3 sec (speedup) Strat. 4 sec (speedup) Strat. 1 sec (speedup) Lanczos SVD k = 16 Strat. 2 sec (speedup) Strat. 3 sec (speedup) Strat. 4 sec (speedup) 1 112,075 105,826 (1.06x) 206.8 (542x) 135.4 (828x) 105.7 (1060x) 106,234 (1.05x) 81.2 (1380x) 100.2 (1119x) 96.0 (1167x) 2 125,807 121,007 (1.04x) 237.9 (529x) 174.0 (723x) 141.8 (887x) 115,458 (1.09x) 87.4 (1439x) 148.0 (850x) 122.0 (1031x) 3 138,236 135,304 (1.02x) 298.4 (463x) 230.3 (600x) 134.2 (1030x) 133,961 (1.03x) 101.9 (1357x) 177.0 (781x) 115.0 (1202x) 4 170,518 Theoretical speedup 168,617 (1.01x) 58x 304.4 (560x) 226.2 (754x) 140.0 (1218x) 163,028 (1.05x) 140x 104.8 (1627x) 158.0 (1079x) 138.3 (1233x) 20 / 22

References HDA and HPC M. Asch, I. Moore, R. Badia, J. Dongarra et al, Big Data and Extreme-Scale Computing: Pathways to Convergence, Internat. J. High Performance Computing Applications, 2018, 32(4), pp. 435-479. Data Anonymization A. Khan, K. Choromanski, A. Pothen, SM Ferdous, M. Halappanavar and A. Tumeo, Adaptive Anonymization of Data using b-edge cover, ACM/IEEE Proc. Supercomputing, Nov. 2018. Data Reduction for Oscillation Monitoring T. Wu, V. Venkatasubramanian, and A. Pothen, Fast parallel stochastic subspace algorithms for large-scale ambient oscillation monitoring, IEEE Trans. Smart Grid, 8(3), pp. 1494-1503, 2017. 21 / 22

References Contingency Analysis Y. Yeung, A. Pothen, M. Halappanavar and Z. Huang, AMPS: An augmented matrix formulation for principal submatrix updates with application to power grids, SIAM J. Sci. Comput., 39 (3), S809-S827, 2017. 22 / 22