Efficient, portable template attacks


Efficient, portable template attacks. Marios O. Choudary, Markus G. Kuhn, Computer Laboratory. Paper: IEEE Trans. Inf. Foren. Sec. 13(2), Feb. 2018.

[Figure: measured supply-current trace, current (mA) vs. time (µs).]

Side-channel attacks on microcontrollers I

The power-supply current waveform of microprocessors (and the resulting EM emissions) is affected at each clock cycle by:
- the (category of the) executed instruction
- addresses/registers accessed
- operands
- status flags
- result values
- prior state (of wires, bus lines, flip-flops, memory cells)
- intermediate activities (e.g., glitches before ALU results are stable)
- micro-architectural state
- etc.

Side-channel attacks on microcontrollers II

Instruction categories are often easy to distinguish visually, e.g. whether a conditional branch is taken or not ("simple power analysis"). In some cases (e.g., with interpreters) this enables reconstruction of executed application instruction sequences from recordings of a single execution.

Data-dependent variations require more effort to separate from measurement noise:
- repeated measurements
- statistical signal processing
- exploitation of knowledge of the executed algorithms
- low-noise/low-jitter measurement setup

[Figure: current traces for 256 different values of a password byte, mA vs. µs. Legend: wrong inputs: min/max measured currents; wrong inputs: min/max difference to median; correct input: current; correct input: difference to median.]

Side-channel attacks on microcontroller data busses

Many techniques have been demonstrated since 1998 to exploit data-dependent variations in power and EM emissions. Most of these reconstruct subkeys used in known crypto algorithms by observing the operation v_k(p) = S(p ⊕ k) with known plain-text input p and substitution table ("s-box") S, e.g. in the first round of a block cipher.

Differential Power Analysis [Kocher et al., 1998]:
- for all candidate subkey bytes k ∈ S and each observed input p, predict one bit b of v_k(p)
- estimate the leakage trace x̄_{k,b}(t) as a function of b (by averaging many traces with different p but identical k and b)
- only the correct candidate key k will cause a significant peak at some time t in the difference-of-means trace x̄_{k,1}(t) - x̄_{k,0}(t)

Only if the assumed k was correct will we have split our set of recorded traces correctly into two piles, one for b = 0 and one for b = 1, such that the two average traces, one for each pile, show a difference (contributed by b).
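To make the difference-of-means step concrete, here is a minimal NumPy sketch (not from the slides); the array names traces and plaintexts, the sbox lookup table and the targeted bit index are illustrative assumptions.

import numpy as np

def dpa_difference_of_means(traces, plaintexts, sbox, bit=0):
    """traces: (N, T) array of leakage samples; plaintexts: (N,) known input bytes;
    sbox: length-256 NumPy array. Returns a (256, T) array of difference-of-means
    traces, one per candidate subkey."""
    diffs = np.empty((256, traces.shape[1]))
    for k in range(256):
        v = sbox[plaintexts ^ k]            # predicted intermediate value v_k(p) = S(p XOR k)
        b = (v >> bit) & 1                  # predicted bit b
        diffs[k] = traces[b == 1].mean(axis=0) - traces[b == 0].mean(axis=0)
    return diffs

# The correct k should show the largest peak:
# k_guess = np.argmax(np.max(np.abs(diffs), axis=1))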

Side-channel attacks on microcontroller data busses (continued)

Correlation Power Analysis:
- for all candidate subkeys k ∈ S, predict a value f(v_k(p)) that is expected to be proportional to some samples in the leakage traces, e.g. the Hamming weight of v_k(p)
- the correct candidate key k will cause the highest Pearson correlation coefficient between f(v_k(p)) and some sample positions in the recorded leakage traces
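A corresponding CPA sketch, again only illustrative: it assumes the same hypothetical traces, plaintexts and sbox arrays as above and uses the Hamming weight of v_k(p) as the leakage model.

import numpy as np

def cpa_correlation(traces, plaintexts, sbox):
    """Pearson correlation between the Hamming weight of the predicted v_k(p)
    and every sample position, for each candidate subkey k."""
    hw = np.array([bin(x).count('1') for x in range(256)])
    X = traces - traces.mean(axis=0)          # centred traces, shape (N, T)
    corr = np.empty((256, traces.shape[1]))
    for k in range(256):
        h = hw[sbox[plaintexts ^ k]].astype(float)
        h -= h.mean()
        corr[k] = (h @ X) / (np.linalg.norm(h) * np.linalg.norm(X, axis=0))
    return corr   # best key guess: np.argmax(np.max(np.abs(corr), axis=1))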

Side-channel attacks on microcontroller data busses (continued)

Mutual Information Analysis:
- the correct candidate key k will cause the highest mutual information between (some function f of) v_k(p) and some sample positions in the recorded leakage traces

Side-channel attacks on microcontroller data busses (continued)

Template Attack [Chari et al., 2003]:
- profiling phase: build a Gaussian multivariate model (pdf) of the leakage trace for each result byte v; requires access to a test chip/mode where k, and hence v, is known
- attack phase: find the maximum-likelihood candidate key k given n_a leakage traces x_{p_1}, x_{p_2}, ..., x_{p_{n_a}} and associated inputs p, using the probability density function f(x_p | v) built during the profiling phase

Side-channel attacks on microcontroller data busses (continued)

Stochastic Model [Schindler et al., 2005]:
- profiled, like the template attack, but rather than building a pdf for each possible value v, model the leakage trace of v as a linear combination of traces for its individual bits (or pairs of bits)
- shorter profiling phase due to the reduced number of parameters to be estimated; more practical for 16-bit busses
- can be less accurate than a full template attack, especially with small design sizes (more non-linear effects, capacitive coupling between bus traces, etc.)

Side-channel attacks on microcontroller data busses (continued)

Deep Learning:
- profiled attack that trains a neural network to classify traces according to v
- very compute intensive, very large number of parameters
- convolutional layers may learn to auto-align traces, whereas template attacks rely strongly on low-jitter alignment
- all magic

Objectives here:
- Use the template attack independent of any cryptographic algorithm (no known s-box, etc.).
- Directly eavesdrop on 8-bit parallel bus lines (or 32-bit busses that handle 8-bit data).
- Demonstration attack target: a single 8-bit load instruction (e.g., RAM to register) in a microcontroller.
- Example targets: data parsers handling secrets, string-processing functions, instruction fetch cycles, loading keys into cryptographic hardware, etc. ("sub-cryptographic algorithms"). Such code may still lack masking/hiding countermeasures.
- Much more demanding than DPA-style crypto attacks, as we now depend on all bits being distinguishable (rather than on cruder leakage models, such as Hamming weights).
- Signal pre-processing and dimensionality reduction to maximize the signal-to-noise ratio and reduce the number of parameters to estimate become crucial.

Template attack (basics, notation)

Hopefully identical hardware: profiling device, attacked device.

Goal: infer some secret value k ∈ S, processed by the attacked device at some point. For an 8-bit microcontroller: S = {0, ..., 255}.

Required: ability to sample supply-current or electro-magnetic waveforms ("raw leakage vectors" x^r ∈ R^{m^r}) at times {t_1, ..., t_{m^r}} during and near the point in time where k is processed.

Profiling phase: record n_p raw leakage vectors x^r_{k,i} ∈ R^{m^r} (1 ≤ i ≤ n_p) from the profiling device for each possible candidate value k ∈ S. Result: one raw leakage matrix X^r_k ∈ R^{n_p × m^r} for each k ∈ S, containing the (transposed) vectors x^r_{k,i} as rows.

Trace compression (basics, notation)

Raw leakage vectors x^r_{k,i} may contain m^r = hundreds or thousands of samples, due to the high sampling rates used. We may compress them before further processing, either by
- sample selection: keep only a subset of m ≪ m^r samples, or
- dimensionality reduction: Principal Component Analysis (PCA) or Fisher's Linear Discriminant Analysis (LDA).

Compressed leakage vectors: x^r_{k,i} ∈ R^{m^r} → x_{k,i} ∈ R^m. Combine these as rows into the compressed leakage matrix X_k ∈ R^{n_p × m}. Without any such compression step: X_k = X^r_k and m = m^r.

Template parameters (basics, notation)

Now use the compressed leakage matrices X_k to estimate, for each possible value k ∈ S:

Mean trace: x̄_k = (1/n_p) Σ_{i=1}^{n_p} x_{k,i}

Covariance matrix: S_k = (1/(n_p - 1)) Σ_{i=1}^{n_p} (x_{k,i} - x̄_k)(x_{k,i} - x̄_k)^T

Note: Σ_{i=1}^{n_p} (x_{k,i} - x̄_k)(x_{k,i} - x̄_k)^T = X̃_k^T X̃_k, where X̃_k is X_k with x̄_k subtracted from each row.

Side-channel leakage traces can generally be modelled well by a Gaussian multivariate distribution, meaning that x̄_k and S_k are sufficient statistics defining the underlying distribution (probability density function)

f(x | k) = (2π)^{-m/2} |S_k|^{-1/2} exp(-(1/2) (x - x̄_k)^T S_k^{-1} (x - x̄_k))
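A minimal sketch of this profiling step in NumPy, assuming the compressed profiling traces are available as a hypothetical dict X_by_k mapping each candidate value k to an (n_p, m) array:

import numpy as np

def estimate_templates(X_by_k):
    """Estimate the template parameters (mean vector x̄_k, covariance S_k) per value k."""
    means, covs = {}, {}
    for k, Xk in X_by_k.items():
        means[k] = Xk.mean(axis=0)                     # x̄_k
        Xc = Xk - means[k]                             # X̃_k: mean-free rows
        covs[k] = (Xc.T @ Xc) / (Xk.shape[0] - 1)      # S_k = X̃_k^T X̃_k / (n_p - 1)
    return means, covs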

Illustrative example

[Figure] Each dot represents a trace x (with just m = 2 samples, colour indicates k), red circles represent the mean traces x̄_k, red lines represent the eigenvectors of the covariance matrices S_k, and the green ellipses are equiprobability lines of f(x | k).

Attack phase (basics, notation)

Infer the secret value k ∈ S processed by the attacked device:
- Trigger repeated processing of k, n_a times.
- Use the same recording technique and compression method as in the profiling phase.
- Obtain n_a leakage vectors x_i ∈ R^m, stored in the leakage matrix X_k ∈ R^{n_a × m}.
- For each k ∈ S compute a discriminant score D(k | X_k).
- Finally try all k ∈ S on the attacked device, in order of decreasing score (optimized brute-force search, e.g. for a password or cryptographic key), until the correct k is found.

Discriminant function

Given a trace x_i from X_k, Bayes' rule suggests D(k | x_i) = f(x_i | k) P(k), or, if P(k) is independent of k (P(k) = |S|^{-1}), then D(k | x_i) = f(x_i | k).

The full Bayes likelihood is

L(k | x_i) = f(x_i | k) P(k) / Σ_{k'} f(x_i | k') P(k')

but we can omit here factors that are the same for each k and therefore do not affect the relative order of the discriminant scores.

With more than one measurement, assuming the noise is independent across repeated measurements, the joint likelihood over all attack traces x_i in X_k is

L(k | X_k) = Π_{x_i in X_k} L(k | x_i)

Is this a better discriminant than L(k | (1/n_a) Σ_{i=1}^{n_a} x_i), i.e. averaging all attack traces first before looking up a pdf? Yes, but ...

Numerical problems

So far so simple. But in practice the pdf

f(x | k) = (2π)^{-m/2} |S_k|^{-1/2} exp(-(1/2) (x - x̄_k)^T S_k^{-1} (x - x̄_k))

can easily cause numerical problems that require attention:
- S_k may not be invertible (|S_k| ≈ 0). In fact S_k cannot be invertible if n_p ≤ m: this is because S_k is essentially X̃_k^T X̃_k, and therefore X̃_k ∈ R^{n_p × m} and S_k ∈ R^{m × m} have the same rank.
- |S_k| may also overflow easily.
- e^x may overflow easily: IEEE double covers e^x only for x < 710, easily exceeded for large m.

Pooled covariance matrix

The template mean vectors x̄_k characterize the signal. The covariance matrices S_k characterize the noise. If the measured noise is independent of the signal, then the underlying covariances estimated by the S_k will be identical ("homoscedasticity"). We can then average the S_k into a single pooled covariance matrix:

S_pooled = (1/|S|) Σ_{k∈S} S_k

This has many advantages:
- better noise model (more data)
- relaxation of the necessary condition for S_pooled being invertible: m < |S| · n_p, or n_p > m / |S|
- enables compression with Linear Discriminant Analysis (LDA)
- enables faster and more stable discriminant functions

But: some side-channel countermeasures can result in data-dependent noise.
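A sketch of the pooling step, under the same assumptions as before (the dicts covs and X_by_k from the previous sketch are hypothetical):

import numpy as np

def pooled_covariance(covs):
    """Average the per-value covariance matrices S_k into S_pooled,
    assuming homoscedastic (value-independent) noise."""
    return sum(covs.values()) / len(covs)

def pooled_covariance_from_traces(X_by_k):
    """Equivalent pooling computed directly from the mean-free profiling traces."""
    num, denom = 0, 0
    for Xk in X_by_k.values():
        Xc = Xk - Xk.mean(axis=0)
        num = num + Xc.T @ Xc
        denom += Xk.shape[0] - 1
    return num / denom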

Illustrative example

[Figure] All |S| = 8 error ellipses are identically sized and orientated, and do not depend on k.

Compression: sample selection I

- keeping the dimension m of the multivariate pdf model small helps avoid numerical problems
- many samples in x^r_i will contain no data-dependent variation
- discarding too much information will reduce the success rate

Data-dependent variation is characterized by the between-groups vectors

τ_k = x̄^r_k - x̄^r, where x̄^r = (1/|S|) Σ_{k∈S} x̄^r_k.

Various per-sample signal-strength estimates have been proposed: Difference of Means (DOM), Sum of Squared Differences (SOSD), Signal-to-Noise Ratio (SNR) and SOST. Example:

s_DOM(t) = Σ_{1 ≤ k < k' ≤ |S|} |x̄^r_k(t) - x̄^r_{k'}(t)|
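An illustrative sketch of s_DOM and of the simple "keep the m strongest samples" selection; mean_by_k, a hypothetical dict of per-value raw mean traces x̄^r_k, is an assumption here:

import numpy as np

def dom_signal_strength(mean_by_k):
    """s_DOM(t): sum of absolute pairwise differences of the per-value mean traces."""
    M = np.stack(list(mean_by_k.values()))        # (|S|, m^r) matrix of mean traces
    s = np.zeros(M.shape[1])
    for a in range(M.shape[0]):
        for b in range(a + 1, M.shape[0]):
            s += np.abs(M[a] - M[b])
    return s

def select_samples(s, m):
    """Indices of the m samples with the highest signal strength."""
    return np.sort(np.argsort(s)[-m:])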

Compression: sample selection II

[Figure: normalized signal-strength estimates from DOM, SOSD and SNR on our reference data set (Grizzly Beta), plotted over clock cycles.]

Simplest techniques: take the m samples with the highest signal strength s(t), or all samples above some threshold. But these may all come from the same clock cycle and be highly correlated with each other (i.e., not say much new).

Alternative strategy: take a maximum number of samples (e.g., 1, 3, 20) from each clock cycle.

Covariance of the between-groups vectors

[Figure] The between-groups vectors τ_k = x̄^r_k - x̄^r are shown in blue.

Principal Component Analysis [Archambeau et al., 2006]

Sample between-groups matrix:

B = Σ_{k∈S} (x̄^r_k - x̄^r)(x̄^r_k - x̄^r)^T

Singular value decomposition: B = U D U^T
- each column of the orthonormal matrix U ∈ R^{m^r × m^r} is an eigenvector u_j of B
- the diagonal matrix D ∈ R^{m^r × m^r} contains the corresponding eigenvalues δ_j, with δ_1 ≥ δ_2 ≥ ... ≥ δ_{m^r}.

Only the first m < |S| eigenvectors (u_1 ... u_m) = U_m are needed to preserve most of the variability of the mean vectors x̄^r_k.

Compression step: X_k = X^r_k U_m. This projects each raw trace x^r_i in X^r_k onto just the m largest eigenvectors of B: x_i = x^r_i U_m.
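A sketch of this PCA compression, again assuming the hypothetical mean_by_k dict of raw mean traces; it uses a symmetric eigendecomposition (numpy.linalg.eigh) of B, which for this symmetric positive semidefinite matrix yields the same eigenvectors as the SVD on the slide:

import numpy as np

def pca_projection(mean_by_k, m):
    """Build the between-groups matrix B from the raw mean traces x̄^r_k and
    return the projection matrix U_m of its m largest eigenvectors."""
    M = np.stack(list(mean_by_k.values()))     # (|S|, m^r)
    D = M - M.mean(axis=0)                     # rows: x̄^r_k - x̄^r
    B = D.T @ D                                # between-groups matrix, (m^r, m^r)
    w, U = np.linalg.eigh(B)                   # eigenvalues in ascending order
    order = np.argsort(w)[::-1]
    return U[:, order[:m]]                     # keep the m leading eigenvectors

# Compression step: X_k_compressed = X_k_raw @ U_m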

PCA example: eigenvectors of B

[Figure: the first six eigenvectors u_1 ... u_6 of B.]

PCA example: eigenvalues of B

[Figure: eigenvalue spectrum of B.]

Linear discriminant analysis: maximising SNR

LDA uses two covariance matrices: B for the signal and S^r_pooled for the noise, and projects the x^r_i onto the largest eigenvectors of the signal-to-noise matrix (S^r_pooled)^{-1} B.

Linear discriminant analysis I [Standaert/Archambeau, 2008]

PCA finds directions where the signal is strong, to project onto, but ignores the noise. Fisher's LDA instead considers projections y_j = a_j^T x^r and finds directions a_j ∈ R^{m^r} that maximize

(between-groups variance) / (within-groups variance)
= Σ_{k∈S} (E(y_{j,k}) - E(y_j))^2 / Σ_{k∈S} Var(y_{j,k})
= Σ_{k∈S} (a_j^T (E(x^r_k) - E(x^r)))^2 / Σ_{k∈S} Var(a_j^T x^r_k)

which, up to a constant factor that does not affect the maximization, can be estimated from the profiling traces as

(a_j^T B a_j) / (a_j^T S^r_pooled a_j)

Linear discriminant analysis II

The coefficient vector a_j that maximises (a_j^T B a_j) / (a_j^T S^r_pooled a_j) is the first eigenvector (i.e., the one with the largest associated eigenvalue) of (S^r_pooled)^{-1} B. With the constraint Cov(y_{i,k}, y_{j,k}) = 0, the other a_j that maximise the above ratio are the eigenvectors with the next largest eigenvalues.

Note that (S^r_pooled)^{-1} B is not necessarily symmetric, so we cannot directly apply singular-value decomposition to obtain orthonormal eigenvectors. Instead, we can first compute the eigenvectors u_j of the symmetric matrix (S^r_pooled)^{-1/2} B (S^r_pooled)^{-1/2}, which has the same eigenvalues as (S^r_pooled)^{-1} B, and from which we can then obtain the coefficients a_j = (S^r_pooled)^{-1/2} u_j.

There are at most s = min(m^r, |S| - 1) non-zero eigenvalues, as that is the maximum number of independent linear combinations available in B.
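A sketch of the LDA projection computed via the symmetric matrix (S^r_pooled)^{-1/2} B (S^r_pooled)^{-1/2}, as described above; the inputs B and S_pooled are assumed to come from the earlier sketches, and S_pooled is assumed well conditioned (|S| · n_p > m^r):

import numpy as np

def lda_projection(B, S_pooled, m):
    """Fisher LDA directions: the m leading eigenvectors of S_pooled^{-1} B."""
    w, V = np.linalg.eigh(S_pooled)                      # S_pooled = V diag(w) V^T
    S_inv_sqrt = V @ np.diag(1.0 / np.sqrt(w)) @ V.T     # S_pooled^{-1/2}
    wB, U = np.linalg.eigh(S_inv_sqrt @ B @ S_inv_sqrt)  # same eigenvalues as S_pooled^{-1} B
    order = np.argsort(wB)[::-1][:m]
    A = S_inv_sqrt @ U[:, order]                         # columns a_j = S_pooled^{-1/2} u_j
    return A                                             # note: a_j^T S_pooled a_j = 1 by construction

# Compression: X_k_compressed = X_k_raw @ A (the pooled covariance then becomes ~identity)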

LDA example: eigenvectors of B

[Figure: the first six eigenvectors u_1 ... u_6 of B.]

LDA example: eigenvectors of S^r_pooled

[Figure: the first six eigenvectors u_1 ... u_6 of S^r_pooled.]

LDA example: eigenvectors of (S^r_pooled)^{-1} B

[Figure: the first six eigenvectors u_1 ... u_6 of (S^r_pooled)^{-1} B.]

Linear discriminant analysis III

Like with PCA, pick m such that the first m eigenvalues of (S^r_pooled)^{-1} B cover e.g. 95% of the sum of all eigenvalues. Let A = (a_1 ... a_m) be the matrix of the first m eigenvectors of (S^r_pooled)^{-1} B, then project each leakage matrix as X_k = X^r_k A.

LDA generally outperforms all other compression methods, but relies on homoscedasticity; therefore PCA remains useful where the noise is not easily characterized.

When we scale the coefficients a_j such that a_j^T S^r_pooled a_j = 1, the covariance in the discriminant function becomes the identity matrix, i.e. S_k = I, which greatly reduces computation and storage requirements.

After linear discriminant analysis

[Figure]

The log-likelihood discriminant

Recall the numerical problems with

f(x | k) = (2π)^{-m/2} |S_k|^{-1/2} exp(-(1/2) (x - x̄_k)^T S_k^{-1} (x - x̄_k))

Avoid overflowing e^x and |S_k| by using instead the log-likelihood

log f(x | k) = -(m/2) log 2π - (1/2) log|S_k| - (1/2) (x - x̄_k)^T S_k^{-1} (x - x̄_k)

Compute log|S_k| = 2 Σ_{i=1}^{m} log c_{ii} using the Cholesky decomposition S_k = C^T C. Since C is triangular, its determinant is the product of its diagonal elements c_{ii}.

Dropping the first term (constant across all k) gives us a robust discriminant based on the log-likelihood:

D_log(k | x_i) = -(1/2) log|S_k| - (1/2) (x_i - x̄_k)^T S_k^{-1} (x_i - x̄_k)
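A sketch of D_log for a single trace, using a Cholesky factorization as described above (NumPy's cholesky returns the lower-triangular factor, which is equivalent here for both the determinant and the quadratic form); mean_k and S_k are the hypothetical template parameters from the profiling sketch:

import numpy as np

def d_log(x, mean_k, S_k):
    """Log-likelihood discriminant for one trace x and candidate k
    (the constant term -(m/2) log 2*pi is dropped)."""
    C = np.linalg.cholesky(S_k)                 # S_k = C C^T, C lower-triangular
    logdet = 2.0 * np.sum(np.log(np.diag(C)))   # log|S_k|
    d = x - mean_k
    z = np.linalg.solve(C, d)                   # so that z^T z = d^T S_k^{-1} d
    return -0.5 * logdet - 0.5 * (z @ z)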

The linear discriminant

Using S_pooled, we can discard log|S_k| as well. This leaves the Mahalanobis distance

d²_M(x, x̄_k) = (x - x̄_k)^T S_pooled^{-1} (x - x̄_k) ≥ 0

to compare candidates k. (A covariance matrix is positive semidefinite.) Rewrite it as

d²_M(x, x̄_k) = x^T S_pooled^{-1} x - 2 x̄_k^T S_pooled^{-1} x + x̄_k^T S_pooled^{-1} x̄_k

and drop the first term (constant for all candidates k) to obtain a discriminant that depends linearly on x_i:

D_linear(k | x_i) = x̄_k^T S_pooled^{-1} x_i - (1/2) x̄_k^T S_pooled^{-1} x̄_k

Joint discriminants

Recall that to combine n_a attack traces (essential for the success of many side-channel attacks), we need to compute a discriminant based on their joint likelihood

L(k | X_k) = Π_{i=1}^{n_a} L(k | x_i)   or   log L(k | X_k) = Σ_{x_i in X_k} log L(k | x_i)

This costs O(n_a · m²) for

D_log(k | X_k) = -(n_a/2) log|S_k| - (1/2) Σ_{i=1}^{n_a} (x_i - x̄_k)^T S_k^{-1} (x_i - x̄_k)

but only O(n_a · m + m²) for

D_linear(k | X_k) = x̄_k^T S_pooled^{-1} (Σ_{i=1}^{n_a} x_i) - (n_a/2) x̄_k^T S_pooled^{-1} x̄_k

since x̄_k^T S_pooled^{-1} and x̄_k^T S_pooled^{-1} x̄_k only need to be computed once.

Practical evaluation example: D_log ≈ 3.5 days, D_linear ≈ 30 min!
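A sketch of the joint linear discriminant with the per-candidate terms precomputed once, as suggested above; means and S_pooled are the hypothetical profiling outputs from earlier sketches, and X_attack an (n_a, m) array of attack traces:

import numpy as np

def precompute_linear(means, S_pooled):
    """Per-candidate constants: w_k = S_pooled^{-1} x̄_k and c_k = x̄_k^T S_pooled^{-1} x̄_k."""
    Sinv = np.linalg.inv(S_pooled)
    w = {k: Sinv @ mk for k, mk in means.items()}
    c = {k: mk @ w[k] for k, mk in means.items()}
    return w, c

def d_linear_joint(X_attack, w, c):
    """Joint linear discriminant over all n_a attack traces (rows of X_attack)."""
    n_a = X_attack.shape[0]
    x_sum = X_attack.sum(axis=0)            # Σ_i x_i, computed once
    return {k: w[k] @ x_sum - 0.5 * n_a * c[k] for k in w}

# Candidate ranking: sorted(scores, key=scores.get, reverse=True)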

Example: comparison of different compression methods

Our test dataset Grizzly (available online):
- Atmel XMEGA 256 A3U processor
- 10 ohm resistor in the ground line, powered from a 3.3 V battery via a voltage regulator
- 1 MHz sine-wave clock
- 250 MHz sampling frequency, 8-bit samples
- 3072 traces for each byte value, m^r = 2500 samples per trace
- sequence of LOAD instructions, where only one handles k; all others handle the constant value zero

Guessing entropy: binary logarithm of the rank of the correct k in the list of k values sorted by decreasing discriminant score, averaged over 10 attacks.

Sample selections: 1 sample/clock (1ppc, m = 8), 3 samples/clock (3ppc, m = 25), 20 samples/clock (20ppc, m = 77) and allap (m = 125; all selected samples above the 95th percentile of s(t)).
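An illustrative computation of this guessing-entropy metric; the score_lists structure is hypothetical, and the averaging convention (mean of the log2 ranks rather than log2 of the mean rank) is an assumption here:

import numpy as np

def guessing_entropy(score_lists, correct_k):
    """score_lists: list of dicts {k: discriminant score}, one per independent attack.
    Returns the average log2 rank of the correct candidate (rank 1 -> 0 bits)."""
    log_ranks = []
    for scores in score_lists:
        ranking = sorted(scores, key=scores.get, reverse=True)
        rank = ranking.index(correct_k) + 1
        log_ranks.append(np.log2(rank))
    return float(np.mean(log_ranks))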

[Figure: guessing entropy (bits) vs. n_a (log axis), for n_p = 200 and n_p = 2000; top row: individual S_k with D_log, bottom row: S_pooled with D_linear; curves for LDA (m = 4), PCA (m = 4) and sample selections 1ppc, 3ppc, 20ppc, allap.]

Attacks on AES software/hardware implementations

[Figure: guessing entropy (bits) vs. n_a (log axis); curves for LDA, PCA and sample selections 1ppc, 3ppc, 20ppc, allap.]

Left: guessing entropy after a template attack on the Grizzly dataset in an AES S-box scenario (simulated). A DPA-style attack on AES is much easier than direct eavesdropping of a single LOAD instruction.

Right: template attack on an AES engine (Polar dataset). The software implementation is much easier to attack than the hardware implementation.

Attacks on different devices

[Figure] Four XMEGA PCB devices used in our experiments.

Classic template attacks in different scenarios

[Figure: guessing entropy (bits) vs. n_a (log axis); curves for LDA (m = 4), PCA (m = 4) and sample selections 1ppc, 3ppc, 20ppc, allap.]

Left: using device Alpha for profiling and device Beta for the attack. Right: using the same device (Beta) but different acquisition campaigns for profiling (Beta) and attack (Beta Bis).

All compression techniques (except for LDA!) failed badly across different devices, or even across different campaigns on the same device.

Major cause: DC drift across devices, boards, campaigns

[Figure, two panels (mA): single trace from Beta; curves for Alpha, Beta, Beta bis, Gamma, Delta, Beta ± confidence interval, SNR of Beta.]

Top: trace from Beta (first clock cycle of the target LOAD). Bottom: overall mean vectors x̄^r for all campaigns minus the overall mean vector of Beta.

LDA gets this: S^r_pooled (noise) has a DC eigenvector

[Figure: the first six eigenvectors u_1 ... u_6 of S^r_pooled.]

No major incompatibility of the underlying leakage model

[Figure] Normal distributions (mA) at sample index j = 884, based on the template parameters (x̄^r_k, S^r_pooled) for k ∈ {0, ..., 9}, on Alpha (left) and Beta (right).

Template attacks are very sensitive to changes in DC bias

Changes in DC bias can also happen within a single campaign (e.g. due to temperature changes). This causes a DC eigenvector to emerge in S^r_pooled, which LDA utilizes to ignore DC drift as noise.

Workarounds:
- Use different devices during profiling campaigns.
- Allow temperature variation during profiling campaigns (can also affect switching thresholds).
- Use LDA.
- Where LDA is not applicable: use PCA with random DC offsets added to the mean vectors before calculating B, to push most of the DC signal into a single eigenvector and keep the rest DC-free (see the sketch after this list).
- Apply a DC-block filter: this happens already automatically if EM sensors or other high-pass filters are used. However, this can also significantly increase noise, by spreading nearby variability via the filter impulse response.
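A sketch of the random-DC-offset variant of the PCA compression mentioned above; the offset scale dc_scale and the mean_by_k dict are illustrative assumptions:

import numpy as np

def pca_projection_dc_robust(mean_by_k, m, dc_scale=1.0, seed=0):
    """PCA compression with a random constant offset added to each mean vector
    before computing B, pushing most DC (drift) signal into one eigenvector."""
    rng = np.random.default_rng(seed)
    M = np.stack(list(mean_by_k.values()))                     # (|S|, m^r)
    M = M + dc_scale * rng.standard_normal((M.shape[0], 1))    # same offset for every sample of a trace
    D = M - M.mean(axis=0)
    B = D.T @ D
    w, U = np.linalg.eigh(B)
    order = np.argsort(w)[::-1]
    return U[:, order[:m]]   # keep or drop the DC eigenvector depending on the attack scenario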

Profiling on Alpha, attack on Beta

[Figure: guessing entropy (bits) vs. n_a (log axis); curves for LDA and PCA with various m, and sample selections 1ppc, 3ppc, 20ppc, allap.]

Left: using various compressions with the classic method; DC eigenvector of B: j = 5. Right: using PCA and LDA after adding a random DC offset; DC eigenvector of B: j = 1.

PCA benefits from including the DC eigenvector in the projection, LDA does not.
