Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur

Similar documents
Formal Fault Analysis of Branch Predictors: Attacking countermeasures of Asymmetric key ciphers

Elliptic Curve Cryptography and Security of Embedded Devices

Side-channel attacks on PKC and countermeasures with contributions from PhD students

A DPA attack on RSA in CRT mode

Horizontal and Vertical Side-Channel Attacks against Secure RSA Implementations

CPE 776:DATA SECURITY & CRYPTOGRAPHY. Some Number Theory and Classical Crypto Systems

Cryptography and Network Security Prof. D. Mukhopadhyay Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Efficient randomized regular modular exponentiation using combined Montgomery and Barrett multiplications

Efficient Application of Countermeasures for Elliptic Curve Cryptography

Remote Timing Attacks are Practical

7 Cryptanalysis. 7.1 Structural Attacks CA642: CRYPTOGRAPHY AND NUMBER THEORY 1

Introduction to Side Channel Analysis. Elisabeth Oswald University of Bristol

Hardware Architectures for Public Key Algorithms Requirements and Solutions for Today and Tomorrow

Public-Key Cryptosystems CHAPTER 4

Exponent Blinding Does not Always Lift (Partial) SPA Resistance to Higher-Level Security

Hardware Security Side channel attacks

Multi-Exponentiation Algorithm

Introduction to Modern Cryptography. Benny Chor

RSA Key Extraction via Low- Bandwidth Acoustic Cryptanalysis. Daniel Genkin, Adi Shamir, Eran Tromer

Elliptic Curve Cryptography

8.1 Principles of Public-Key Cryptosystems

Algorithm for RSA and Hyperelliptic Curve Cryptosystems Resistant to Simple Power Analysis

Power Analysis to ECC Using Differential Power between Multiplication and Squaring

RSA Implementation. Oregon State University

CIS 551 / TCOM 401 Computer and Network Security

Discrete Mathematics GCD, LCM, RSA Algorithm

Public Key 9/17/2018. Symmetric Cryptography Review. Symmetric Cryptography: Shortcomings (1) Symmetric Cryptography: Analogy

Lecture 6: Introducing Complexity

DPA-Resistance without routing constraints?

Resistance of Randomized Projective Coordinates Against Power Analysis

Lemma 1.2. (1) If p is prime, then ϕ(p) = p 1. (2) If p q are two primes, then ϕ(pq) = (p 1)(q 1).

Arithmetic in Integer Rings and Prime Fields

Four-Dimensional GLV Scalar Multiplication

Security Issues in Cloud Computing Modern Cryptography II Asymmetric Cryptography

Numbers. Çetin Kaya Koç Winter / 18

EECS150 - Digital Design Lecture 23 - FFs revisited, FIFOs, ECCs, LSFRs. Cross-coupled NOR gates

Toward Secure Implementation of McEliece Decryption

Optimal XOR based (2,n)-Visual Cryptography Schemes

Square Always Exponentiation

Public-Key Encryption: ElGamal, RSA, Rabin

Algorithmic Number Theory and Public-key Cryptography

Timing Attack against protected RSA-CRT implementation used in PolarSSL

You separate binary numbers into columns in a similar fashion. 2 5 = 32

Sliding right into disaster - Left-to-right sliding windows leak

Side-Channel Analysis on Blinded Regular Scalar Multiplications

RSA. Ramki Thurimella

Public Key Cryptography

Statistical Analysis for Access-Driven Cache Attacks Against AES

Faster arithmetic for number-theoretic transforms

1 What are Physical Attacks. 2 Physical Attacks on RSA. Today:

A new security notion for asymmetric encryption Draft #12

Provably Secure Higher-Order Masking of AES

6. ELLIPTIC CURVE CRYPTOGRAPHY (ECC)

A new security notion for asymmetric encryption Draft #8

Cryptanalysis of the SIMON Family of Block Ciphers

YALE UNIVERSITY DEPARTMENT OF COMPUTER SCIENCE

Asymmetric Cryptography

Chapter 4 Asymmetric Cryptography

Randomized Signed-Scalar Multiplication of ECC to Resist Power Attacks

Blinded Fault Resistant Exponentiation FDTC 06

Cosc 412: Cryptography and complexity Lecture 7 (22/8/2018) Knapsacks and attacks

Side Channel Attack to Actual Cryptanalysis: Breaking CRT-RSA with Low Weight Decryption Exponents

McBits: Fast code-based cryptography

A new security notion for asymmetric encryption Draft #10

Attacks on DES , K 2. ) L 3 = R 2 = L 1 f ( R 1, K 2 ) R 4 R 2. f (R 1 = L 1 ) = L 1. ) f ( R 3 , K 4. f (R 3 = L 3

SYMMETRIC ENCRYPTION. Mihir Bellare UCSD 1

Symmetric Encryption

1 Recommended Reading 1. 2 Public Key/Private Key Cryptography Overview RSA Algorithm... 2

In fact, 3 2. It is not known whether 3 1. All three problems seem hard, although Shor showed that one can solve 3 quickly on a quantum computer.

CRYPTOGRAPHY AND NUMBER THEORY

Efficient scalar multiplication against side channel attacks using new number representation

Stream ciphers I. Thomas Johansson. May 16, Dept. of EIT, Lund University, P.O. Box 118, Lund, Sweden

RSA RSA public key cryptosystem

Error-free protection of EC point multiplication by modular extension

1 Number Theory Basics

CSE. 1. In following code. addi. r1, skip1 xor in r2. r3, skip2. counter r4, top. taken): PC1: PC2: PC3: TTTTTT TTTTTT

Attacks on RSA & Using Asymmetric Crypto

Side Channel Analysis and Protection for McEliece Implementations

CHAPTER 12 CRYPTOGRAPHY OF A GRAY LEVEL IMAGE USING A MODIFIED HILL CIPHER

Lecture 1: Introduction to Public key cryptography

Differential Cache Trace Attack Against CLEFIA

Ti Secured communications

Asymmetric Encryption

Encryption: The RSA Public Key Cipher

Efficient Leak Resistant Modular Exponentiation in RNS

CSc 466/566. Computer Security. 5 : Cryptography Basics

Side Channel Analysis. Chester Rebeiro IIT Madras

PKCS #1 v2.0 Amendment 1: Multi-Prime RSA

Other Public-Key Cryptosystems

U.C. Berkeley CS276: Cryptography Luca Trevisan February 5, Notes for Lecture 6

Cryptography IV: Asymmetric Ciphers

Hardware Acceleration of the Tate Pairing in Characteristic Three

Notes on Alekhnovich s cryptosystems

A DPA Attack against the Modular Reduction within a CRT Implementation of RSA

First-Order DPA Attack Against AES in Counter Mode w/ Unknown Counter. DPA Attack, typical structure

Performance, Power & Energy. ELEC8106/ELEC6102 Spring 2010 Hayden Kwok-Hay So

Introduction to Modern Cryptography. Benny Chor

Outline. 1 Arithmetic on Bytes and 4-Byte Vectors. 2 The Rijndael Algorithm. 3 AES Key Schedule and Decryption. 4 Strengths and Weaknesses of Rijndael

Fundamentals of Modern Cryptography

Efficient encryption and decryption. ECE646 Lecture 10. RSA Implementation: Efficient Encryption & Decryption. Required Reading

Transcription:

Branch Prediction based attacks using Hardware performance Counters IIT Kharagpur March 19, 2018

Modular Exponentiation Public key Cryptography March 19, 2018 Branch Prediction Attacks 2 / 54

Modular Exponentiation Exponentiation and Underlying Multiplication Primitive Inputs(M) are encrypted and decrypted by performing modular exponentiation with modulus N on public or private keys represented as n bit binary string. Algorithm 1: Binary version of Square and Multiply Exponentiation Algorithm S M ; for i from 1 to n 1 do S S S mod N ; if d i = 1 then S S M mod N ; end end return S ; Conditional execution of instruction and their dependence on secret exponent is exploited by the simple power and timing side-channels. March 19, 2018 Branch Prediction Attacks 3 / 54

Modular Exponentiation Montgomery Ladder Exponentiation Algorithm A naïve modification is to have a balanced ladder structure having equal number of squarings and multiplications. Most popular exponentiation primitive for Asymmetric-key cryptographic implementations. Algorithm 2: Montgomery Ladder Algorithm R 0 1 ; R 1 M ; for i from 0 to n 1 do if d i = 0 then R 1 (R 0 R 1 ) mod N ; R 0 (R 0 R 0 ) mod N ; end else R 0 (R 0 R 1 ) mod N ; R 1 (R 1 R 1 ) mod N ; end end return R 0 ; March 19, 2018 Branch Prediction Attacks 4 / 54

Modular Exponentiation Montgomery Multiplication Algorithm Highly efficient algorithm for performing modular squaring and modular multiplication operation. Avoids time consuming integer division operation. R is assumed to be 2 k, when N is k-bit number. Calculates Z = A B R 1 (modn), A = a R(modN), B = b R(modN) and R 1 R = 1(modN). Algorithm 3: Montgomery Multiplication Algorithm S A B ; S (S + (S N 1 mod R) N)/R ; if S > N then S S N ; end return S ; March 19, 2018 Branch Prediction Attacks 5 / 54

Modular Exponentiation Branch Predictor State Machines Taken Predict Taken Taken Predict Not Taken Not Taken Taken Not Taken Taken Predict Taken Not Taken Predict Not Taken Not Taken Dynamic 2-bit predictor State Machine The predictor must miss twice before the prediction changes. Conditional branching in regular recurring fashion goes undetected. Pattern History Table 0000...00 Branch History Register 0000...01 λ(sc) Pattern History 0000...10 bits 1 1 1 0 Sc... prediction of B index Sc Sc+1 = d(sc, Rc) Branch result of B(Rc) 1111...10 1111...11 State transition logic Figure: Two Level Adaptive Branch Prediction March 19, 2018 Branch Prediction Attacks 6 / 54

Modular Exponentiation Modelling Branch Miss as Side-Channel from HPC We monitor the branch misses on the square and multiply and Montgomery Ladder algorithm using Montgomery multiplication as subroutine for operations like squaring and multiplication. Branch miss rely on the ability of branch predictor to correctly predict future branches to be taken. Profiling of HPCs using performance monitoring tools provides simple user interface to different hardware event counts and are considered as side-channel. March 19, 2018 Branch Prediction Attacks 7 / 54

Understanding Branch Mispredictions 1 Modular Exponentiation 2 Understanding Branch Mispredictions 3 Reverse engineering of branch predictors 4 Profiling the BPU Using Asynchronous Measurements 5 Acquire, Deduce and Remove 6 Experimental validation March 19, 2018 Branch Prediction Attacks 8 / 54

Understanding Branch Mispredictions Secret dependent branching Let n-bit secret scalar in ECC be denoted as (k 0, k 1,, k i,, k n 1 ). Trace of taken or not-taken branches as conditioned on scalar bits and expressed as (b 0, b 1,, b n 1 ). If a particular key bit k j is 1 then the conditional addition statement in the double and add algorithm gets executed. Thus, the condition is checked first, and if the particular key bit is set then its immediate next statement ie, addition gets performed. Since this is a normal flow of execution the branch is considered as not-taken ie, b j = 0 in this case. While when k j = 0, the addition operation is skipped and the execution continues with the next squaring statement. Thus, in this case branch is taken ie, b j = 1. March 19, 2018 Branch Prediction Attacks 9 / 54

Understanding Branch Mispredictions Effect of Compiler Optimizations on branching We validate our understanding for conditional branching and observe the effect of optimization options in gcc: 1.LC3 :.string hello 2.LC4 :.string hi Figure: Assembly generated using various optimization options in gcc March 19, 2018 Branch Prediction Attacks 10 / 54

Understanding Branch Mispredictions Approximating the System predictor with 2-bit branch predictor 4360 Observed branch misses from Perf 4350 4340 4330 4320 4310 4300 4290 470 480 490 500 510 520 530 540 550 560 Predicted branch misses from 2-bit dynamic predictor Figure: Variation of branch-misses from performance counters with increase in branch miss from 2-bit predictor algorithm Direct correlation observed for the branch misses from HPCs and from the simulated 2-bit dynamic predictor over a sample of exponent bitstream. This confirms assumption of 2-bit dynamic predictor being an approximation to the underlying system branch predictor. March 19, 2018 Branch Prediction Attacks 11 / 54

Idea of the Attack Understanding Branch Mispredictions Timing attack exploiting branch mispredictions are demonstrated which requires the knowledge of actual structure of branch prediction hardware of the target system. Advantage of this attack lies in the fact that adversary, inspite of having no knowledge of the underlying architecture, can actually target real systems and reveal secret exponent bits, exploiting the branch miss as side-channel from HPCs. This is an iterative attack, targeting i th bit assuming previous bits to be known. The attack separates a sample input set based on mispredictions for conditional reduction of Montgomery multiplication at the (i + 1) th squaring step of exponentiation assuming secret i th bit. March 19, 2018 Branch Prediction Attacks 12 / 54

Understanding Branch Mispredictions Threat Model for the attack The attacker knows first i bits of the private key and wants to determine next unknown bit d i of the key (d 0, d 1,, d i,, d n 1 ). Generate a trace of branches as (t m,1, t m,2,, t m,i ) for conditional reduction of Montgomery multiplication at every squaring step. Under the assumption of d i having value j, where j {0, 1}, appropriate value of t j m,i+1 is simulated. March 19, 2018 Branch Prediction Attacks 13 / 54

Understanding Branch Mispredictions Offline Phase of the Attack assume di = 0 di 0 t 0 i+1 d0 d1 di 1 t1 t2 ti 1 ti Trace of taken or not taken branches for an input plaintext m t1 t2 ti t 0 i+1 t1 t2 ti t 1 i+1 di = 1 1 t 1 i+1 if T(t1, t2,, ti) = t 1 i+1 then add m to M1 else add m to M2 Prediction Oracle if T(t1, t2,, ti) = t 0 i+1 then add m to M3 else add m to M4 Figure: Partitioning randomly generated Ciphertexts set based on simulated Branch miss Modelling March 19, 2018 Branch Prediction Attacks 14 / 54

Understanding Branch Mispredictions Separation of Random Inputs 1 M 1 = {m m does not cause a miss during MM of (i + 1) th squaring if d i = 1} 2 M 2 = {m m causes a misprediction during MM of (i + 1) th squaring if d i = 1} 3 M 3 = {m m does not cause a miss during MM of (i + 1) th squaring if d i = 0} 4 M 4 = {m m causes a misprediction during MM of (i + 1) th squaring if d i = 0} We ensure that there must be no common ciphertexts in sets (M 1, M 3 ) and (M 2, M 4 ) and the sets should be disjoint. March 19, 2018 Branch Prediction Attacks 15 / 54

Online Phase Understanding Branch Mispredictions The probable next bit is decided following the Algorithm 4. If(avg(M M2 ) > avg(m M1 )) and (avg(m M4 ) < avg(m M3 )), then the next bit (nb i ) = 1 Otherwise, if (avg(m M4 ) > avg(m M3 )) and (avg(m M2 ) < avg(m M1 )) then, next bit (nb i ) = 0 March 19, 2018 Branch Prediction Attacks 16 / 54

Understanding Branch Mispredictions Algorithm 4: Adversary Attack Algorithm Input: (d 0, d 1,, d i 1 ),M Output: Probable next bit nb i begin Offline Phase; for m M do Generate taken/ not-taken trace for input m as t m,1, t m,2,, t m,i ; Assume d i = 0 and 1, generate t 0 m,i+1, t1 m,i+1 respectively; p m,i+1 = T (t m,1, t m,2,, t m,i ) ; if p m,i+1 = t 1 m,i+1 then Add m to M 1 ; end else Add m to M 2 ; end if p m,i+1 = t 0 m,i+1 then Add m to M 3 ; end else Add m to M 4 ; end end Remove Duplicate Ciphertexts in the sets M 1, M 3 and M 2, M 4 ; Online Phase; Observe distribution of branch misses from performance counters as M M1, M M2, M M3, M M4 ; if (avg(m M2 ) > avg(m M1 )) and (avg(m M4 ) < avg(m M3 )) then nb i = 1 ; end March 19, if 2018 (avg(m M4 ) > avg(m M3 )) and (avg(m M2 ) < avg(m Branch M1 )) Prediction then Attacks 17 / 54

Understanding Branch Mispredictions Experimental Validation for the Online Phase of the Attack A large input set is separated by simulations over bimodal and two-level adaptive predictor. Average branch misses are observed from HPCs for each elements in set M 1, M 2, M 3 and M 4. Each set has L = 1000 elements. Experiment is repeated over I = 1000 iterations. Experiments are performed on various platforms as Core-2 Duo E7400, Intel Core i3 M350 and Intel Core i5-3470. March 19, 2018 Branch Prediction Attacks 18 / 54

Understanding Branch Mispredictions Experiments on Square and Multiply Algorithm Avg. Branch misses from Performance Counters 4648 4646 4644 4642 4640 4638 4636 4634 4632 4630 0 100 200 300 400 500 600 700 Iterations M 1 -no simulated miss M 2 -misprediction Avg. Branch misses from Performance Counters 4648 4646 4644 4642 4640 4638 4636 4634 4632 4630 0 100 200 300 400 500 600 700 Iterations M 3 -no simulated miss M 4 -misprediction (a) Correct Assumption d i = 1 (b) Incorrect Assumption d i = 0 Figure: Branch misses from HPCs on square and multiply correctly identifies secret bit d i = 1, ciphertext set partitioned by simulated misses of two-level adaptive predictor March 19, 2018 Branch Prediction Attacks 19 / 54

Understanding Branch Mispredictions Experiments on Montgomery Ladder Avg. Branch misses from Performance Counters 4805 4800 4795 4790 4785 4780 4775 4770 4765 4760 0 50 100 150 200 250 300 350 400 450 Iterations M 1 -no simulated miss M 2 -misprediction Avg. Branch misses from Performance Counters 4795 4790 4785 4780 4775 4770 4765 4760 0 50 100 150 200 250 300 350 400 450 Iterations M 3 -no simulated miss M 4 -misprediction (a) Correct Assumption d i = 1 (b) Incorrect Assumption d i = 0 Figure: Branch misses from HPCs on Montgomery Ladder correctly identifies secret bit d i = 1, ciphertext set partitioned by simulated misses of two-level adaptive predictor March 19, 2018 Branch Prediction Attacks 20 / 54

Understanding Branch Mispredictions Comparison with timing as side-channel 4.9e+06 4.9e+06 4.8e+06 4.8e+06 Execution Time 4.7e+06 4.6e+06 4.5e+06 4.4e+06 4.3e+06 4.2e+06 4.1e+06 M 1 -no simulated misprediction M 2 - misprediction 0 100 200 300 400 500 600 700 800 900 1000 Execution Time 4.7e+06 4.6e+06 4.5e+06 4.4e+06 4.3e+06 4.2e+06 4.1e+06 4e+06 3.9e+06 M 3 -no simulated misprediction M 4 - misprediction 0 100 200 300 400 500 600 700 800 900 1000 Iterations Iterations (a) Correct Assumption d i = 1 (b) Incorrect Assumption d i = 0 Figure: No identification of secret bit is possible using timing as side-channel with L = 1000 and I = 1000 March 19, 2018 Branch Prediction Attacks 21 / 54

Understanding Branch Mispredictions Existing DPA countermeasures on ECC Scalar Randomization If K is the secret scalar and P E the base point, instead of computing K times P, randomize the scalar K as K = K + r #E where r is a random integer and #E is the order of the curve. Scalar Splitting In [2], to randomize the scalar such that instead of computing KP, the scalar is split in two parts K = (K r) + r with a random r, and multiplication is computed separately, KP = (K r)p + rp. Point Blinding This computes K(P + R) instead of KP, where KR can be stored in the system beforehand, which when subtracted K(P + R) KR gives back KP. March 19, 2018 Branch Prediction Attacks 22 / 54

Reverse engineering of branch predictors 1 Modular Exponentiation 2 Understanding Branch Mispredictions 3 Reverse engineering of branch predictors 4 Profiling the BPU Using Asynchronous Measurements 5 Acquire, Deduce and Remove 6 Experimental validation March 19, 2018 Branch Prediction Attacks 23 / 54

Reverse engineering of branch predictors Code Snippet for granular observations Branch prediction hardware design is proprietary of the processor manufacturer. The perf class is instantiated with particular hardware event. We incorporate start and stop calls before and after the target conditional if-else structure. This returns event counts at regular interval and measurements are synchronous to the execution of the conditional block. March 19, 2018 Branch Prediction Attacks 24 / 54

Reverse engineering of branch predictors Sampling granularly using Ioctl system call Previous measurements are not practical for an attacker, as the attacker cannot modify the executable run by the victim. Instead we use perf ioctl in sampling mode. We define a function as signal handler which execute on an interrupt raised by the interrupt handler. perf object is instantiated with an event that is used as sampler object with a predefined sample_period. The interrupt handler is called at regular intervals of the sample_period. Measurements are observed to be more noisy if sample_period is reduced beyond a threshold, due to the overhead of the interrupt getting generated very frequently. March 19, 2018 Branch Prediction Attacks 25 / 54

Reverse engineering of branch predictors Reverse Engineering of Branch Predictors in Modern Intel Processors Total Branch Misses for each sequence 12 10 8 6 4 2 0 0 200 400 600 800 1000 Binary Sequences equivalent as number Total Branch Misses for each sequence 12 10 8 6 4 2 0 0 200 400 600 800 1000 Binary Sequences equivalent as number Figure: Branch misses from 2-bit predictor and actual hardware predictor Figure: Branch misses from 3-bit predictor and actual hardware predictor Branch misses from Intel i5-5200u with Broadwell Architecture March 19, 2018 Branch Prediction Attacks 26 / 54

Reverse engineering of branch predictors Total Branch Misses for each sequence 12 10 8 6 4 2 0 0 200 400 600 800 1000 Binary Sequences equivalent as number Total Branch Misses for each sequence 12 10 8 6 4 2 0 0 200 400 600 800 1000 Binary Sequences equivalent as number Figure: Branch misses from 2-bit predictor and actual hardware predictor Figure: Branch misses from 3-bit predictor and actual hardware predictor Branch misses from Intel i3-m350 with Sandybridge Architecture March 19, 2018 Branch Prediction Attacks 27 / 54

Reverse engineering of branch predictors Total Branch Misses for each sequence 12 10 8 6 4 2 0 0 200 400 600 800 1000 Binary Sequences equivalent as number Total Branch Misses for each sequence 12 10 8 6 4 2 0 0 200 400 600 800 1000 Binary Sequences equivalent as number Figure: Branch misses from 2-bit predictor and actual hardware predictor Figure: Branch misses from 3-bit predictor and actual hardware predictor Branch misses from Intel i3-m350 with Sandybridge Architecture March 19, 2018 Branch Prediction Attacks 28 / 54

Reverse engineering of branch predictors March 19, 2018 Branch Prediction Attacks 29 / 54

Profiling the BPU Using Asynchronous Measurements 1 Modular Exponentiation 2 Understanding Branch Mispredictions 3 Reverse engineering of branch predictors 4 Profiling the BPU Using Asynchronous Measurements 5 Acquire, Deduce and Remove 6 Experimental validation March 19, 2018 Branch Prediction Attacks 30 / 54

Profiling the BPU Using Asynchronous Measurements Threat Model 1 The measurements made are such that an event(branch misses) is getting monitored on specific sample period of the instruction count. 2 This measurement is handled by ioctl interface and observed to be asynchronous in nature. 3 The attack model is both practical and realistic in shared server environment where the hardware is shared between multiple users processes. 4 In such setting, the branch mispredictions can be observed over a target execution by the attacker from another concurrently running process. March 19, 2018 Branch Prediction Attacks 31 / 54

Profiling the BPU Using Asynchronous Measurements March 19, 2018 Branch Prediction Attacks 32 / 54

Profiling the BPU Using Asynchronous Measurements A point addition or doubling step for Edward-1174 curve b = 1 k j R b = 2R b + R kj R 1 = Z 1 Z 2, R 2 = R 1 2 R 3 = X 1 X 2, R 4 = Y 1 Y 2 R 5 = d R 3 R 4, R 6 = R 2 R 5, R 7 = R 2 + R 5 X 3 = R 1 R 6 ((X 1 + Y 1 ) (X 2 + Y 2 ) R 3 R 4 ) Y 3 = R 1 R 7 (R 4 R 3 ) Z 3 = R 6 R 7 March 19, 2018 Branch Prediction Attacks 33 / 54

Acquire, Deduce and Remove 1 Modular Exponentiation 2 Understanding Branch Mispredictions 3 Reverse engineering of branch predictors 4 Profiling the BPU Using Asynchronous Measurements 5 Acquire, Deduce and Remove 6 Experimental validation March 19, 2018 Branch Prediction Attacks 34 / 54

Acquire, Deduce and Remove Overview of the attack We follow by a strategy of Deduce & Remove to target the scalar splitting and scalar blinding countermeasures on ECC. 1 n-bit scalar K. 2 m branch miss samples from the execution over K. 3 Each branch miss sample is reported after sample period of I instructions. 4 Thus effectively, each sample of reported branch misses is affected by n/m bits of the scalar K. 5 In our experiments, we have chosen I such that n/m = 2. Moreover, considering a b-bit predictor, I should be such that n/m b (b = 3 for our case). March 19, 2018 Branch Prediction Attacks 35 / 54

Acquire, Deduce and Remove Template Building for b bit predictor March 19, 2018 Branch Prediction Attacks 36 / 54

Acquire, Deduce and Remove Offline template matching for an unknown trace Sample trace collected for an unknown secret scalar is matched iteratively to the previously constructed templates. The matching phase is composed of: Deduce and Remove steps. Deduce In Deduce phase, we start matching from the Least Significant Bit (LSB) of the scalar multiplication. The matching can be done iteratively taking on a trace with s sample points (s = t m/n). These s samples are point-wise matched with all the template points for each particular template and the distances for each of the traces are measured using the Least Square Method (LSM). March 19, 2018 Branch Prediction Attacks 37 / 54

Acquire, Deduce and Remove Remove For real experiments noise is predominant, several templates may return same least square distance. The noise filtering is done in Remove step. Parameters chosen for Remove step is device-specific and can also change with algorithm. March 19, 2018 Branch Prediction Attacks 38 / 54

Acquire, Deduce and Remove Template matching for Exponent Scalar Splitting 1 Acquire: N pairs of split scalar multiplications over K r and r are acquired over t bits, each pair for unknown and random values of r. 2 Deduce: For each of the N pairs, corresponding pairwise template matching is performed, on each sample. It results in N values each for K r and r. Pairwise adding up of each pair (K r + r) results in t-bits of K. 3 Remove: Ideally, all N values of K obtained previously must be identical. The non-matching values can be removed by majority voting. March 19, 2018 Branch Prediction Attacks 39 / 54

Acquire, Deduce and Remove Template matching for scalar blinding 1 Acquire: N blinded scalar multiplication over (K + r#e)p, for random, unknown values of r. 2 Deduce: For each of the N trace, pointwise matching over s branch misprediction samples of t bits is performed. It results N candidates for t bits for of K + r#e. March 19, 2018 Branch Prediction Attacks 40 / 54

Acquire, Deduce and Remove 3 Remove: Choose any 3 branch misprediction traces out of N traces, for random r 1, r 2, r 3. Deduce step reveals t bits of K + r 1 #E, K + r 2 #E and K + r 3 #E respectively. Take pair wise difference of the candidate values example (K + r 1 #E) (K + r 2 #E). Compute r 1 #E r 2 #E, r 2 #E r 3 #E and r 1 #E r 3 #E. Now for correct t bits of the blinded scalar, adding up of candidate value of r 1 #E r 2 #E and r 2 #E r 3 #E would result in non-empty set on intersection with candidate of r 1 #E r 3 #E. Combination for empty set for intersection can be discarded, leading to t bits of blinded scalar. Iteratively repeating this process leads to retrieval of k + r i #E separately. Modulus with the order of curve returns secret K for each r i s. March 19, 2018 Branch Prediction Attacks 41 / 54

Experimental validation 1 Modular Exponentiation 2 Understanding Branch Mispredictions 3 Reverse engineering of branch predictors 4 Profiling the BPU Using Asynchronous Measurements 5 Acquire, Deduce and Remove 6 Experimental validation March 19, 2018 Branch Prediction Attacks 42 / 54

Experimental validation Building Branch Misprediction Templates Using perf ioctl Sampling Success of our attack is highly dependent on how accurate the template has been built. The templates constructed using mean value loose their correlation to the behavior of the 3-bit predictor. We separately construct frequency distributions for each of these sample points and select the modal value as the candidate template point. Templates constructed taking the highest frequency points capture the essence of the distribution accurately. Modal values of branch misses 6 5 4 3 2 1 0 100 200 300 400 500 600 700 800 900 1000 Decimal equivalent of input binary sequences Figure: Variation of sample points of branch miss templates March 19, 2018 Branch Prediction Attacks 43 / 54

Template for LSB Experimental validation The most noisy sample point is the first one, which is supposed to be affected only by the Least Significant bits. This sample gets affected by the branch misses from instruction before the scalar multiplication starts to execute We have performed a frequency analysis on the first branch miss samples observed over a set of random binary sequences of input depending on the actual values of the LSB and the bit following LSB. March 19, 2018 Branch Prediction Attacks 44 / 54

Experimental validation Retrieving the Least Significant Bit for Scalar Splitting Frequency of Occurances 600 500 400 300 200 100 taken-taken ntaken-taken taken-ntaken ntaken-ntaken 0 0 1 2 3 4 5 6 7 8 Number of branch misses Figure: Confusion in determining the LSB for scalar splitting For each of the N sample pairs, we select the pairs where neither of the sample points exhibits a value 3 or 4. If a sample point exhibits value < 2, we classify both the branches as not-taken ie, the bits to be 11. If a sample point exhibits value 2, we conclude that the branches are either both not-taken or not-taken followed by a taken branch. Thus the bit values are 11 or 01. If a sample point exhibits value 5, we conclude that the branches are either both taken or taken followed by a not-taken branch. Thus the bit values are 00 or 10. If a sample point exhibits value > 5, we classify both the branches as taken ie, the bits to be 00. March 19, 2018 Branch Prediction Attacks 45 / 54

Experimental validation Iterative Template matching for scalar splitting 17000 16500 correct-bit = 1 correct-bit = 0 8000 7800 correct-bit = 1 correct-bit = 0 Frequency of Occurances 16000 15500 15000 14500 14000 13500 Frequency of Occurances 7600 7400 7200 7000 6800 6600 13000 0.5 1 1.5 2 2.5 3 3.5 Bit positions from LSB 6400 0.5 1 1.5 2 2.5 3 3.5 Bit positions from LSB Figure: Determining next three bits Figure: Determining further three bits Determining 3 bits at a time for scalar splitting March 19, 2018 Branch Prediction Attacks 46 / 54

Experimental validation Efficiency of Deduce and Remove strategy on Scalar Blinding Number of Candidate samples 55000 50000 45000 40000 35000 30000 25000 20000 15000 10000 5000 Total number of candidates Removed candidates Number of conclusive candidates 0 1 2 3 4 5 6 Instances Figure: Efficiency of Deduce and Remove step on scalar blinding Bars with red represent the total number of candidates which were matched after template matching. Bar with blue indicates the total number of candidate values that were pruned. Bar in black represents the number of correctly retrieved candidates after taking intersection. The number of correctly retrieved ones are higher than 93%. Knowing 3 bits we update the state of the predictor and perform template matching on the next t bits to retrieve the following 3 bits. March 19, 2018 Branch Prediction Attacks 47 / 54

Experimental validation Revealing Secret of Secret Splitting and Randomization Building template for data-dependent branches in Curve1174 requires two sets of branch misprediction traces: the simulated mispredicted traces from 3-bit predictor for each modular reduction operations involved in a point addition operation, the perf samples corresponding to the same set of inputs. The i th bit of the blinded scalar is guessed and both 0 and 1 is appended to the already known (i 1) bits for each of the j blinded scalar sequences. bm_perf (i=0),j, bm_perf (i=1),j as branch misprediction samples for only the guessed bit, where the operation is performed over all j sequences and known (i 1) bits for two separate guesses. Thus we have 2 j sequences and we separately take the ioctl branching samples where the i th bit is guessed to be 0 and 1 respectively. March 19, 2018 Branch Prediction Attacks 48 / 54

Experimental validation Similarly, we have the simulated branch misses from the 3-bit predictor bm_sim 0 i,j and bm_sim1 i,j for both guesses as discussed earlier. The steps to build a template are as follows: For each k instances of the modular reduction operation, we apply a windowing technique to bm_perf (i=0),j and bm_perf (i=1),j to identify approximately how many samples are responsible for each modular reduction operation. Now for guess = 0, we consider bm_perf (i=0),j and bm_sim 0 i,j, and separately build template points based on whether or not they suffer from a branch miss. For guess = 0, we separately consider templates from the samples in bm_perf (i=0),j for each of the k modular reduction operations involved for point addition for each of the j sequences. We construct two bins for each of the k modular reduction operations based on whether they have a simulated branch miss as listed in bm_sim 0 i,j. March 19, 2018 Branch Prediction Attacks 49 / 54

Experimental validation Let us consider a particular modular reduction in k, we perform a vertical analysis of all sequences in j. We separate the sequences into two bins based on whether they have a simulated branch miss at particular modular reduction step in bm_sim 0 i,j. Now we fill the bin_miss (i=0),k with samples from bm_perf (i=0),j for the k th particular modular reduction, if there has been a simulated branch miss in bm_sim 0 i,j. Otherwise, we fill bin_no_miss (i=0),k if there is no misprediction.we separately construct templates taking the mode of the distributions of each of these constructed bins as described in the previous discussions of template building. At the end of this step, we have 2 k separate bins for all the k modular reduction operations considering all the j sequences where all the i th bit has been guessed to zero. A similar construction can be performed with i th bit being guessed as 1. March 19, 2018 Branch Prediction Attacks 50 / 54

Experimental validation March 19, 2018 Branch Prediction Attacks 51 / 54

Countermeasures Experimental validation 1 The most obvious countermeasure is to avoid conditional if-else implementaton. 2 Another countermeasure to thwart such attacks is to randomize the state of the predictor intermediate to the execution. 3 But this does not ensure complete security, since the adversary can be more powerful having more granular traces and even then these countermeasures will pose themselves ineffective. March 19, 2018 Branch Prediction Attacks 52 / 54

Summary Summary We initially perform reverse engineering on the branch predictor hardware of Intel s Broadwell and Sandybridge systems and show that the hardware has a significant similarity in behavior to the 3-bit predictor algorithm. Subsequently, we use this granular observation of branch misprediction to attack the DPA secure implementations of ECC. The experimental results illustrate the effectiveness and efficiency of our proposed attack for both scalar splitting and scalar blinding countermeasures. March 19, 2018 Branch Prediction Attacks 53 / 54

Summary Thank You March 19, 2018 Branch Prediction Attacks 54 / 54