PROBABILISTIC PASSWORD MODELING: PREDICTING PASSWORDS WITH MACHINE LEARNING

JAY DESTORIES
MENTOR: ELIF YAMANGIL

Abstract. Many systems use passwords as the primary means of authentication. As the length of a password grows, the search space of possible passwords grows exponentially. Despite this, people often fail to create unpredictable passwords. This paper explores the problem of creating a probabilistic model describing the distribution of passwords among the set of strings. Such a model gives insight into the relative strength of passwords, as well as alternatives to existing methods of password candidate generation for password recovery tools like John the Ripper. The paper considers methods from the field of natural language processing and evaluates their efficacy in modeling human-generated passwords.

Contents
1. Introduction
2. To the Community
3. Probability Modeling
3.1. N-Gram Language Model
3.2. Hidden Markov Model
4. Generating Passwords
4.1. Baseline
4.2. Bigram Language Model
4.3. Hidden Markov Model
4.4. Results
5. Applications
5.1. Password Strength
5.2. Password Recovery
6. Future Work
7. Conclusion
8. Appendix
8.1. Language Model Evaluation
8.2. Bigram Language Model Implementation
8.3. Hidden Markov Model Implementation

1. Introduction

Passwords are commonly used by the general public for authentication with computer services like social networks and online banking. The purpose of a password, and of authentication in general in the scope of computer systems, is to confirm an identity. For instance, when using an online banking service you provide a password so that you can perform actions on an account, but others cannot. This sense of security relies on a password being something that you can remember but that others cannot discover.

The security of a system protected by a password depends on that password being difficult to guess. This is helped by the fact that an increase in the length of a password exponentially increases the number of possible passwords. Despite this, people often create unoriginal and easy-to-guess passwords. In a 2014 leak of approximately 5 million Gmail account passwords, 47,779 users had used the password 123456 (the password corpora referenced in this paper are available at http://tinyurl.com/pwdcorpora). In fact, there are so many instances of different people using the same passwords that one of the dominant ways to guess passwords in password recovery tools like John the Ripper is simply to guess every password in a list of leaked passwords.

It is apparent that there is some underlying thought process in coming up with a password that leads people to create similar or even identical passwords. This paper attempts to find that structure in the form of a probabilistic model. It considers techniques from the field of natural language processing for inferring the structure of the language. If there is a good way to model the language of passwords, it could be used to generate new password guesses that don't already exist in a wordlist, improving the performance of password recovery tools once wordlists are exhausted. It could also be used to evaluate the strength of a password in terms of the expected number of guesses needed to find it under a given model.

2. To the Community

This paper deals mainly with the technical aspects of guessing passwords. Its purpose is to explore the weaknesses of using passwords for authentication. This is an interesting problem because good passwords by definition do not fit well into a predictive model (a good password is hard to guess). We will consider ways of guessing which passwords a person might use and of evaluating the strength of a password, with the intention not only of demonstrating these techniques, but also of encouraging the use of stronger passwords and additional security measures like two-factor authentication.

3. Probability Modeling

3.1. N-Gram Language Model. N-grams make sequence modeling practical by making a Markov assumption: the probability of an element given all previous elements is approximately equal to its probability given only the previous n−1 elements. For example, an n-gram model of order 2 (a bigram model) assumes that

P(e_j | e_1, ..., e_{j−1}) ≈ P(e_j | e_{j−1}).

Then, by the chain rule, the probability of a sequence is

P(e_1, ..., e_n) = P(e_1) P(e_2 | e_1) P(e_3 | e_1, e_2) ... P(e_n | e_1, ..., e_{n−1})
                ≈ P(e_1) P(e_2 | e_1) P(e_3 | e_2) ... P(e_n | e_{n−1}).

Using transition probabilities like P(e_2 | e_1), P(e_3 | e_2) rather than considering all previous elements in the sequence makes it easier to extrapolate to other, arbitrary sequences. Consider the password 123456789 in the training corpus for a bigram model describing passwords. A bigram model captures the idea that numeric characters are likely to appear in numeric order, and would assign high probabilities to sequences like 123.

A bigram model consists of the matrix of transition probabilities from every token to every other token. To train a bigram model on the training corpus, we want to find the maximum likelihood estimate (MLE): the matrix which maximizes the probability of the training corpus,

θ_kj = P(v_j | v_k), with 0 ≤ θ_kj ≤ 1 and Σ_j θ_kj = 1 for all k.

The maximum likelihood estimate θ̂_kj is then

θ̂_kj = argmax { P(c_1, ..., c_n) }.

Log is monotonic, so we can rewrite this as

θ̂_kj = argmax { log P(c_1, ..., c_n) }.

Then, by our bigram assumption:

θ̂_kj = argmax { log Π_i P(c_i | c_{i−1}) }
      = argmax { log Π_i Π_{kj} θ_kj^{1(c_{i−1} = v_k, c_i = v_j)} }
      = argmax { log Π_{kj} θ_kj^{Σ_i 1(c_{i−1} = v_k, c_i = v_j)} }
      = argmax { log Π_{kj} θ_kj^{n_kj} }
      = argmax { Σ_{kj} n_kj log θ_kj }

Apply a Lagrange multiplier, differentiate, and set equal to zero to find the maximum subject to our constraints:

∂/∂θ_kj [ Σ_k Σ_j n_kj log θ_kj − Σ_k λ_k (Σ_j θ_kj − 1) ] = 0
n_kj / θ̂_kj − λ_k = 0
θ̂_kj = n_kj / λ_k

Applying the constraint Σ_j θ̂_kj = 1 gives λ_k = Σ_j n_kj, so

θ̂_kj = n_kj / Σ_j n_kj.

So to get the MLE θ̂_kj, add up all transitions from k to j and divide by the number of transitions from k to anything. For passwords, consider all transitions in every sequence in the training corpus, including tokens for the start and end of each password. Count the number of times each character appears at the start of a transition, and the number of times each transition appears. The bigram probability P(B | A) is then the number of transitions from A to B divided by the number of transitions starting at A. See the appendix for implementation-specific details.
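To make the count-and-divide estimate concrete, here is a minimal sketch (in Python) of bigram MLE training on a two-password toy corpus. It is independent of the full implementation in Appendix 8.2, and the names (train_bigram_mle, START, END) are illustrative only:

# Minimal sketch: count-and-divide MLE for bigram transition probabilities,
# with start and end tokens around each password.
from collections import defaultdict

START, END = '<S>', '</S>'

def train_bigram_mle(corpus):
    transition_counts = defaultdict(lambda: defaultdict(int))  # n_kj
    context_counts = defaultdict(int)                          # sum_j n_kj
    for pwd in corpus:
        tokens = [START] + list(pwd) + [END]
        for prev, cur in zip(tokens, tokens[1:]):
            transition_counts[prev][cur] += 1
            context_counts[prev] += 1
    # theta_kj = n_kj / sum_j n_kj
    return {k: {j: n / float(context_counts[k]) for j, n in row.items()}
            for k, row in transition_counts.items()}

theta = train_bigram_mle(['123456', '12345'])
print(theta['1']['2'])   # 1.0: every '1' in the toy corpus is followed by '2'
print(theta['5'][END])   # 0.5: one of the two '5's ends a password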

3.2. Hidden Markov Model. Hidden Markov models (HMMs) are similar to n-grams in that they model sequences based on the Markov assumption that we can approximate the likelihood of future events by considering only recent ones. In modeling sequences with n-grams, we considered probabilities of transitions between observable events; in the case of password modeling, those events are the observable characters of a password. In HMMs, we instead consider an unobservable sequence of events which influences the observable events. In natural language processing, one example application of hidden Markov models is part-of-speech tagging. In that case, the hidden sequence is a sequence of parts of speech, each of which relates to a word.

An HMM defines the probability of n-grams of the hidden sequence, as well as the probability of an observation at any given hidden state. To train an HMM for a problem like part-of-speech tagging, humans would manually map parts of speech to words in a training corpus, then use computers to count the number of occurrences C_{s_1,...,s_n,b} of each n-gram of hidden states, and compute the probability of hidden state b following states s_1, ..., s_n as

P(b | s_1, ..., s_n) = C_{s_1,...,s_n,b} / Σ_a C_{s_1,...,s_n,a}

To compute the emission probabilities (the probability of an observed event given a hidden state), we would count the number of times each observation appears for a particular hidden state, then divide that by the number of times the hidden state appears. This is a supervised training approach.

However, using hidden Markov models to describe passwords is less intuitive. There is no established way to tag the characters in a password, so there are no labeled training corpora. This means that, short of devising a system for tagging characters and manually tagging a dataset for this purpose, the only option for training an HMM for password modeling is unsupervised training. This raises the question: why should we use hidden Markov models for this task?

Despite the lack of an established tagging system, some patterns come to mind when looking at password lists. Some people create passwords based on words or other common sequences with certain characters substituted. A common example is interchangeably using e, E, and 3; a, A, and 4; l, L, and 1; and so on. An HMM could model this relationship well by grouping characters which are commonly substituted for one another into common hidden states, then modeling their relative popularity as emission probabilities from those hidden states. Another common pattern in passwords is the inclusion of a birth year. Consider passwords that follow the pattern word or common phrase + birth year. An HMM may be able to tag the digits in a birth year as individual hidden states. This could let it capture aspects of the data such as the fact that numerically higher digits occur more frequently in the tens place of a birth year. That being said, we don't know whether the HMM will be able to learn these types of specific details.

We will train an HMM on passwords using the Baum-Welch algorithm. This approach uses a fixed number of hidden states, randomly initializes the transition and emission probabilities, then uses Expectation Maximization to approach a local maximum of the likelihood of the training corpus. Using NLTK's implementation of an HMM simplifies our code for the purposes of this paper; for practical applications, it may be useful to reimplement and optimize these models.
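As a concrete illustration, the sketch below trains and samples such an unsupervised HMM with NLTK, mirroring the usage in Appendix 8.3. The toy corpus, the ten hidden states, and the iteration cap are illustrative choices only, not the settings used in the experiments:

# Minimal sketch: Baum-Welch (EM) training of an HMM over password characters,
# followed by random sampling from the trained model.
import random
import nltk

corpus = ['password1', 'letmein', '123456']
symbols = list(set(c for pwd in corpus for c in pwd))   # observable characters
states = range(10)                                      # arbitrary hidden-state labels
sequences = [[(c, '') for c in pwd] for pwd in corpus]  # unlabeled: dummy tags are ignored

trainer = nltk.tag.hmm.HiddenMarkovModelTrainer(states=states, symbols=symbols)
hmm = trainer.train_unsupervised(sequences, max_iterations=5)

# Sample a password of random length from the trained model.
sample = hmm.random_sample(random.Random(), random.randint(4, 12))
print(''.join(obs for obs, state in sample))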

4. Generating Passwords

For each model, we will consider a way of sampling passwords one at a time. We can then evaluate the performance of a model by how many unique passwords present in the test corpus it generates within some fixed number of attempts. For the training corpus, we use a list of 14,344,391 passwords exposed in the RockYou leak. For the test corpus, we use all but the last 5,000 passwords from a list of 3,135,920 passwords exposed in a Gmail leak. We keep the last 5,000 passwords from the Gmail leak as a held-out corpus.

4.1. Baseline. The baseline is based on the default wordlist attack in John the Ripper, a popular password recovery tool. It yields every password in the training corpus, in order. Once it has exhausted the training corpus, it incrementally tries all passwords over a vocabulary of characters.

4.2. Bigram Language Model. To generate a password from the bigram model, we set our current character to the start token, then repeatedly make a weighted random selection of the next character according to the transition probabilities, until transitioning to the end token (a short sketch of this sampling loop appears after the results table below).

4.3. Hidden Markov Model. Random sampling from an HMM is similar to sampling from an n-gram model, except that instead of sampling transitions between observations, we sample transitions between hidden states, and at each step also sample an emission for that hidden state. For the purposes of this paper, we can use NLTK's implementation of an HMM, which provides a method to perform random sampling.

4.4. Results.

Model                   | Guesses | Training Passwords Used | Correct Guesses
Baseline                | 100,000 | 10                      | 4,612
Baseline                | 100,000 | 100                     | 6,278
Baseline                | 100,000 | 1,000                   | 6,011
Baseline                | 100,000 | 14,344,391              | 81,494
Bigram LM               | 100,000 | 10                      | 45
Bigram LM               | 100,000 | 100                     | 2,540
Bigram LM               | 100,000 | 1,000                   | 4,215
Bigram LM               | 100,000 | 14,344,391              | 3,985
HMM (10 hidden states)  | 100,000 | 10                      | 315
HMM (10 hidden states)  | 100,000 | 100                     | 1,531
HMM (10 hidden states)  | 100,000 | 1,000                   | 1,760
HMM (50 hidden states)  | 100,000 | 10                      | 99
HMM (50 hidden states)  | 100,000 | 100                     | 1,795
HMM (50 hidden states)  | 100,000 | 1,000                   | 2,798
HMM (100 hidden states) | 100,000 | 10                      | 80
HMM (100 hidden states) | 100,000 | 100                     | 1,430
HMM (100 hidden states) | 100,000 | 1,000                   | 2,913
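For concreteness, the sketch below implements the weighted-selection sampling loop described in Section 4.2. It mirrors BigramLM.GenerateSample in Appendix 8.2 but runs over a small hand-written transition table; the table and the function name sample_password are illustrative only:

# Minimal sketch: sample a password by walking the bigram transition table
# from the start token, drawing each next character in proportion to its
# transition probability, until the end token is drawn.
import random

START, END = '<S>', '</S>'

def sample_password(theta, rng=random):
    current, out = START, []
    while True:
        nexts = list(theta[current].keys())
        r, acc = rng.random(), 0.0
        for nxt in nexts:
            acc += theta[current][nxt]
            if r < acc:
                current = nxt
                break
        else:
            current = nexts[-1]  # guard against floating-point rounding
        if current == END:
            return ''.join(out)
        out.append(current)

# Toy transition table: deterministic, so this always prints '123'.
theta = {
    START: {'1': 1.0},
    '1': {'2': 1.0},
    '2': {'3': 1.0},
    '3': {END: 1.0},
}
print(sample_password(theta))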

This paper does not consider the efficacy of HMMs trained on more than 1,000 passwords, since training time becomes too long. This is worth noting as a drawback of using unsupervised learning for password-prediction models.

Since the baseline is based on the behavior of a popular password recovery tool, it is unsurprising that it outperforms the language models. The results show that the best precision among the language models was achieved by the bigram model trained on the first 1,000 passwords of the training corpus. This doesn't necessarily mean that the HMM shouldn't be used: since the bigram model and the HMM model the language differently, it is possible that each may guess passwords that the other could not. Most encouraging is that all of the models trained on 1,000 or fewer inputs were able to guess more passwords than they were given in 100,000 tries. This demonstrates that they can extrapolate from the input, making them potentially useful additions to the baseline for password recovery. Since the strength of a password is based on how difficult it is to guess, this also means that these models are useful for evaluating password strength: if we can expect them to guess a given password more quickly than the baseline, then that password is weaker than we might otherwise have thought.

5. Applications

5.1. Password Strength. If we conclude that these models are good at guessing passwords, then we can estimate the strength of a password by the expected number of guesses a model would need to find it. We care most about the worst-case scenario, so we will say that strength relates to the minimum expected number of guesses over the models. Given a password `password` and a model M which evaluates P(password | M) = p, define the Bernoulli random variable C_k as 1 if we guess the password on trial k and 0 if we do not. The expectation of a Bernoulli random variable equals its probability, so E(C_1) = ... = E(C_n) = p. The expected number of guesses to find the password is therefore the n satisfying

E(Σ_{k=1}^n C_k) = 1.

Since

E(Σ_{k=1}^n C_k) = Σ_{k=1}^n E(C_k) = n E(C_k) = 1,

we get

n = 1 / E(C_k) = 1 / p.

But p is, in many cases, small enough to cause floating-point imprecision, so we work with log(p) in the implementation:

n = 1 / exp(log(p)) = exp(−log(p)).

For the baseline, the order of password guesses is known, so the guess number of any particular password is known exactly. See the appendix for a password strength estimate implementation.
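As a minimal illustration of this estimate, the helper below converts a model's log-probability into an expected guess count, using Decimal arithmetic to avoid underflow when p is extremely small (both implementations in the appendix follow the same pattern); the example log-probability is illustrative:

# Minimal sketch: expected number of guesses n = 1/p = exp(-log p).
from decimal import Decimal

def expected_guesses(logprob):
    try:
        return Decimal(-logprob).exp()
    except Exception:
        return float('inf')

print(expected_guesses(-40.0))   # p = e^-40, roughly 2.35e17 expected guesses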

5.2. Password Recovery. Password recovery tools function by guessing passwords, one at a time, until one is correct. In most cases, this means taking a hashed password and hashing password guesses until the hashes match. The baseline used in this paper is based on the wordlist attack behavior of John the Ripper, a popular password recovery tool. The models discussed in this paper could easily be applied in password recovery tools: to recover a password from a hash, we would just need to implement the hashing algorithm, then hash each guess and compare it against the original password hash. These models would be useful as additional techniques for guessing passwords if they are likely to guess some probable passwords that existing tools are unlikely to guess; they are therefore potentially useful primarily for guessing passwords which do not occur in wordlists. When given training sets of 10, 100, and 1,000 passwords, both the bigram language model and the HMM are able to extrapolate, correctly guessing many more unique passwords than they were given. This is encouraging. However, in all cases the baseline outperforms these models. This does not mean that the models are not useful, only that they should not replace the baseline technique but rather be used alongside it.

6. Future Work

Further work in modeling passwords could involve both improving the models described in this paper and building new ones. It would be interesting to try modeling passwords with trigrams, four-grams, and so on, or with a context-free grammar. An obvious weakness of these models in generating passwords is that nothing stops them from generating the same passwords repeatedly. If we could prune the set of possible generated passwords, it could improve performance by preventing this type of redundancy. If we store every generated password and exclude it from being generated again, the bigram language model's correct guesses go up from 3,985 to 4,771 when trained on the full training corpus. This exact approach isn't practical because it requires storing every generated password in memory, but it demonstrates the potential improvement from some form of pruning.

Another apparent weakness of this paper is the lack of sufficient computational resources to properly test all models. The forward-backward algorithm used to train the HMM takes too long on consumer-grade hardware to test it with large training sets. However, many people and organizations interested in guessing passwords have access to much more powerful hardware, and training could be further sped up with a better optimized implementation, for example by taking advantage of graphics processors to parallelize the task. It would therefore be interesting to see how the HMM performs with much larger training sets.

This paper considered the precision of models at guessing passwords. It would also be interesting to know the recall: what portion of the test corpus we are able to guess. This is harder to test and is less relevant when computational resources are scarce. It is similar to the problem of estimating password strength, in that it considers the likelihood of a password being guessed.

7. Conclusion

We've shown that using language models to guess passwords is effective enough to encourage further exploration. Although none could beat the precision of traditional password recovery techniques, all models were able to extrapolate from an input set of passwords to a larger set of correct passwords. This paper outlines one method for evaluating the strength of a password; that said, it provides no guarantees about the strength of a password and is most effective at recognizing passwords that are weak.

The important take-away from this paper is how effectively we can guess passwords, even with just traditional methods. Given how much we rely on passwords for critical security, it is important to pick passwords which are harder to guess, but also to keep in mind that regardless of how long or seemingly complex a password might be, it may still be vulnerable. It is therefore important not to rely solely on passwords for security. Other security measures like two-factor authentication improve our chances of keeping systems secure.


8. Appendix

8.1. Language Model Evaluation.

"""
Test the performance of various password models
"""

import lmgenerator
import hmmgenerator


# basic wordlist attack: yield every training password, then brute force
def baselineGenerator(training_corpus):
    for pwd in training_corpus:
        yield pwd
    vocabulary = set()
    for pwd in training_corpus:
        for c in pwd:
            vocabulary.update([c])
    vocabulary = list(vocabulary)

    def bruteForce(vocabulary, length):
        if length == 1:
            for c in vocabulary:
                yield c
        else:
            for c in vocabulary:
                for password_end in bruteForce(vocabulary, length - 1):
                    yield c + password_end

    length = 1
    while True:
        for pwd in bruteForce(vocabulary, length):
            yield pwd
        length += 1


# minimum expected number of guesses over the baseline and the models
def passwordStrength(password, training_corpus, bigramlm, hmm):
    expectations = []
    # baseline: position in the wordlist, or position in the brute-force ordering
    baseline_guess = 0
    if password in training_corpus:
        baseline_guess = training_corpus.index(password) + 1
    else:
        vocabulary = set()
        for pwd in training_corpus:
            for c in pwd:
                vocabulary.update([c])
        vocabulary = list(vocabulary)
        guessable = True
        for c in password:
            if c not in vocabulary:
                guessable = False
        if not guessable:
            baseline_guess = float('inf')
        else:
            baseline_guess = len(training_corpus)
            for pwd_len in xrange(1, len(password)):
                baseline_guess += pow(len(vocabulary), pwd_len)

            def helper(pwd):
                if len(pwd) == 1:
                    return vocabulary.index(pwd[0]) + 1
                return pow(len(vocabulary), len(pwd) - 1) * vocabulary.index(pwd[0]) + \
                    helper(pwd[1:])

            baseline_guess += helper(password)
    expectations.append(baseline_guess)

    if bigramlm:
        expectations.append(bigramlm.ExpectedGuesses(password))
    if hmm:
        expectations.append(hmm.ExpectedGuesses(password))
    return min(expectations)


# See how many things in test_corpus the generator can guess with some number of
# tries
def testGenerator(gen, test_corpus, tries):
    found = 0
    test_set = set(test_corpus)
    guesses = set()
    for i in xrange(tries):
        guess = gen.next()
        if not guess in guesses:
            guesses.update([guess])
            if guess in test_set:
                found += 1
    return found


def testCorpora(training_corpus, test_corpus):
    print 'First 5 training passwords:', training_corpus[:5]
    print 'First 5 test passwords:', test_corpus[:5]

    tries = 100000
    baseline = testGenerator(baselineGenerator(training_corpus), test_corpus, tries)
    print 'Baseline wordlist attack %d tries: %d.' % (tries, baseline)
    bigramlm = lmgenerator.BigramLM(training_corpus)
    bigramlmgen = bigramlm.Generator()
    bigramlm_results = testGenerator(bigramlmgen, test_corpus, tries)
    print 'Bigram LM attack %d tries: %d.' % (tries, bigramlm_results)
    hmmlm = hmmgenerator.HMMLM(training_corpus, 100)
    hmmgen = hmmlm.Generator()
    hmmresults = testGenerator(hmmgen, test_corpus, tries)
    print 'HMM attack %d tries: %d.' % (tries, hmmresults)
    print 'Password strength test:'
    print '123456:', passwordStrength('123456', training_corpus, bigramlm, hmmlm)
    print '123456789123:', passwordStrength('123456789123', training_corpus, bigramlm, hmmlm)
    print 'mingchow:', passwordStrength('mingchow', training_corpus, bigramlm, hmmlm)
    print '0a0I8DV:', passwordStrength('0a0I8DV', training_corpus, bigramlm, hmmlm)
    print 'AtiK0nAOLP3y:', passwordStrength('AtiK0nAOLP3y', training_corpus, bigramlm, hmmlm)
    print 'correcthorsebatterystaple:', passwordStrength('correcthorsebatterystaple', training_corpus, bigramlm, hmmlm)


def main():
    print '################################################################'
    print 'Training corpus: rockyou'
    print 'Test corpus: gmail'
    print '################################################################'
    rockyou_nocount = open('corpora/rockyou_nocount', 'r')
    training_corpus = [pwd.rstrip() for pwd in rockyou_nocount][:10]
    gmail_nocount = open('corpora/gmail_nocount', 'r')
    gmail_corpus = [pwd.rstrip() for pwd in gmail_nocount]
    test_corpus = gmail_corpus[:-5000]
    heldout_corpus = gmail_corpus[-5000:]
    testCorpora(training_corpus, test_corpus)


if __name__ == '__main__':
    main()

8.2. Bigram Language Model Implementation.

from math import log, exp, isnan
import random
from decimal import *

start_token = '<S>'
end_token = '</S>'


def Preprocess(corpus):
    return [[start_token] + [token for token in pwd] + [end_token] for pwd in corpus]


class BigramLM:
    def __init__(self, training_corpus):
        self.bigram_counts = {}
        self.unigram_counts = {}
        self.Train(training_corpus)

    def Train(self, training_corpus):
        training_set = Preprocess(training_corpus)
        for pwd in training_set:
            for i in xrange(len(pwd) - 1):
                token = pwd[i]
                next_token = pwd[i + 1]
                if not token in self.unigram_counts:
                    self.unigram_counts[token] = 0
                if not token in self.bigram_counts:
                    self.bigram_counts[token] = {}
                if not next_token in self.bigram_counts[token]:
                    self.bigram_counts[token][next_token] = 0
                self.unigram_counts[token] += 1
                self.bigram_counts[token][next_token] += 1

    def GenerateSample(self):
        # weighted random walk over the transition counts until the end token
        sample = [start_token]
        while not sample[-1] == end_token:
            selector = random.uniform(0, self.unigram_counts[sample[-1]])
            sum_bc = 0
            for bigram in self.bigram_counts[sample[-1]]:
                sum_bc += self.bigram_counts[sample[-1]][bigram]
                if sum_bc > selector:
                    sample.append(bigram)
                    break
        return ''.join(sample[1:-1])

    # gets the (unsmoothed) log probability of a string given the bigram LM
    def StringLogProbability(self, string):
        preprocessed = Preprocess([string])[0]
        logprob = 0
        for i in xrange(1, len(preprocessed)):
            unigram = preprocessed[i - 1]
            bigram = preprocessed[i]
            if unigram in self.unigram_counts and unigram in self.bigram_counts and bigram in self.bigram_counts[unigram]:
                logprob += log(self.bigram_counts[unigram][bigram]) - log(self.unigram_counts[unigram])
            else:
                logprob = -float('inf')
        return logprob

    def ExpectedGuesses(self, string):
        # expected number of guesses n = exp(-log p)
        logprob = self.StringLogProbability(string)
        try:
            expectation = Decimal(-logprob).exp()
            return expectation if not isnan(expectation) else float('inf')
        except:
            return float('inf')

    def Generator(self):
        while True:
            yield self.GenerateSample()

    def SimplePrunedGenerator(self):
        # like Generator, but never yields the same password twice
        tries = set()
        while True:
            pwd = self.GenerateSample()
            if not pwd in tries:
                tries.update([pwd])
                yield pwd

8.3. Hidden Markov Model Implementation.

from math import log, exp, log1p, isnan
import random
from memoize import memoize
import nltk
from decimal import *


class HMMLM:
    def __init__(self, training_corpus, num_states):
        # unlabeled character sequences; the dummy tags are ignored by Baum-Welch
        sequences = [[(c, '') for c in pwd] for pwd in training_corpus]
        symbols = list(set([c for pwd in training_corpus for c in pwd]))
        states = range(num_states)
        trainer = nltk.tag.hmm.HiddenMarkovModelTrainer(states=states, symbols=symbols)
        self.hmm = trainer.train_unsupervised(sequences)

    def Sample(self, range_start, range_end):
        pwd = self.hmm.random_sample(random.Random(), random.randint(range_start, range_end))
        pwd = ''.join([e[0] for e in pwd])
        return pwd

    def StringProbability(self, pwd):
        return self.hmm.log_probability([(c, None) for c in pwd])

    def ExpectedGuesses(self, pwd):
        # expected number of guesses n = exp(-log p)
        logprob = self.StringProbability(pwd)
        try:
            expectation = Decimal(-logprob).exp()
            return expectation if not isnan(expectation) else float('inf')
        except:
            return float('inf')

    def Generator(self):
        while True:
            pwd = self.hmm.random_sample(random.Random(), random.randint(4, 18))
            pwd = ''.join([e[0] for e in pwd])
            yield pwd