
MEASUREMENTS ASSOCIATED WITH LEARNING MORE SECURE COMPUTER CONFIGURATION PARAMETERS

BY

XIN ZHOU

A Thesis Submitted to the Graduate Faculty of
WAKE FOREST UNIVERSITY GRADUATE SCHOOL OF ARTS AND SCIENCES
in Partial Fulfillment of the Requirements for the Degree of
MASTER OF SCIENCE
Computer Science
May, 2015
Winston-Salem, North Carolina

Approved By:
David John, Ph.D., Advisor
Errin Fulp, Ph.D., Chair
Don Gage, Ph.D.
William Turkett, Ph.D.

Acknowledgments

First, I would like to thank my thesis committee, David John, Errin Fulp, Don Gage, and William Turkett, each of whom provided helpful comments, and especially my advisor David John, who has been a great mentor and prepared me to finish this thesis. Second, I would like to thank all current GAMT members and former student Robert Smith, whose earlier work I relied on to finish my thesis. This material is based upon work supported by the National Science Foundation under Grant No.

Table of Contents

Acknowledgments
List of Abbreviations
List of Figures
Abstract

Chapter 1 Introduction
  1.1 Computer Security
  1.2 Computer Parameters Configuration
  1.3 CVSS Score System
  1.4 The Moving Target Strategy
  1.5 The Genetic Algorithm Moving Target Strategy
    1.5.1 Selection
    1.5.2 Crossover
    1.5.3 Mutation
    1.5.4 Chromosome Diversity Analysis
    1.5.5 Resilience Approach
  1.6 Machine Learning
    1.6.1 Temporal Classifier
    1.6.2 Spatial Classifier

Chapter 2 Implementation Details
  XML File for Configurations
  Attack Simulation Details
  Genetic Algorithmic Details

Chapter 3 The Role of the Classifier in Learning Attacks
  Experimental Design
  Intensive Single Parameter Attack
  Repeated Attack Information
  Repeated Contiguous Attack Information
  Does the GA Matter?
  Multiple Parameters Attack
  Conclusion

Chapter 4 An Evolutionary Strategy for Resilient Cyber Defense
  Abstract
  Introduction
  Software Configurations
  An Evolutionary Strategy for Configuration Resiliency
    Mapping Configurations to Chromosomes
    Chromosome Feasibility and Fitness
    Algorithmic details
  Experimental Results
  Conclusions
  Next Steps

Chapter 5 Future Work

Bibliography

Appendix A Code for Fitness Analysis
  A.1 Utility Functions Used by All Measurements
  A.2 Scripts for Each Measurement

Appendix B Code for Attack Profile Analysis
  B.1 Utility Functions Used by Analyze Attack Profiles
  B.2 Utility Functions Used by Create Attack Profiles

Vita

List of Abbreviations

CVSS  Common Vulnerability Scoring System
ES    Evolutionary Strategy
GA    Genetic Algorithm
GAMT  Genetic Algorithm Moving Target
MT    Moving Target
SGA   Simple Genetic Algorithm
SVM   Support Vector Machine

List of Figures

1.1 Sample of CVSS Vector
1.2 Illustration of Roulette Wheel Selection
1.3 Roulette Wheel Selection Algorithm
1.4 Tournament Selection Algorithm
1.5 Crossover Algorithm
1.6 range integer Parameter Mutation Algorithm
1.7 Diversity Algorithm
1.8 Illustration of Linear and Nonlinear Classification
1.9 Illustration of Support Vectors
1.10 Illustration of a SVM with Algebra
1.11 Algorithm of History Tally for Each Parameter
1.12 Classifier for bit vector Parameter
1.13 Classifier for integer range Parameter
1.14 Spatial Classifier: Multi-Tree
Sample of Configurations File
Results of Single Parameter Attack
Results of Repeatedly Single Parameter Attack
Results of Parameters Starting with Bad Fitness
Results of Repeatedly Contiguous Single Parameter Attack I
Results of Repeatedly Contiguous Single Parameter Attack II
Results of Repeatedly Contiguous Single Parameter Attack When the GA Is Based on Fake Score
Selection Algorithm Based on fakescore
Simulation Results When the GA Is Based on Fake Score
Results of Multiple Parameters Attack
Results of Multiple Parameters Attack When the GA Is Based on Fake Score
ROC Analysis I
ROC Analysis II
Flow Chart of Evolutionary-based Software Configuration
Results of Resilience Experiments

Abstract

Xin Zhou

To defend against a cyber attack in which the attacker searches the network for vulnerable machines, most people install security software, which costs money, or download the newest patches to remove known vulnerabilities. Neither approach is efficient, especially as attacks are constantly updated. Reconnaissance is an essential part of a cyber attack, during which the attacker learns about the vulnerabilities of the targeted machines, including credentials, software versions, and misconfigured settings. A moving target strategy can then be aimed specifically at this part of an attack. Theoretically, a change in configuration during the construction of an exploit alters the computer such that the machine no longer contains the same vulnerabilities discovered during reconnaissance, thereby rendering the initial reconnaissance step ineffective. In this thesis, a genetic algorithm is implemented for the moving target strategy to find secure computer configurations over generations. A genetic algorithm is one kind of evolutionary algorithm, inspired by the process of natural selection. Starting with the current generation, it creates further generations consisting of highly fit candidate solutions adapted to the environment. In this research, the solutions are machine configurations, and the environment is continually updated by the attacks they face. For each iteration of the genetic algorithm, a new population of chromosomes (configurations) is evolved from the current population through the processes of selection, crossover, and mutation, and each chromosome in the population is evaluated by a fitness function. Selection chooses single chromosomes, favoring those with higher fitness, using a fitness-proportional method known as roulette wheel selection.
Crossover selects a highly fit parent from the pool of configurations and exchanges genetic information with an existing parent at some crossover points, producing a child that combines settings from both. Mutation is applied to some genes (parameters) of the chromosomes to provide diversity. This thesis also introduces a novel approach, the machine learning strategies of support vector machine (SVM) and decision tree (DT) classification, to enhance the genetic algorithm. Classification groups data objects into one of several categories based on their similarity to known examples of each category. Normally, a training data set is provided to the classifier, consisting of a variety of objects, each with many characteristics and a label for the group to which it belongs. The classifier uses this training data to create a model of the feature space. The classifier is then provided with new unlabeled data points, and it returns the label of the group to which each data point belongs based on the generated model. Classification

can enhance the genetic algorithm by singling out low-fitness parameter combinations and removing them from the population. This is achieved by correlating a parameter's setting changes with the chromosome's fitness changes, or by comparing attacked machines to machines that were not attacked. A classifier can then be trained on this data and used to classify future settings or chromosomes as either secure or insecure. In this thesis, several attack simulation experiments are conducted to evaluate the strategies discussed above for moving target defense: the genetic algorithm alone, and the genetic algorithm with support vector machine and decision tree classifiers. The goals are to determine how genetic algorithms influence the learning behavior for the attacked parameters; whether a genetic algorithm with classifiers can produce good configurations for the specific attacked parameters better and faster; and what quality of configurations can be achieved using the genetic algorithm with classifiers versus the genetic algorithm alone. Furthermore, experiments are conducted to test whether resilience exists, reflecting the genetic algorithm's ability to adapt to new security threats while not discarding learned security improvements.

Chapter 1: Introduction

In a cyber attack, an attacker often searches the network for vulnerable machines. This search gains information about the computers, such as vulnerabilities created by poor parameter settings. The attacker uses this information to design an exploit. To better protect machines from attackers, a Moving Target (MT) defense strategy can be applied to update machines' configurations from time to time, before the attacker has the opportunity to design and execute an exploit [11]. In the MT defense strategy analyzed in this research, a Genetic Algorithm (GA) is used to provide security through diversity by changing machines' parameter configurations. This approach can proactively identify more functional and secure computer configurations and use them to evolve new configurations. A GA is a search and optimization heuristic inspired by the biological process of natural selection, accomplished via the processes of selection, crossover, and mutation. The processes of selection and mutation are more likely to choose parameters with better settings (reflected by higher fitness scores). The Genetic Algorithm Moving Target (GAMT) approach generally increases the fitness score of all parameter configurations, but it does not focus on providing good settings for specific parameters. However, if we can determine which parameter settings allow a successful attack, classifiers can be applied to configure those specific parameters quickly. In this research, we design and analyze attack simulation experiments to investigate the role of classifiers within GAMT. We also focus on understanding the resilience of evolutionary strategies in cyber defense.

1.1 Computer Security

Our daily lives now depend on computers and the Internet. For example, we use e-mail and cell phones for communication; laptops, DVDs, and Netflix for entertainment; GPS for car navigation; and we shop online using credit cards. It is hard to overstate how much private information is stored in our computers. Cyber security is therefore an urgent issue, and it is crucial to develop strategies that protect us by preventing, detecting, and responding to cyber attacks. The focus of this thesis is computer security. In a cyber attack, the first step of an exploit, reconnaissance, is vital to the attacker, who begins by searching the network for vulnerable machines. This search can focus on collecting information about the computers, such as which operating system is running, which system parameters have been configured, and which versions of software have been installed. Once completed, the attacker can use this information to design an exploit that will achieve their goal on the targeted machine. If the information gathered is inaccurate or the machine is modified before the commencement of an exploit, then there is a high probability that the attack will fail. For example, if an attacker finds a machine running a web server with a bad parameter setting, and a security flaw associated with that poor value is known, then it is easy to prepare the exploit. However, if the target's parameter configuration is updated and improved before the attacker designs and executes the exploit, that specific exploit will fail.

1.2 Computer Parameters Configuration

A computer configuration is a set of parameters governing how a machine operates; it consists of a set of parameters and their associated settings.

The configuration affects operating system and application performance, functionality, and security. It includes operating system settings, such as file permissions or enabling address randomization, as well as information found in application configuration files, such as httpd.conf for Apache web servers. In this research, parameters have different types: a bit vector setting, for example for file permissions, describing the way those permissions are set; an integer range setting; or an option list from which the parameter can select a value (for instance, selecting +FollowSymLinks makes the server follow symbolic links in that directory). A configuration assigns a selected setting to each parameter; for instance, a length-9 bit vector is assigned to each Unix file's permissions, and one such setting means the owner of the file can read, write, and execute the file, members of the group can read and execute it, and others have no access to it. From the attacker's point of view, knowledge of configuration parameters and poor parameter settings can be used to create an exploit. A configuration's security can be measured by the number and severity of the attacks against that machine. In reality, however, finding secure parameter settings is difficult and time-consuming. The most direct way to estimate the security of a configuration is to configure a machine with the specific settings and detect whether a successful attack occurs. Even if an attack occurs, it is not necessarily possible to determine which subset of parameter settings was responsible for the vulnerability.

1.3 CVSS Score System

Testing how well a machine configuration improves security is time-consuming. For example, assuming each operational period costs at least one day for the machine to be attacked, it would take months for generations of configurations to run on the system. To address this, the framework known as the Common

Vulnerability Scoring System (CVSS), developed for assessing the vulnerabilities a device may have, can be used [15]. CVSS vectors can thus serve as measurements of the threats posed by given exploits, and CVSS scores for all parameter settings can be assigned by expert scoring of parameters based on known attacks that indicate a vulnerability for those parameters. (The expert scores were provided by Dr. Errin Fulp, Scott Seal, and Bryan Prosser.) CVSS contains two classes of metrics, each with three submetrics [15]:

Exploitability Metrics

1. Access Vector (AV): describes how a vulnerability may be exploited: locally from the machine itself (L), from the adjacent network on the same LAN (A), or from the network anywhere (N).

2. Access Complexity (AC): describes how easy or difficult it is to exploit the discovered vulnerability: low (L), medium (M), or high (H).

3. Authentication (Au): describes the number of times an attacker must authenticate to complete the exploit: none (N), single (S), or multiple (M).

Impact Metrics

1. Confidentiality (C): describes the impact on the confidentiality of data processed by the system: no impact on the confidentiality of the system (N); considerable disclosure of information, but constrained in scope so that only part of the data is exposed (P); or total information disclosure, providing access to any or all data on the system (C).

Figure 1.1: Sample of CVSS Vector

2. Integrity (I): describes the level of integrity compromised during the exploit: no impact on the integrity of the system (N); modification of some data or system files is possible, but the scope of the modification is limited (P); or total loss of integrity, where the attacker can modify any files or information on the target system (C).

3. Availability (A): describes the impact on the availability of the target system: no impact on the availability of the system (N); reduced performance or loss of some functionality (P); or total loss of availability of the attacked resource (C).

A CVSS vector consists of six fields corresponding to the metrics discussed above, and each field can be assigned one of three values. Access Vector, Access Complexity, and Authentication measure the difficulty of the attack; Confidentiality, Integrity, and Availability measure the severity of the attack's consequences. A CVSS vector example is shown in Figure 1.1. It means the parameter setting corresponding to this CVSS vector is secure, for the following reasons: first, it is difficult to exploit a discovered vulnerability (AC:H), and even if exploitation is possible, the vulnerability can only be exploited from the machine itself (AV:L) and the attacker must authenticate multiple times to complete the exploit (Au:M); second, this parameter setting has no impact on the confidentiality (C:N), integrity (I:N), or availability (A:N) of the system.
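As a concrete illustration, a vector like the one in Figure 1.1 can be parsed mechanically. This is a minimal sketch, not part of the thesis code: the function name `parse_cvss` and the label tables are invented for the example, and only the CVSS v2 base metrics described above are covered.

```python
# Valid values for the six CVSS v2 base-metric fields described above.
# The human-readable labels are illustrative, not exhaustive.
FIELDS = {
    "AV": {"L": "local", "A": "adjacent network", "N": "network"},
    "AC": {"H": "high", "M": "medium", "L": "low"},
    "Au": {"M": "multiple", "S": "single", "N": "none"},
    "C":  {"N": "none", "P": "partial", "C": "complete"},
    "I":  {"N": "none", "P": "partial", "C": "complete"},
    "A":  {"N": "none", "P": "partial", "C": "complete"},
}

def parse_cvss(vector):
    """Split a vector such as 'AV:L/AC:H/Au:M/C:N/I:N/A:N' into a dict."""
    parsed = {}
    for part in vector.split("/"):
        metric, value = part.split(":")
        if metric not in FIELDS or value not in FIELDS[metric]:
            raise ValueError("unknown metric or value: " + part)
        parsed[metric] = value
    return parsed

secure = parse_cvss("AV:L/AC:H/Au:M/C:N/I:N/A:N")
print(secure["AC"])  # prints "H": the vulnerability is hard to exploit
```

Such a parsed vector is what an expert-assigned setting score is derived from; the thesis itself does not specify this parsing step.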

1.4 The Moving Target Strategy

A Moving Target (MT) strategy disrupts attacks by changing the current state of the defended item, such as a computer's configuration [11]. This can make the attacker's knowledge of the target obsolete before it can be used to exploit a vulnerability. Theoretically, a change in configuration during the construction of an exploit will alter the computer such that the machine no longer contains the same vulnerabilities discovered during reconnaissance, thereby rendering the initial reconnaissance step ineffective.

1.5 The Genetic Algorithm Moving Target Strategy

Evolutionary Strategies (ES) are a class of algorithms inspired by the process of natural selection [18, 17]. They breed the current generation to create further generations consisting of highly fit candidate solutions adapted to the environment. Here the solutions are machine configurations, and the environment is continually updated by the attacks being faced. A Genetic Algorithm (GA) is one kind of ES. The Simple Genetic Algorithm (SGA) was originally developed by John Holland; this algorithm represents each chromosome as a bit string [10]. Each chromosome is evaluated according to a given fitness function and assigned a fitness score. The fitness function must be designed such that the fitness score of a chromosome, or of the population of chromosomes, moves toward a goal as performance improves. Once all of the chromosomes have been assigned a fitness score, a decision must be made as to which individuals will be permitted to produce offspring, and with what probability that selection occurs. In this research, the population of chromosomes is an arbitrary, constant number,

Figure 1.2: Configuration 6, with higher fitness, is more likely to be selected.

and each chromosome has a fixed length. A candidate solution is represented by a chromosome consisting of a list of possible traits of the solution. To begin the algorithm [18, 25], an initial population of n chromosomes is randomly generated. During each iteration, the GA evolves a new generation with the same population size n from the current population through the processes of selection, crossover, and mutation. Selection is more likely to choose chromosomes with higher fitness from the current generation. This gives highly fit solutions a higher chance of survival and multiplication in future generations (Figure 1.2); however, it still chooses some chromosomes with lower fitness to provide diversity in the population. Crossover swaps some traits of two selected parent chromosomes to exploit a chromosome for the next generation. Mutation randomly changes a given trait, allowing the GA to introduce new trait settings that were not in the current population or to reintroduce ones eliminated in previous generations. Crossover and mutation use probabilities p_c and p_m to determine the extent of their application, respectively. These three processes are described in more detail below.

1.5.1 Selection

Roulette Wheel Selection

For each generation, a new population of chromosomes is evolved from the current population through the processes of selection, crossover, and mutation, and each chromosome in the population is evaluated by a fitness function. Selection chooses single chromosomes, favoring higher fitness, and this process can

Figure 1.3: Roulette Wheel Selection Algorithm

Figure 1.4: Tournament Selection Algorithm

be defined as a fitness-proportional selection, also known as roulette wheel selection [18]. This selection means that each chromosome in the current generation is chosen with probability proportional to its own fitness. In this algorithm, a range is defined from zero to the total of the population's fitness scores, and each chromosome is represented by an interval in the range with length equal to its fitness. A number is randomly generated within the range, and the chromosome whose interval contains that number is selected. Intuitively, chromosomes with larger fitness own larger intervals and are therefore more likely to contain the randomly drawn number, so each chromosome's chance of being selected is proportional to its fitness score. The roulette wheel selection is illustrated in Figure 1.2 and Figure 1.3.

Tournament Selection

In a deterministic tournament selection, four chromosomes are selected using roulette wheel selection, and each pair of selected chromosomes is compared to decide which has higher fitness [18]. After the two first-round comparisons, the two winners compete in a final round to determine the selected chromosome. The tournament selection algorithm is illustrated in Figure 1.4. In this research, a mixture of roulette wheel selection and tournament selection is applied: for each iteration of the selection process, two pairs of chromosomes are selected first, and then tournament selection is applied to them.

1.5.2 Crossover

After selection, the GA performs crossover to exploit the chromosomes. To determine whether a chromosome will participate in crossover, a constant parameter p_c, the probability of crossover, is used. Once a chromosome is chosen for crossover, another chromosome is selected and copied from the current generation to be its partner. Once crossover is finished, the partner is no longer used. Crossover can be implemented in several ways [18].
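The combined selection scheme described above, roulette wheel picks feeding a deterministic tournament, can be sketched as follows. This is an illustrative reconstruction, not the thesis implementation; the function names are invented for the sketch, and chromosomes are referred to by index into a fitness list.

```python
import random

def roulette_select(fitness):
    """Return the index of one chromosome, chosen with probability
    proportional to its fitness (cf. Figure 1.3)."""
    total = sum(fitness)
    r = random.uniform(0, total)   # a point on the "wheel"
    running = 0.0
    for i, fit in enumerate(fitness):
        running += fit             # end of chromosome i's interval
        if r <= running:
            return i
    return len(fitness) - 1        # guard against floating-point round-off

def tournament_select(fitness):
    """Deterministic tournament over four roulette picks (cf. Figure 1.4):
    two pairwise rounds, then the two winners compete."""
    picks = [roulette_select(fitness) for _ in range(4)]
    winner1 = max(picks[0], picks[1], key=lambda i: fitness[i])
    winner2 = max(picks[2], picks[3], key=lambda i: fitness[i])
    return max(winner1, winner2, key=lambda i: fitness[i])
```

Because the tournament compares four fitness-proportional draws, it biases selection toward high-fitness chromosomes more strongly than roulette wheel selection alone, while low-fitness chromosomes retain a nonzero chance of selection.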

Figure 1.5: Crossover Algorithm

Uniform Crossover

Uniform crossover is widely used in this research. In uniform crossover, a constant number of crossover points k is used. In this research, each chromosome (configuration) consists of p parameters, and k of them are selected to perform crossover. When uniform crossover occurs, k parameter settings are randomly chosen and swapped between the parent and the partner. The algorithm is illustrated in Figure 1.5.

N-Point Crossover

Some crossover operators tend to give the offspring several settings from a single parent if those parameters are located near each other in the chromosome; this can be encouraged by placing related traits near each other. N-point crossover is usually used in this scenario. One-point crossover can ensure that related parameter settings survive a crossover together: a random trait (parameter) is chosen, and it and all traits after it are swapped to the other parent. Similarly, in two-point crossover, two random traits are chosen, and those traits and all traits between them are swapped. In three-point crossover, a third parent is chosen through the selection process: if the first two parents have the same parameter setting, that setting is used by the child; otherwise, the child uses the third parent's setting.
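The uniform crossover just described can be sketched as follows. This is a minimal illustration, not the thesis code: a chromosome is represented simply as a list of parameter settings, and the function name is invented for the sketch.

```python
import random

def uniform_crossover(parent, partner, k):
    """Swap k randomly chosen parameter positions between two chromosomes
    of equal length, returning the two resulting children (cf. Figure 1.5)."""
    assert len(parent) == len(partner) and 0 <= k <= len(parent)
    child_a, child_b = list(parent), list(partner)
    for i in random.sample(range(len(parent)), k):  # k distinct crossover points
        child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b

# In the GA, crossover is only applied with probability p_c (see above).
child, _ = uniform_crossover(["rwxr-x---", "On", 30], ["rw-r-----", "Off", 300], 2)
```

Each child keeps every parameter in its original position, with exactly k positions taken from the other chromosome.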

Figure 1.6: range integer Parameter Mutation Algorithm

1.5.3 Mutation

After crossover, the GA performs mutation to further explore the chromosome space [18]. To determine whether a parameter within the chromosome will be mutated, a constant parameter p_m, the probability of mutation, is used. Once a parameter is chosen for mutation, a mutation operator specific to that parameter is called, since there are three different types of parameters, bit vector, option list, and integer range, and each type has its own mutation operator. A bit vector parameter is mutated by flipping a random bit to 0 or 1. An option list parameter is mutated by choosing a random option from the associated list of valid parameter settings. An integer range parameter randomly chooses an integer within its range using a probability density function. Experience shows that some integer range parameters have a large range but only a small set of integers provides security, while the rest cause vulnerability. To account for this, the probability density function is modified from the normal distribution, making very low integers, very high integers, and integers close to the range's median the most likely to be selected by mutation. The mutation algorithm for integer range parameters is illustrated in Figure 1.6.

Figure 1.7: Diversity Algorithm

1.5.4 Chromosome Diversity Analysis

It is important to keep a diverse collection of high-quality configurations, since in the moving target strategy different configurations are applied across computers. To evaluate the diversity between two chromosomes, the Hamming distance over all corresponding traits (parameters) is calculated [25]: for each parameter setting a pairwise comparison is made, and

if there is a change, a distance of 1 is added to the result. However, it is computationally expensive to measure this value when the population size n and the number of parameters per chromosome are large, since a pairwise comparison is needed for each pair of chromosomes in the current generation. In this research there are 140 chromosomes and 102 parameters per chromosome, requiring nearly one million comparison steps. To calculate the diversity efficiently, for each generation we sample only a small population of k chromosomes (k = 14 in this research; this sampled diversity can represent the true diversity of the generation [25]) for the pairwise comparison, and a Hamming distance is measured from this sample set to represent the generation's diversity. The diversity algorithm is illustrated in Figure 1.7.

1.5.5 Resilience Approach

The cyber attacks a computer system encounters over time will constantly change. Attackers alter their strategies as new vulnerabilities and exploits are discovered and developed. As a result, attackers cannot be successfully stopped using only traditional static defenses, such as firewalls and signature-based intrusion detection. Providing security now requires constant attention, and in many cases this includes monitoring and updating system configurations. Unfortunately, making these timely system updates may be difficult due to the number and diversity of managed systems. Given the difficulty of managing secure systems, a resilient approach that automatically adjusts configurations to the current threat environment, or enables recovery from an attack, would offer defenders a significant advantage. In addition, continual system change can form a diversity defense, or Moving Target (MT) defense, where the attacker is forced to contend with an ever-changing system, increasing the expense associated with a successful attack. For example, A3 [22, 23] is

a resilience-oriented MT defense that provides an execution environment that monitors and automates mitigation strategies such as isolation policies, network remapping, and source recompilation [19]. Each strategy can be considered a resilient MT tactic, and they can be combined to form a comprehensive, albeit complex, defense. Autonomic Computer Network Defence [2] and LISYS [9] are two other approaches to resilient systems. Relatedly, Symbiotic Embedded Machines [5] is a resilient software approach. However, given that software misconfiguration is a common and widespread security issue, focusing on resilient configuration management alone can greatly benefit security. Evolutionary-based algorithms can be leveraged to create a novel resilient software configuration defense. Using this approach, software configurations (application and/or operating system) are modeled as chromosomes, where configuration parameters are individual chromosome traits, or alleles. Mimicking the selection, crossover, and mutation processes observed in nature, such an approach continually evolves system configurations in response to the current environment.

1.6 Machine Learning

In addition to the GA described before, another strategy, classification, is also used by the MT system [25]. At the beginning of this strategy, a training data set is provided to the classifier, consisting of many objects with different features and the label of the group to which each object belongs. The classifier uses this training data to create a model of the feature space. When the classifier is then provided with new unlabeled data points, it returns the label of the group to which each data point belongs based on the generated model. Classification techniques can enhance the GA by singling out low-fitness trait combinations and removing them from the population. This can be achieved by

Figure 1.8: Illustration of Linear and Nonlinear Classification

correlating a parameter's setting changes with the chromosome's fitness changes. Once a model is generated based on previous generations, it can determine whether parameter settings are secure or not for chromosomes of future generations. A linear classifier handles a linearly separable data set well, but not a nonlinearly separable one [14]. Take a one-dimensional case, for example: a linear classifier can straightforwardly classify the one-dimensional data set in the top panel of Figure 1.8, but not the one in the middle panel. Many computer parameters have a very large space of possible settings, and they do not form linearly separable data sets; the bit vector parameters are one example. To solve this problem, we can map the data onto a higher-dimensional space and then use a linear classifier in that space [14]. For example, with the mapping x -> {x, x^2}, the bottom panel of Figure 1.8 shows that the data becomes linearly separable in the new representation. The general idea is to map the original feature space

Figure 1.9: The support vectors are the 5 points right up against the margin of the classifier

to some higher-dimensional feature space where the training set is separable. However, computing the mapping itself can be inefficient in some cases. Support Vector Machines (SVMs) can efficiently perform non-linear classification using what is called the kernel trick [4], implicitly mapping their inputs into high-dimensional feature spaces. The SVM algorithm searches for a decision surface that is maximally far from any data point. The distance from the decision surface to the closest data point determines the margin of the classifier. A small subset of the data defines the decision function for an SVM; these points are known as the support vectors [14]. Figure 1.9 shows an example of the margin and support vectors. Maximizing the margin is desirable because points near the decision surface represent highly uncertain classification decisions: the classifier has nearly a 50% chance of deciding either way.
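The one-dimensional example from Figure 1.8 can be made concrete. The sample points and threshold below are invented for illustration: the -1 class clusters near the origin and the +1 class sits at the extremes, so no single threshold on x separates them, but after the lift x -> (x, x^2) a linear threshold on the second coordinate does.

```python
# Hypothetical 1-D training points: label -1 near zero, +1 at the extremes.
points = [(-3, +1), (-2, +1), (-1, -1), (0, -1), (1, -1), (2, +1), (3, +1)]

def lift(x):
    """Map a 1-D input into the 2-D feature space (x, x^2), as in Figure 1.8."""
    return (x, x * x)

def classify(x, threshold=2.5):
    """A linear decision in the lifted space: the sign of x^2 - threshold."""
    _, x2 = lift(x)
    return +1 if x2 >= threshold else -1

# Every training point is now classified correctly by a *linear* rule.
assert all(classify(x) == label for x, label in points)
```

An SVM achieves the same effect implicitly: a polynomial or RBF kernel evaluates inner products in such a lifted space without ever materializing the mapping, which is the kernel trick mentioned above.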

Figure 1.10: Illustration of a SVM with Algebra

Let us illustrate an SVM with algebra, as shown in Figure 1.10 [14]. A decision hyperplane can be defined by an intercept term b and a decision hyperplane normal vector w, which is perpendicular to the hyperplane; this vector is commonly called the weight vector. To choose among all the hyperplanes that are perpendicular to the normal vector, we specify the intercept term b. Because the hyperplane is perpendicular to the normal vector, all points x on the hyperplane satisfy w . x = -b. Now suppose that we have a set of training data points {(x_i, y_i)}, where each member is

26 Figure 1.11: Algorithm of History Tally for Each Parameter a pair of a point x i and a class label y i corresponding to it. For SVMs, the two data classes are always named +1 and 1 (rather than 1 and 0), and the intercept term is always explicitly represented as b Temporal Classifier At the beginning of the GA, each configuration of the whole generation is returned with fitness scores, the changes between each parameter for each chromosome are examined. In general, the classifier collects information about difference in configuration fitness score of the current and previous generation to create a historical tally over generations, which is a numerical estimate of the setting s relative fitness [25]. For the later discussion about SVM classifier with linear kernel function or polynomial kernel function, the absolute value of the setting s relative fitness can be regarded as the setting s weight which is illustrated in Figure 1.12 and Figure However, there is a shortcoming for this method, since the overall fitness change between the previous 18

27 Figure 1.12: Classifier for bit vector Parameter and current generation does not just reflect the change of one specific parameter, but also to some other parameters with potentially substantial changes. If the child chromosome has a mutation to evolve a setting of B for a given parameter from its parent whose setting is A, the difference between the child s and the parent s fitness is added to the historical tally for B of the parameter, but subtracted from the historical tally for A. This can reflect the observed change for each setting of all parameters over generations. For example, if switching from setting A to B results in a negative fitness score change between the child and parent generation, then the historical tally for the setting A of the given parameter will be assigned a positive value, however the historical tally for the setting B of the given parameter will be assigned a negative value. This algorithm is illustrated in Figure 1.11 [25]. For each generation, all the settings historical tallies will be checked. When a 19

28 Figure 1.13: Classifier for integer range Parameter setting s historical tally drops below a specific threshold (-4800 in this research which is chosen based on experimentation), the classifier considers that setting insecure and that will be removed from the domain. Once the insecure setting found, the mutation operator is applied to remove it. A mutation operator can permanently restrict the domain of a configuration parameter. It is used to remove settings from consideration after multiple generations, after sensing that some specific settings after evolving always cause low fitness score. Since in this research there are three different type of parameters, it also has three different mutation operators for them. For parameters with option list parameter, that insecure setting option is simply removed from the list, so the mutation operator can never select it again. However, a SVM is needed to perform the parameter domain mutation for both bit vector and integer range parameters. Since these two types of 20

29 parameters feature settings have measurable distances between one another, it is not enough to just remove the insecure value since an entire insecure region around could possibly cause the insecurity. To do this, each setting with a positive historical tally is placed into the possibly secure group and each setting with negative historical tallies less than half the minimum (minimum relative fitness score for each parameter) is placed in the insecure group. The rest of the settings are discarded from the training data, although they will eventually be classified into one of the two groups by the SVM [25]. In this research, for example, for a bit vector of length 9, the setting is projected into 9-dimensional space, with each bit becoming the value of the associated number in the 9-tuple. The SVM for the bit vector parameter is then trained on the resulting data with a linear kernel. The process for classifying a bit vector parameter is illustrated in Algorithm of Figure 1.12 [25]. For integer range parameters, the settings are placed on the x-axis of a two dimensional plane and the SVM is trained on the data with a polynomial kernel of degree two. This allows the classifier to contain all of one group inside the parabola and all of the other outside it, splitting the axis into a possibly secure region and an insecure one or intersect the axis twice within the region containing the data, creating a secure region surrounded by two insecure regions or vice versa. This is illustrated in the algorithm of Figure 1.13 [25]. Whenever the bit vector or integer range parameter mutation operator is called, it classifies the new value with the SVM. If the new value is in the insecure region, a new setting is regenerated until a possibly secure one is chosen. All chromosomes in the current generation are similarly checked for parameter values to see if they are in the insecure region. 
In this classifier strategy, all of the new values become new training data used to regenerate the feature model for future generations.
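The tally bookkeeping described above can be sketched as follows; only the -4800 threshold and the credit/debit rule come from the text, while the function names and data layout are illustrative assumptions:

```python
# Illustrative sketch of the history-tally bookkeeping (Figure 1.11);
# names and structure are assumptions, not the thesis implementation.

TALLY_THRESHOLD = -4800  # threshold used in this research

def update_tally(tally, parameter, old_setting, new_setting, fitness_delta):
    """Add the child-minus-parent fitness change to the new setting's
    tally and subtract it from the old setting's tally."""
    t = tally.setdefault(parameter, {})
    t[new_setting] = t.get(new_setting, 0) + fitness_delta
    t[old_setting] = t.get(old_setting, 0) - fitness_delta

def insecure_settings(tally):
    """Settings whose cumulative tally has dropped below the threshold."""
    return {(p, s) for p, settings in tally.items()
            for s, v in settings.items() if v < TALLY_THRESHOLD}

tally = {}
update_tally(tally, "param7", "A", "B", -6000)  # switching A -> B hurt fitness
print(tally["param7"])            # {'B': -6000, 'A': 6000}
print(insecure_settings(tally))   # {('param7', 'B')}
```

As in the example in the text, a harmful switch from A to B credits A with a positive tally and debits B, and once B's tally falls below the threshold it is flagged for removal from the domain.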

Figure 1.14: Spatial Classifier: Multi-Tree

Spatial Classifier

The temporal classifier treats each parameter's settings as its data objects. For the spatial classifier, the data object is the complete set of settings in a configuration, and the classifier determines whether future chromosomes (configurations) are secure or insecure [25]. Because each configuration contains three different types of parameter settings, a decision tree is a better choice than an SVM in this case: it requires no data normalization, dummy variables do not need to be created, and blank values do not need to be removed. The classifier correlates security with some minimum number of attacks. For example, when machines are successfully attacked more than the minimum number of times, their configurations are labeled possibly insecure; otherwise they are labeled secure.
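A minimal sketch of that labeling rule follows; the threshold value here is an arbitrary illustration, since the text does not fix a specific number:

```python
# Sketch of the attack-count labeling rule described above.
# MIN_ATTACKS is an arbitrary illustrative value.

MIN_ATTACKS = 3

def label_configuration(attack_count):
    """Configurations attacked more than the minimum are possibly insecure."""
    return "insecure" if attack_count > MIN_ATTACKS else "secure"

counts = {"host1": 5, "host2": 1}
labels = {m: label_configuration(c) for m, c in counts.items()}
print(labels)  # {'host1': 'insecure', 'host2': 'secure'}
```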

To construct a decision tree, a configuration is converted into a tuple in a high dimensional space. Each integer range parameter setting remains a single integer in the tuple. A bit vector parameter of length n is split into n separate numbers, one per bit. Option list parameters are replaced by dummy variables. A classification and regression tree (decision tree) is then constructed from all configurations ever scored. In this research, we build a separate tree for each generation, so once a new configuration is generated by the GA, it is checked against each tree individually. The spatial classifier algorithm is illustrated in Figure 1.14. At the end of the GA's mutation process, each configuration is classified using the decision tree. If it is classified as insecure, mutation is applied again to a single random parameter and the chromosome (configuration) is reclassified. This repeats until the new chromosome is classified as possibly secure, which can take many iterations.
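The encoding and the mutate-and-reclassify loop can be sketched as follows; the (kind, value, domain) layout and the stand-in classifier are assumptions for illustration (a real run would consult the trained decision trees):

```python
# Sketch of the spatial-classifier flow: flatten a mixed-type configuration
# into a numeric tuple, then re-mutate until the classifier accepts it.
# The configuration layout and the stub classifier are assumptions.

def encode(config):
    """Integers stay as-is, bit vectors split into bits, option-list
    choices become dummy (one-hot) variables."""
    out = []
    for kind, value, domain in config:
        if kind == "integer":
            out.append(value)
        elif kind == "bitvector":
            out.extend(int(b) for b in value)
        elif kind == "option":
            out.extend(1 if value == opt else 0 for opt in domain)
    return tuple(out)

def mutate_until_secure(config, classify, mutate_one, max_iter=1000):
    """Re-mutate the configuration until it is classified as possibly
    secure (bounded so the sketch always terminates)."""
    for _ in range(max_iter):
        if classify(encode(config)):
            return config
        config = mutate_one(config)
    return config

config = [("integer", 0, None), ("option", "a", ["a", "b"])]
secure = lambda encoded: encoded[0] >= 3                   # stand-in tree
bump = lambda c: [("integer", c[0][1] + 1, None)] + c[1:]  # stand-in mutation
print(encode(mutate_until_secure(config, secure, bump)))   # (3, 1, 0)
```

The bound on the loop is a practical concession the sketch adds; the text notes that in principle the process can run for many iterations before a possibly secure configuration is found.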

Chapter 2: Implementation Details

This chapter contains details about the representation of generations, chromosomes, and traits. It also contains details about the attack simulation experiments and the implementation of the genetic algorithm.

2.1 XML File for Configurations

A configurations file in XML format is used by the GA in this research. The details are illustrated below and in Figure 2.1.

Configurations (Generation): identifies the generation number (the first generation in this example) and the number of parameters (102 in this research).

Configuration (Chromosome): identifies each configuration within a generation. It contains an id number from 0 to 139 within each generation in this research; the fitness of each configuration, indicated by truescore; and the number of attacks on each configuration and the fitness associated with those attacks, indicated by attack and score respectively, both used in the attack simulation experiments.

Parameter (Trait): identifies one of the three parameter types (bit vector, option list, integer range) and the value to be applied. It contains the id number of the parameter, the probability distribution (uniform for integer range parameters), the option choices for option list parameters, and the preferred value defined by the security baseline, indicated by preferred. Each parameter also carries its own fitness score, defined by score.
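For example, such a generation could be read with the standard library XML parser; the tag and attribute names below follow the description above, but the exact schema of the real file is an assumption:

```python
import xml.etree.ElementTree as ET

# Sketch of reading a generation from the configurations file. The tag and
# attribute names (configurations, configuration, truescore, attack, score,
# parameter, preferred) follow the description above; the exact schema of
# the real file is assumed.

SAMPLE = """
<configurations generation="1" parameters="102">
  <configuration id="0" truescore="540" attack="2" score="480">
    <parameter id="0" type="optionlist" preferred="on" score="100"/>
  </configuration>
</configurations>
"""

root = ET.fromstring(SAMPLE)
for conf in root.findall("configuration"):
    print(conf.get("id"), conf.get("truescore"), conf.get("score"))
# prints: 0 540 480
```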

In this research, a scoring rubric XML file is also used to store the CVSS vectors (the CVSS vector is described in Section 1.3) for all parameter settings; the scoring of parameters is based on known attacks that indicate a vulnerability for those parameters. Each configuration's fitness score (defined by truescore in the XML configurations example above) is based on the CVSS vector; it is assumed that attackers will design exploits for parameters with easy or low security more often than for parameters with difficult or high security. A fitness score can be assigned from the CVSS score to estimate the number of attacks on the parameter vulnerability. In this research, to assign a CVSS vector a fitness score, there are three categories: bad, neutral, and good. A specific score is assigned to each: 1 for a bad setting value, 10 for a neutral setting value, and 100 for a good setting value, so the best fitness score for a CVSS vector is 600 and the worst is 6.

2.2 Attack Simulation Details

In this research, we simulate single parameter attacks and multiple parameter attacks in several experiments. During an attack simulation, an attack profile is provided to the GA; it can be a single parameter attacked in one generation, or multiple parameters attacked simultaneously in one generation. During simulation, each iteration of the GA checks each parameter of every configuration in the current generation. If the parameter is in the attack profile and its setting's fitness is not perfect (600), the parameter's score, reflecting the fitness associated with the attack simulation, is assigned the same value as its own fitness score. If the parameter is not in the attack profile, the score is assigned the perfect value of 600. In this way, the truescore of each configuration represents the real fitness associated with the CVSS vectors, and the score of each configuration represents the fitness associated with the attacks, which indicates how many parameters were attacked while not having good configurations.

Figure 2.1: Sample of Configurations File

In the classifier strategy described before (a history tally for each parameter), each parameter setting's relative fitness change is correlated with the score change between the current and previous generation, and this change is accumulated over generations. At each generation, all of the parameter settings' history tallies can then be checked to classify the secure and insecure settings of each parameter.

2.3 Genetic Algorithmic Details

Evolutionary algorithm implementations use a combination of selection, crossover, and mutation to search the solution space. The EA implementation in this research is a genetic algorithm. Each chromosome population has a fixed size of n chromosomes (configurations), where each chromosome represents k configuration parameters. In this research the population has 140 chromosomes (n = 140) and each chromosome consists of 102 parameters (k = 102). This size is large enough to contain parameter complexities, such as chains, while remaining easily analyzable. Based on the size of the configuration, 140 chromosomes was found to be an effective pool size (the choice of 140 was a practical consideration related to the execution time of the algorithm; the number of chromosomes can certainly be increased at the cost of more time and space complexity). In the initial generation, the chromosomes' parameters are set randomly using a uniform distribution, except for integer range parameters, which are set randomly using an associated probability density function.

The selection operator identifies a member of the current chromosome pool based on proportional fitness. Tournament selection chooses two pairs of children from the current population of computer configurations. The uniform crossover operator combines the n pairs of children to form n offspring chromosomes, thereby combining portions of existing configurations to create new configurations. In our current research, 68% of the 102 parameter settings, chosen randomly using a uniform probability distribution, are exchanged between a pair of children, with crossover probability p_c = 0.05 per chromosome. This relatively low crossover rate is necessary to balance the fairly high number of uniform crossing points; experimentally, the rate of 0.05 has been found to yield effective results. Mutation explores new regions of the problem space by randomly changing parameters in the n offspring. The purpose of mutation is to maintain diversity across the generations of chromosomes and avoid permanent fixation at any particular locus. The probability of mutation is p_m = 1.0.
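Under the assumption that a chromosome is a plain list of settings, the selection and crossover operators described above can be sketched as follows; the k=2 tournament, the 0.68 swap probability, and p_c = 0.05 mirror the text, while everything else is an illustrative assumption rather than the thesis code:

```python
import random

# Sketch of the selection and crossover operators described above;
# chromosomes are plain lists of parameter settings.

def tournament_select(population, fitness, k=2):
    """Pick the fitter of k randomly chosen chromosomes."""
    return max(random.sample(population, k), key=fitness)

def uniform_crossover(a, b, p_swap=0.68, p_c=0.05):
    """With probability p_c per pair, exchange each parameter between the
    pair with probability p_swap (about 68% of settings on average)."""
    a, b = list(a), list(b)
    if random.random() < p_c:
        for i in range(len(a)):
            if random.random() < p_swap:
                a[i], b[i] = b[i], a[i]
    return a, b

random.seed(0)
pop = [[1], [5], [3]]
print(tournament_select(pop, fitness=lambda c: c[0]))
child_a, child_b = uniform_crossover([0, 0, 0, 0], [1, 1, 1, 1], p_c=1.0)
print(child_a, child_b)  # each position holds one 0 and one 1 across the pair
```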
In this research, every chromosome has two traits (parameters) mutated. When a parameter is mutated, a type-specific mutation is used. An integer range parameter is mutated by choosing a new value in accordance with its specified probability density function. An option list parameter randomly chooses a value from the set of allowable options. A bit vector parameter is randomly assigned a new value.
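The three type-specific operators can be sketched as follows; the uniform draw in the integer range case is a stand-in for the parameter's associated probability density function:

```python
import random

# Sketch of the three type-specific mutation operators described above.
# The uniform draw for the integer range case is a stand-in for the
# parameter's associated probability density function.

def mutate_setting(kind, domain):
    if kind == "integer":
        lo, hi = domain
        return random.randint(lo, hi)       # stand-in for the pdf draw
    if kind == "option":
        return random.choice(domain)        # any allowable option
    if kind == "bitvector":
        return [random.randint(0, 1) for _ in range(domain)]
    raise ValueError("unknown parameter type: %s" % kind)

print(mutate_setting("integer", (0, 10)))
print(mutate_setting("option", ["low", "medium", "high"]))
print(mutate_setting("bitvector", 9))
```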

Chapter 3: The Role of the Classifier in Learning Attacks

Genetic Algorithms (GAs), through selection, crossover, and mutation, generally do increase the fitness score of all configurations, and the algorithm does not have to focus on providing a secure configuration level for specific parameters. In reality, however, if we can determine which parameters are attacked, or which specific types of parameters need to be configured to a secure level, then classifiers can be applied to securely configure those specific parameters quickly. This research addresses three questions: first, how the GA in general influences the learning behavior for the attacked parameters; second, whether the GA with classifiers can more quickly produce good configurations for the specific attacked parameters; and third, what level of good configuration can be achieved using the GA with classifiers versus the GA alone. To investigate these questions, we conduct several experiments under four experimental scenarios: (1) a single parameter attack profile in a random order or a repeated order; (2) a single parameter attack profile in a repeatedly random order; (3) a single parameter attack profile in a contiguous repeatedly random order; and (4) a multiple parameter attack profile in a repeatedly random order. By specifying attacks in different modes, we can investigate the learning behavior for the specific parameters in a controlled manner. Table 3.1 lists all the experiments discussed in this chapter, together with their scoring methods and purposes.

Natural selection and evolution are the motivations for the algorithmic paradigm known as Genetic Algorithms (GA). A population of candidate problem solutions is modeled as chromosomes, and the GA works to evolve a new population of candidate problem solutions from the current one. Fundamental to the GA is a fitness function, which evaluates the quality of each candidate solution.


More information

OPTIMIZED RESOURCE IN SATELLITE NETWORK BASED ON GENETIC ALGORITHM. Received June 2011; revised December 2011

OPTIMIZED RESOURCE IN SATELLITE NETWORK BASED ON GENETIC ALGORITHM. Received June 2011; revised December 2011 International Journal of Innovative Computing, Information and Control ICIC International c 2012 ISSN 1349-4198 Volume 8, Number 12, December 2012 pp. 8249 8256 OPTIMIZED RESOURCE IN SATELLITE NETWORK

More information

(60-245) Modeling Mitosis and Meiosis. Mitosis

(60-245) Modeling Mitosis and Meiosis. Mitosis 638-2500 (60-245) Modeling Mitosis and Meiosis A Hands on Exploration of Cell Replication Parts List: Note: this kit provides enough materials for 2 groups of students to complete all activities 6 Pairs

More information

Mechanisms of Evolution

Mechanisms of Evolution Mechanisms of Evolution 36-149 The Tree of Life Christopher R. Genovese Department of Statistics 132H Baker Hall x8-7836 http://www.stat.cmu.edu/ ~ genovese/. Plan 1. Two More Generations 2. The Hardy-Weinberg

More information

Enumeration and symmetry of edit metric spaces. Jessie Katherine Campbell. A dissertation submitted to the graduate faculty

Enumeration and symmetry of edit metric spaces. Jessie Katherine Campbell. A dissertation submitted to the graduate faculty Enumeration and symmetry of edit metric spaces by Jessie Katherine Campbell A dissertation submitted to the graduate faculty in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY

More information

Interplanetary Trajectory Optimization using a Genetic Algorithm

Interplanetary Trajectory Optimization using a Genetic Algorithm Interplanetary Trajectory Optimization using a Genetic Algorithm Abby Weeks Aerospace Engineering Dept Pennsylvania State University State College, PA 16801 Abstract Minimizing the cost of a space mission

More information

The Perceptron Algorithm, Margins

The Perceptron Algorithm, Margins The Perceptron Algorithm, Margins MariaFlorina Balcan 08/29/2018 The Perceptron Algorithm Simple learning algorithm for supervised classification analyzed via geometric margins in the 50 s [Rosenblatt

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2015 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

Classifier Complexity and Support Vector Classifiers

Classifier Complexity and Support Vector Classifiers Classifier Complexity and Support Vector Classifiers Feature 2 6 4 2 0 2 4 6 8 RBF kernel 10 10 8 6 4 2 0 2 4 6 Feature 1 David M.J. Tax Pattern Recognition Laboratory Delft University of Technology D.M.J.Tax@tudelft.nl

More information

Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies

Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies Patrol: Revealing Zero-day Attack Paths through Network-wide System Object Dependencies Jun Dai, Xiaoyan Sun, and Peng Liu College of Information Sciences and Technology Pennsylvania State University,

More information

Moving Average Rules to Find. Confusion Matrix. CC283 Intelligent Problem Solving 05/11/2010. Edward Tsang (all rights reserved) 1

Moving Average Rules to Find. Confusion Matrix. CC283 Intelligent Problem Solving 05/11/2010. Edward Tsang (all rights reserved) 1 Machine Learning Overview Supervised Learning Training esting Te Unseen data Data Observed x 1 x 2... x n 1.6 7.1... 2.7 1.4 6.8... 3.1 2.1 5.4... 2.8... Machine Learning Patterns y = f(x) Target y Buy

More information

[Read Chapter 9] [Exercises 9.1, 9.2, 9.3, 9.4]

[Read Chapter 9] [Exercises 9.1, 9.2, 9.3, 9.4] 1 EVOLUTIONARY ALGORITHMS [Read Chapter 9] [Exercises 9.1, 9.2, 9.3, 9.4] Evolutionary computation Prototypical GA An example: GABIL Schema theorem Genetic Programming Individual learning and population

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 5: Vector Data: Support Vector Machine Instructor: Yizhou Sun yzsun@cs.ucla.edu October 18, 2017 Homework 1 Announcements Due end of the day of this Thursday (11:59pm)

More information

CS 188: Artificial Intelligence Fall 2008

CS 188: Artificial Intelligence Fall 2008 CS 188: Artificial Intelligence Fall 2008 Lecture 23: Perceptrons 11/20/2008 Dan Klein UC Berkeley 1 General Naïve Bayes A general naive Bayes model: C E 1 E 2 E n We only specify how each feature depends

More information

General Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Overfitting. Example: OCR. Example: Spam Filtering. Example: Spam Filtering

General Naïve Bayes. CS 188: Artificial Intelligence Fall Example: Overfitting. Example: OCR. Example: Spam Filtering. Example: Spam Filtering CS 188: Artificial Intelligence Fall 2008 General Naïve Bayes A general naive Bayes model: C Lecture 23: Perceptrons 11/20/2008 E 1 E 2 E n Dan Klein UC Berkeley We only specify how each feature depends

More information

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley

Natural Language Processing. Classification. Features. Some Definitions. Classification. Feature Vectors. Classification I. Dan Klein UC Berkeley Natural Language Processing Classification Classification I Dan Klein UC Berkeley Classification Automatically make a decision about inputs Example: document category Example: image of digit digit Example:

More information

Introduction to Machine Learning

Introduction to Machine Learning 1, DATA11002 Introduction to Machine Learning Lecturer: Teemu Roos TAs: Ville Hyvönen and Janne Leppä-aho Department of Computer Science University of Helsinki (based in part on material by Patrik Hoyer

More information

Computational Complexity and Genetic Algorithms

Computational Complexity and Genetic Algorithms Computational Complexity and Genetic Algorithms BART RYLANDER JAMES FOSTER School of Engineering Department of Computer Science University of Portland University of Idaho Portland, Or 97203 Moscow, Idaho

More information

CS6375: Machine Learning Gautam Kunapuli. Decision Trees

CS6375: Machine Learning Gautam Kunapuli. Decision Trees Gautam Kunapuli Example: Restaurant Recommendation Example: Develop a model to recommend restaurants to users depending on their past dining experiences. Here, the features are cost (x ) and the user s

More information

A Genetic Algorithm and an Exact Algorithm for Classifying the Items of a Questionnaire Into Different Competences

A Genetic Algorithm and an Exact Algorithm for Classifying the Items of a Questionnaire Into Different Competences A Genetic Algorithm and an Exact Algorithm for Classifying the Items of a Questionnaire Into Different Competences José Luis Galán-García 1, Salvador Merino 1 Javier Martínez 1, Miguel de Aguilera 2 1

More information

Artificial Neural Networks Examination, March 2004

Artificial Neural Networks Examination, March 2004 Artificial Neural Networks Examination, March 2004 Instructions There are SIXTY questions (worth up to 60 marks). The exam mark (maximum 60) will be added to the mark obtained in the laborations (maximum

More information

Scaling Up. So far, we have considered methods that systematically explore the full search space, possibly using principled pruning (A* etc.).

Scaling Up. So far, we have considered methods that systematically explore the full search space, possibly using principled pruning (A* etc.). Local Search Scaling Up So far, we have considered methods that systematically explore the full search space, possibly using principled pruning (A* etc.). The current best such algorithms (RBFS / SMA*)

More information

Discrete Tranformation of Output in Cellular Automata

Discrete Tranformation of Output in Cellular Automata Discrete Tranformation of Output in Cellular Automata Aleksander Lunøe Waage Master of Science in Computer Science Submission date: July 2012 Supervisor: Gunnar Tufte, IDI Norwegian University of Science

More information

Chapter 9: The Perceptron

Chapter 9: The Perceptron Chapter 9: The Perceptron 9.1 INTRODUCTION At this point in the book, we have completed all of the exercises that we are going to do with the James program. These exercises have shown that distributed

More information

Koza s Algorithm. Choose a set of possible functions and terminals for the program.

Koza s Algorithm. Choose a set of possible functions and terminals for the program. Step 1 Koza s Algorithm Choose a set of possible functions and terminals for the program. You don t know ahead of time which functions and terminals will be needed. User needs to make intelligent choices

More information

The Role of Crossover in Genetic Algorithms to Solve Optimization of a Function Problem Falih Hassan

The Role of Crossover in Genetic Algorithms to Solve Optimization of a Function Problem Falih Hassan The Role of Crossover in Genetic Algorithms to Solve Optimization of a Function Problem Falih Hassan ABSTRACT The genetic algorithm is an adaptive search method that has the ability for a smart search

More information

Jeff Howbert Introduction to Machine Learning Winter

Jeff Howbert Introduction to Machine Learning Winter Classification / Regression Support Vector Machines Jeff Howbert Introduction to Machine Learning Winter 2012 1 Topics SVM classifiers for linearly separable classes SVM classifiers for non-linearly separable

More information

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines

Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Lecture 9: Large Margin Classifiers. Linear Support Vector Machines Perceptrons Definition Perceptron learning rule Convergence Margin & max margin classifiers (Linear) support vector machines Formulation

More information

Online Learning, Mistake Bounds, Perceptron Algorithm

Online Learning, Mistake Bounds, Perceptron Algorithm Online Learning, Mistake Bounds, Perceptron Algorithm 1 Online Learning So far the focus of the course has been on batch learning, where algorithms are presented with a sample of training data, from which

More information

Incorporating detractors into SVM classification

Incorporating detractors into SVM classification Incorporating detractors into SVM classification AGH University of Science and Technology 1 2 3 4 5 (SVM) SVM - are a set of supervised learning methods used for classification and regression SVM maximal

More information

Multiclass Classification-1

Multiclass Classification-1 CS 446 Machine Learning Fall 2016 Oct 27, 2016 Multiclass Classification Professor: Dan Roth Scribe: C. Cheng Overview Binary to multiclass Multiclass SVM Constraint classification 1 Introduction Multiclass

More information

Genetic Algorithms and Genetic Programming Lecture 17

Genetic Algorithms and Genetic Programming Lecture 17 Genetic Algorithms and Genetic Programming Lecture 17 Gillian Hayes 28th November 2006 Selection Revisited 1 Selection and Selection Pressure The Killer Instinct Memetic Algorithms Selection and Schemas

More information

Test Generation for Designs with Multiple Clocks

Test Generation for Designs with Multiple Clocks 39.1 Test Generation for Designs with Multiple Clocks Xijiang Lin and Rob Thompson Mentor Graphics Corp. 8005 SW Boeckman Rd. Wilsonville, OR 97070 Abstract To improve the system performance, designs with

More information

Mathematics for Decision Making: An Introduction. Lecture 8

Mathematics for Decision Making: An Introduction. Lecture 8 Mathematics for Decision Making: An Introduction Lecture 8 Matthias Köppe UC Davis, Mathematics January 29, 2009 8 1 Shortest Paths and Feasible Potentials Feasible Potentials Suppose for all v V, there

More information

These are my slides and notes introducing the Red Queen Game to the National Association of Biology Teachers meeting in Denver in 2016.

These are my slides and notes introducing the Red Queen Game to the National Association of Biology Teachers meeting in Denver in 2016. These are my slides and notes introducing the Red Queen Game to the National Association of Biology Teachers meeting in Denver in 2016. Thank you SSE and the Huxley Award for sending me to NABT 2016! I

More information

Deep Algebra Projects: Algebra 1 / Algebra 2 Go with the Flow

Deep Algebra Projects: Algebra 1 / Algebra 2 Go with the Flow Deep Algebra Projects: Algebra 1 / Algebra 2 Go with the Flow Topics Solving systems of linear equations (numerically and algebraically) Dependent and independent systems of equations; free variables Mathematical

More information

Final Exam, Machine Learning, Spring 2009

Final Exam, Machine Learning, Spring 2009 Name: Andrew ID: Final Exam, 10701 Machine Learning, Spring 2009 - The exam is open-book, open-notes, no electronics other than calculators. - The maximum possible score on this exam is 100. You have 3

More information

Learning from Examples

Learning from Examples Learning from Examples Data fitting Decision trees Cross validation Computational learning theory Linear classifiers Neural networks Nonparametric methods: nearest neighbor Support vector machines Ensemble

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

1 Learning Linear Separators

1 Learning Linear Separators 10-601 Machine Learning Maria-Florina Balcan Spring 2015 Plan: Perceptron algorithm for learning linear separators. 1 Learning Linear Separators Here we can think of examples as being from {0, 1} n or

More information

Parallel Genetic Algorithms

Parallel Genetic Algorithms Parallel Genetic Algorithms for the Calibration of Financial Models Riccardo Gismondi June 13, 2008 High Performance Computing in Finance and Insurance Research Institute for Computational Methods Vienna

More information

Development. biologically-inspired computing. lecture 16. Informatics luis rocha x x x. Syntactic Operations. biologically Inspired computing

Development. biologically-inspired computing. lecture 16. Informatics luis rocha x x x. Syntactic Operations. biologically Inspired computing lecture 16 -inspired S S2 n p!!! 1 S Syntactic Operations al Code:N Development x x x 1 2 n p S Sections I485/H400 course outlook Assignments: 35% Students will complete 4/5 assignments based on algorithms

More information

Linear Discrimination Functions

Linear Discrimination Functions Laurea Magistrale in Informatica Nicola Fanizzi Dipartimento di Informatica Università degli Studi di Bari November 4, 2009 Outline Linear models Gradient descent Perceptron Minimum square error approach

More information

Using Evolutionary Techniques to Hunt for Snakes and Coils

Using Evolutionary Techniques to Hunt for Snakes and Coils Using Evolutionary Techniques to Hunt for Snakes and Coils Abstract The snake-in-the-box problem is a difficult problem in mathematics and computer science that deals with finding the longest-possible

More information

Applied Machine Learning Annalisa Marsico

Applied Machine Learning Annalisa Marsico Applied Machine Learning Annalisa Marsico OWL RNA Bionformatics group Max Planck Institute for Molecular Genetics Free University of Berlin 29 April, SoSe 2015 Support Vector Machines (SVMs) 1. One of

More information

The Story So Far... The central problem of this course: Smartness( X ) arg max X. Possibly with some constraints on X.

The Story So Far... The central problem of this course: Smartness( X ) arg max X. Possibly with some constraints on X. Heuristic Search The Story So Far... The central problem of this course: arg max X Smartness( X ) Possibly with some constraints on X. (Alternatively: arg min Stupidness(X ) ) X Properties of Smartness(X)

More information

Unsupervised Anomaly Detection for High Dimensional Data

Unsupervised Anomaly Detection for High Dimensional Data Unsupervised Anomaly Detection for High Dimensional Data Department of Mathematics, Rowan University. July 19th, 2013 International Workshop in Sequential Methodologies (IWSM-2013) Outline of Talk Motivation

More information

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January,

Stat 406: Algorithms for classification and prediction. Lecture 1: Introduction. Kevin Murphy. Mon 7 January, 1 Stat 406: Algorithms for classification and prediction Lecture 1: Introduction Kevin Murphy Mon 7 January, 2008 1 1 Slides last updated on January 7, 2008 Outline 2 Administrivia Some basic definitions.

More information

CSE 546 Final Exam, Autumn 2013

CSE 546 Final Exam, Autumn 2013 CSE 546 Final Exam, Autumn 0. Personal info: Name: Student ID: E-mail address:. There should be 5 numbered pages in this exam (including this cover sheet).. You can use any material you brought: any book,

More information