A New GP-evolved Formulation for the Relative Permittivity of Water and Steam

S. V. Fogelson and W. D. Potter
Artificial Intelligence Center, The University of Georgia, USA
Contact Email Address: sergeyf1@uga.edu

Abstract
The relative permittivity (or static dielectric constant) of water and steam has been measured experimentally over a relatively wide range of temperatures and pressures. A single function for predicting the relative permittivity of water and steam in three distinct thermodynamic regions is evolved using genetic programming. A data set comprising all of the most accurate relative permittivity values, together with the corresponding temperature, pressure, and density values, drawn from the entire experimentally measured range and found in [Fern95], is used for this task. The accuracy of this function is evaluated by comparing, against this data set, the relative permittivity values calculated with the evolved function and those calculated with the latest formulation of Fernandez et al., found in [Fern97]. In all regions, the newly evolved function outperforms the most current formulation in terms of the difference between calculated and experimentally obtained values of the dielectric constant.

Keywords: genetic programming, relative permittivity, thermodynamic properties

1. Introduction
The relative permittivity (or static dielectric constant) of water and steam, ε_r, has been measured experimentally over a relatively wide range of temperatures and pressures. The relative permittivity is an important indicator of the solvent behavior of water in a variety of biological settings (cell membrane electrophysiology, intracellular biochemical processes) and geophysical/industrial settings (high-temperature, high-pressure geochemical processes in deep-sea vents and in industrial processing plants) [Fern97]. Many prior researchers have attempted to create a single function that accurately predicts the relative permittivity of water and steam, the earliest being Quist and Marshall's 1965 formulation [Quis65].
However, these attempts have suffered from a lack of experimental values across the entire temperature and pressure range, and thus have only been able to approximate the relative permittivity of water with minimal uncertainty over a small range of temperatures and pressures. Recently, Fernandez et al. compiled all of the experimentally available data for the relative permittivity of water and steam into a single database [Fern95]. Furthermore, Fernandez et al. evaluated the methods used to derive the relative permittivity experimentally and selected the subset of the data that was most accurate and recommended for use in data correlation. In [Fern97], Fernandez et al. proposed a new formulation that used this subset to fit a statistical regression function, which approximated the relative permittivity fairly well across the entire experimentally available temperature and pressure range.

In an earlier paper [Foge07], we proposed two individual functions, evolved using genetic programming, that divided the entire data set recommended by Fernandez et al. for data correlation into two distinct thermodynamic regions, with each equation applied to the temperature and pressure range specific to its thermodynamic region. Although that formulation outperformed Fernandez et al.'s formulation across the entire range of data values in both thermodynamic regions, a formulation that uses a single equation to approximate the relative permittivity across the entire range of experimental values would seem both more natural and more appropriate, and is an important goal for researchers in this area. It was hoped that increasing the size of the evolving population of programs, together with increasing the maximum size of any individual program, would allow the discovery of just such an equation. In the current approach, such an equation has been evolved, and it closely approximates the relative permittivity of water across the entire range of experimentally verified temperature and pressure values. The accuracy of this function is evaluated by comparing its output for the relative permittivity of water at a given temperature and pressure, and the output of the latest formulation of Fernandez et al., against the subset of dielectric constant values that Fernandez et al. selected for data correlation.

2. The Static Dielectric Constant
The static dielectric constant (hereafter, relative permittivity) of a substance, ε_r, is roughly defined as the ability of a substance to transmit, or allow the existence of, an electric field. More formally, the relative permittivity of a substance is the ratio of the static permittivity of the substance, ε_s, to the permittivity of a vacuum, ε_0 [Fern95]. The relative permittivity of a substance is used for practical purposes in the design of capacitors. The relative permittivity of water depends on its physical state or phase (liquid or steam), its temperature, and its pressure. Experimentally verified relative permittivity values exist for water in its solid state (as ice) at temperatures as low as 190 K (−83 °C) [Mats96]; however, these data did not include the corresponding pressure values for any of the measurements and, as a result, could not be used.
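As a small numerical illustration of this definition, the Python sketch below computes ε_0 from the expression listed in Table 1 and forms the ratio ε_r = ε_s/ε_0. The function name `relative_permittivity` and the example value of ε_s are hypothetical; this is only a sketch of the definition, not part of the authors' method.

```python
import math

# Speed of light in vacuum (m/s); Table 1 gives eps_0 = [4e-7 * pi * c**2]**-1
C_LIGHT = 299_792_458.0
EPS0 = 1.0 / (4e-7 * math.pi * C_LIGHT**2)  # C^2 J^-1 m^-1 (about 8.854e-12)


def relative_permittivity(eps_s: float) -> float:
    """Return eps_r = eps_s / eps_0 for a static permittivity eps_s in C^2 J^-1 m^-1."""
    return eps_s / EPS0


# Hypothetical example: a medium whose static permittivity is 78.4 times eps_0
eps_s = 78.4 * EPS0
print(f"eps_0 = {EPS0:.6e}")                          # eps_0 = 8.854188e-12
print(f"eps_r = {relative_permittivity(eps_s):.1f}")  # eps_r = 78.4
```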
Water in its liquid or gas (steam) state can exist within a large range of temperatures and pressures, and this range has traditionally been divided into four regions: A, B, C, and D. Region A is the normal liquid water state between the normal freezing and boiling points (~273 K to ~373 K) at pressures up to 1000 MPa. Region B refers to water along the liquid-vapor phase boundary. Region C is the region with a temperature above 373.15 K. At lower pressures and temperatures within region C, water is in the normal vapor (steam) state. At higher pressures and temperatures in this region, water becomes a supercritical fluid; that is, water ceases to behave as if it were in either the liquid or the gas state, and instead exhibits a combination of the thermodynamic properties attributable to both liquids and gases. Finally, region D refers to supercooled water (water that exists in the liquid state below the normal freezing point of 273.15 K at the standard pressure of ~0.1 MPa).

The relative permittivity exhibits discontinuities along the liquid-vapor phase boundary (region B) and in the supercritical part of the region above the normal boiling point (region C), where very small changes in temperature and pressure cause very large changes in density and in the value of the relative permittivity [Harv06]. As a result, theoretical formulations for calculating the relative permittivity of water have mainly focused on a narrow range of temperatures (~270 K to ~315 K) and pressures (~0.1 MPa to 100 MPa) below the phase boundary [Fern95]. Furthermore, data points along the phase boundary (region B), although numerous, do not have their pressure values recorded, and thus have not figured in any data-driven correlations that correct for pressure differences. The most current formulation for approximating the relative permittivity across the entire range of experimental temperatures and pressures may be found in [Fern97] and is also reproduced in the results section. Fernandez et al.'s formulation uses an extensive adaptive regression algorithm to construct an appropriate function that takes a wide variety of domain-specific thermodynamic values into account (including first, second, and third derivatives of the temperature and pressure inputs with respect to each other). The final function uses 5 adjustable parameters and a total of 5 constants and domain-specific non-adjustable parameters, and approximates the data well across the entire range of experimentally available values.

3. Evolution and Genetic Programming
Genetic programming (GP) may be seen as an abstract algorithmic implementation broadly inspired by the main principles of Darwin's theory of evolution by means of natural selection. Roughly, Darwinian evolutionary theory involves populations of interbreeding organisms (species) competing for environmental resources over time. Members of a species share genetic material by interbreeding, and random mutations occur in members of the species that may either hinder or further their reproductive success. As the members of a given species breed with each other over time, characteristics beneficial to the species' survival propagate throughout the population, while characteristics that are detrimental to survival fail to be expressed in the population. That is, individuals with characteristics that favor their survival within the given environment tend to propagate, whereas individuals lacking those characteristics (or exhibiting detrimental ones) tend to die out. GP applies the broad tenets of Darwinian evolutionary theory within a heuristic framework that attempts to automatically generate programs that evolve to optimally solve user-defined problems [Koza92]. GP is an extension of the evolutionary computation approach known as genetic algorithms (GA), first pioneered by John Holland [Holl92]. Within the GP framework, a population of candidate solutions, each represented as an executable computer program of some finite length (an individual of the population), evolves in response to some problem to be solved (the environmental conditions) [Koza92].
Each GP individual (candidate program) within the population is given a fitness value, which is the output of a function (the fitness function) that measures the appropriateness or optimality of the program's output (the individual's behavior) on the user-defined problem (the environmental conditions). This allows each individual within the GP population to be compared against every other individual, whether or not it solves the problem optimally (responds well to the environment). Once all of the individuals in the population have been assigned a fitness value, certain individuals are chosen probabilistically, based on their fitness values, to recombine and create offspring, so that fitter individuals tend to be chosen for recombination more frequently. During recombination, two distinct individuals are chosen as the parents and may stochastically recombine to generate two offspring. Occasionally, however (because recombination is probabilistic and does not always occur), they do not recombine and are copied unchanged as offspring. After every recombination event, an offspring individual may be mutated with some small probability. The series of steps from initial population generation through parent selection, recombination, and mutation of offspring constitutes a generation of the GP run. At the start of every generation, the newly created individuals in the population are evaluated by the fitness function and assigned a fitness value. After the initial population has been generated, the GP run continues in this manner (only fitness assignment, parent selection, recombination, and mutation of offspring occur) until some stopping criterion is reached, such as the creation of an individual with a given minimum or maximum fitness value, or of one that adequately solves the problem at hand.

Each GP individual uses a tree-based representation scheme, in which the tree completely represents a given program.
Nodes of a GP program tree come either from the terminal set or from the function set (both predefined by the person implementing the GP search). The terminal set completely defines the kinds of inputs the program can use to solve the problem. Members of the terminal set can occur only as leaf nodes within the program tree (that is, nodes that have no children). The function set defines the transformations that may be applied, taking elements of the terminal set or other elements of the function set as arguments. Thus, members of the function set may occur only as internal nodes of a GP-generated program tree (nodes with at least one child). These restrictions require the union of the function and terminal sets of a GP implementation to possess the property of closure, where closure is the ability of any composition of functions and terminals to produce an executable computer program [Ghan03]. The program trees generated using GP do not have to be binary trees (trees in which every node has at most two children), since the experimenter may define an operator in the function set that takes more than two arguments.

Initially, GP individuals are randomly generated through a stochastic tree-building process in which each node in the tree is chosen to be a random member of either the function set or the terminal set. Traditionally, GP candidate programs are initially generated either strictly to some maximum initial tree depth (the "full" method: all nodes above the maximum initial depth are chosen exclusively from the function set, and all nodes at the maximum initial depth are chosen exclusively from the terminal set), or until every branch of the tree has either reached the maximum initial depth or ended in a terminal node before reaching it (the "grow" method).
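The tree representation and the two initialization strategies described above can be sketched in Python. The nested-tuple encoding, the names `random_tree`, `evaluate`, `depth`, and `size`, and the 0.5 probability of stopping early in the grow method are illustrative assumptions, not the authors' implementation. The function and terminal sets mirror the basic ones described in Section 4, including replacement of a zero denominator by 0.00001, and the mixed-depth initial population is loosely in the spirit of Koza's ramped half-and-half.

```python
import operator
import random
from typing import Union

# A program is a terminal (a variable name or a float constant) or a tuple
# (operator_symbol, left_subtree, right_subtree); all four operators are binary.
Tree = Union[str, float, tuple]


def protected_div(a: float, b: float) -> float:
    """Division that always returns a number, preserving closure.
    As in Section 4, a zero denominator is replaced by 0.00001."""
    return a / (b if b != 0 else 0.00001)


FUNCTIONS = {"+": operator.add, "-": operator.sub,
             "*": operator.mul, "/": protected_div}
VARIABLES = ["T", "P", "rho"]                   # temperature, pressure, density
rng = random.Random(42)                         # seeded so runs are repeatable
CONSTANTS = [rng.random() for _ in range(100)]  # ephemeral constants in [0, 1)


def random_terminal() -> Tree:
    # Assumption: variables and constants are picked with equal probability
    return rng.choice(VARIABLES) if rng.random() < 0.5 else rng.choice(CONSTANTS)


def random_tree(max_depth: int, method: str = "grow") -> Tree:
    """'full': operators at every depth above max_depth, terminals exactly at it.
    'grow': a branch may also end early in a terminal (here with probability 0.5)."""
    if max_depth == 0 or (method == "grow" and rng.random() < 0.5):
        return random_terminal()
    op = rng.choice(list(FUNCTIONS))
    return (op, random_tree(max_depth - 1, method), random_tree(max_depth - 1, method))


def evaluate(tree: Tree, env: dict) -> float:
    """Evaluate a tree at one data point, e.g. {'T': 298.15, 'P': 0.1, 'rho': 997.0}."""
    if isinstance(tree, tuple):
        op, left, right = tree
        return FUNCTIONS[op](evaluate(left, env), evaluate(right, env))
    return env[tree] if isinstance(tree, str) else tree


def depth(tree: Tree) -> int:
    return 1 + max(depth(tree[1]), depth(tree[2])) if isinstance(tree, tuple) else 0


def size(tree: Tree) -> int:
    """Number of functional units (operators plus terminals) in the tree."""
    return 1 + size(tree[1]) + size(tree[2]) if isinstance(tree, tuple) else 1


# A small initial population mixing both methods over a range of depths
population = [random_tree(rng.randint(2, 6), rng.choice(["full", "grow"]))
              for _ in range(10)]
```

A full tree of depth 3 built from binary operators always has 15 nodes (7 operators and 8 terminals), whereas a grow tree of the same depth limit can be much smaller.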
The genetic operators of crossover and mutation, as well as the way in which individuals are ranked according to their fitness, are modified from the GA approach (described in detail in [Holl92]) to suit the GP technique. Crossover occurs by selecting a node in each of two parent trees and then swapping the selected nodes, together with all of their descendants, between the two individuals. Mutation, on the other hand, involves selecting a node at which mutation will occur, deleting all of the nodes descending from the selected node, and then generating a random subtree with this node as its root. The fitness evaluation and ranking method in GP also differs from the classic GA approach (where fitness maximization is standard) in that the highest-ranking programs in GP have the lowest fitness values (in effect, a minimization problem). Thus, GP attempts to find the program with the globally minimal fitness value in the search space of all possible programs that can be created from the function and terminal sets used for the problem, up to the tree depth or program length specified in the GP setup.

Ultimately, the GP approach involves determining a set of functions and terminals to be used in solving the problem; defining a fitness measure by which individual programs are evaluated and assigned a fitness value; setting the specific parameters and operator probabilities involved in program tree generation (crossover and mutation probabilities, initial tree depth limit, maximum tree length, etc.); and developing a set of rules or stopping criteria that determine when to end a specific GP run (whether after a certain number of generations have elapsed, or after a program with a desired fitness threshold has been found).

4. Experimental Set-Up
In our approach, a variety of function and terminal sets were explored in an effort to evolve a single function that models the relative permittivity of water as a function of pressure, temperature, and density in thermodynamic regions A, C, and D of the temperature-pressure phase space. Recall that no empirical temperature and pressure data for region B (along the phase boundary) are currently available [Fern95]; thus, a function approximating the dielectric constant in region B was not evolved. The function for regions A, C, and D was evolved using data sets taken from [Fern95] and was then compared with relative permittivity values calculated, for the same input temperature/pressure/density values taken from the same data sets, using the newest formulation for dielectric constant prediction, found in [Fern97]. These data sets were compiled from all previously available experimental data, and were then corrected by Fernandez et al. to conform to the most recent internationally accepted temperature scale, ITS-90. In most cases, values were provided for the temperature (in kelvin, K), the pressure (in megapascals, MPa), and the corresponding dielectric constant. However, in some cases, temperature/density/dielectric constant values were given instead of temperature/pressure/dielectric constant values. In these circumstances, density values were converted into their corresponding pressures, and pressure values were converted into their corresponding densities, using the IAPWS-95 formulation for the equation of state of water found in [Wagn02]. With this completed, the final data set uniformly represented the dielectric constant at every temperature, pressure, and density value that was experimentally available (as of December 2006).

The function was evolved by generating a population of candidate functions (represented as trees), as in standard genetic programming implementations. Each candidate function's fitness was taken to be the sum of the absolute errors between the calculated and the experimentally measured values of the relative permittivity over every input value in the corresponding data set. The combination of input values used by each function (that is, which combination of the three possible adjustable inputs was used) was determined by the GP module. The population of candidate functions was then evolved with a variety of crossover/mutation probabilities and function sets.
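Subtree crossover, subtree mutation, and the sum-of-absolute-errors fitness described above can be sketched as follows. This is a hedged illustration using a hypothetical nested-tuple encoding (operator, left, right); the path-based helpers and the rule that an over-length offspring is replaced by its parent are assumptions, with the 100-unit cap taken from the limit given in Section 4. The fitness is minimized, as described in Section 3.

```python
import random
from typing import Callable, List, Sequence, Tuple, Union

Tree = Union[str, float, tuple]      # terminal, or (operator, left, right)
Path = Tuple[int, ...]               # child indices from the root (1 = left, 2 = right)
MAX_SIZE = 100                       # functional units allowed per program

OPS = {"+": lambda x, y: x + y, "-": lambda x, y: x - y,
       "*": lambda x, y: x * y,
       "/": lambda x, y: x / (y if y != 0 else 0.00001)}  # zero denominator -> 0.00001


def evaluate(tree: Tree, env: dict) -> float:
    if isinstance(tree, tuple):
        return OPS[tree[0]](evaluate(tree[1], env), evaluate(tree[2], env))
    return env[tree] if isinstance(tree, str) else tree


def node_paths(tree: Tree, path: Path = ()) -> List[Path]:
    """Positions of every node, in pre-order; len() of the result is the tree size."""
    paths = [path]
    if isinstance(tree, tuple):
        paths += node_paths(tree[1], path + (1,))
        paths += node_paths(tree[2], path + (2,))
    return paths


def get_subtree(tree: Tree, path: Path) -> Tree:
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree: Tree, path: Path, new: Tree) -> Tree:
    """Copy of `tree` with the node at `path` (and all its descendants) replaced by `new`."""
    if not path:
        return new
    parts = list(tree)
    parts[path[0]] = replace_subtree(tree[path[0]], path[1:], new)
    return tuple(parts)


def subtree_crossover(a: Tree, b: Tree, rng: random.Random) -> Tuple[Tree, Tree]:
    """Swap a random subtree of parent `a` with a random subtree of parent `b`."""
    pa, pb = rng.choice(node_paths(a)), rng.choice(node_paths(b))
    child_a = replace_subtree(a, pa, get_subtree(b, pb))
    child_b = replace_subtree(b, pb, get_subtree(a, pa))
    # Assumption: an offspring that exceeds the size cap is replaced by its parent
    if len(node_paths(child_a)) > MAX_SIZE:
        child_a = a
    if len(node_paths(child_b)) > MAX_SIZE:
        child_b = b
    return child_a, child_b


def subtree_mutation(tree: Tree, rng: random.Random,
                     new_subtree: Callable[[], Tree]) -> Tree:
    """Drop everything below a random node and grow a fresh subtree in its place."""
    return replace_subtree(tree, rng.choice(node_paths(tree)), new_subtree())


def fitness(tree: Tree, data: Sequence[Tuple[dict, float]]) -> float:
    """Sum of absolute errors against measured eps_r; lower is better."""
    return sum(abs(evaluate(tree, env) - measured) for env, measured in data)


rng = random.Random(0)
parent_a = ("+", "T", ("*", "P", 2.0))
parent_b = ("-", "rho", 0.5)
child_a, child_b = subtree_crossover(parent_a, parent_b, rng)
mutant = subtree_mutation(parent_a, rng, lambda: "rho")
```

Because trees are immutable tuples, both operators return new programs and leave the parents untouched, which keeps selection and recombination free of aliasing bugs.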
The data set of experimentally measured relative permittivity values used to create the function consisted of 644 data points, which represent the complete data set that Fernandez et al. recommend for data correlation [Fern95]. The function with the lowest sum of absolute errors across these data points found after all runs had been completed was chosen as the final formulation. In every GP run, the function and terminal sets always included addition, subtraction, multiplication, and division as function operators, and temperature (T), pressure (P), and density (ρ) as terminal values. Where a generated function divided a value by zero, the zero-generating term was replaced by 0.00001. All runs used a pool of 100 random floating-point constants in the range between 0 and 1, generated at runtime; these constants served as additional terminal values for the genetic program to use during function creation. Other function operators (sin, cos, ln, log_10, log_2, and x^y) and terminal operators (Avogadro's number, N_A; the permittivity of free space, ε_0; the elementary charge, e; Boltzmann's constant, k; the molar mass of water, M_w; the mean molecular polarizability of water, α; and the dipole moment of water, μ) were also used in certain GP runs. The values of these terminal operators are given in Table 1. The length of any individual solution (a tree representing a given candidate function) never exceeded 100 functional units, where a functional unit is a single operator from the function set or a single value from the terminal set. The large size of the function and terminal sets makes the search space (all of the possible unique programs of length 100 or less that can be generated from these sets) enormous, easily more than a googol (10^100). As a result, each GP run was performed on a population of one and a half million individuals that were evolved for 200 generations.
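The "more than a googol" estimate can be checked with a rough count. The sketch below counts syntactically distinct expression trees built only from the four binary operators and the 103 terminals of the smallest configuration (T, P, ρ, plus 100 constants), with at most 100 nodes. It is a lower bound for the richer configurations, and it overcounts distinct mathematical functions, since many trees compute the same expression. The helper names `catalan` and `count_programs` are hypothetical.

```python
from math import comb


def catalan(n: int) -> int:
    """Number of distinct binary-tree shapes with n internal (operator) nodes."""
    return comb(2 * n, n) // (n + 1)


def count_programs(n_ops: int, n_terminals: int, max_units: int) -> int:
    """Count syntactically distinct trees of binary operators with at most
    `max_units` nodes; a tree with n operators has n + 1 leaves (2n + 1 nodes)."""
    total = 0
    for n in range((max_units - 1) // 2 + 1):
        total += catalan(n) * n_ops**n * n_terminals**(n + 1)
    return total


# Smallest configuration in the text: +, -, *, / with T, P, rho and 100 constants
space = count_programs(n_ops=4, n_terminals=3 + 100, max_units=100)
print(space > 10**100)                    # True
print(f"about 10^{len(str(space)) - 1}")  # order of magnitude of the count
```

Even the largest single term (49 operators and 50 leaves) exceeds a googol on its own, because 103^50 alone is already above 10^100.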
The large population was used in the hope that the GP implementation would sample as much of the search space as possible, as uniformly as possible, while still finding a suitable function within a reasonable time. Crossover probabilities between 0.5 and 1.0 (in increments of 0.05) and mutation probabilities between 0 and 0.5 (in increments of 0.05) were explored for all combinations of function and terminal sets. Each unique combination of parameter settings was run 10 times, after which the function with the lowest total absolute error was chosen.

5. Results
The optimal function was found during a run that used multiplication, division, subtraction, and addition as the function set and temperature, pressure, and density as terminals (together with the 100 additional random ephemeral constants described earlier). This run used a crossover probability of 0.9 and a mutation probability of 0.05. The final evolved function, along with Fernandez et al.'s formulation, is listed below. The results of applying the GP-evolved function and Fernandez et al.'s formulation to the full data set are given in Table 3. Our evolved (non-simplified) function is significantly smaller than the formulation developed by Fernandez et al. (31 terms versus 11 terms), uses only three adjustable parameters (temperature, pressure, and density) and no non-adjustable domain-specific parameters, and uses only three of the one hundred random ephemeral constants that were available during function evolution. No domain-specific knowledge (aside from the data sets themselves) was applied in forming the function. As can be seen from Table 3, the evolved function outperformed Fernandez et al.'s formulation in every statistical category collected except the minimum absolute difference, where both functions had at least one data point with a very small absolute error (< 0.01).

6. Conclusions and Future Work
A new function that approximates the relative permittivity of water and steam over a variety of temperatures and pressures has been developed. This function was evolved using the GP technique with a specific function and terminal set, and its accuracy has been compared with that achieved by Fernandez et al.'s most recent formulation. The evolved function approximates the relative permittivity of water and steam extremely well over a wide range of temperature and pressure values, improving on Fernandez et al.'s formulation across the entire experimentally available temperature and pressure range while being computationally simpler.

Further refinements aimed at more accurate approximations of the relative permittivity of water and steam will include evolving a function that can be used across all thermodynamically distinct temperature and pressure regions, including regions where water is in the solid phase or where a phase boundary exists. This can be done once experimental values for the temperature, pressure, and relative permittivity in these regions are obtained. A refined fitness function that takes more into account than the absolute difference between expected and calculated values may also prove useful in creating a new, more accurate formulation. Introducing a penalty for very large, hard-to-read formulations may also help in finding a function that is both compact and generalizes well across the entire thermodynamic space. However, significant improvements in the evolution of an appropriate function will most surely come from an increase in experimentally verified values for the relative permittivity, so any new accurate data that become available should be used to refine the current formulation.

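GP-Evolved formulation:
ε r.0486 1.617 1.617.617 ).0364.016.016.076 55.55 55.474 ) )1.313 6.7586.1194.0864.0036 3

Fernandez formulation:

$$\varepsilon_r = \frac{1 + A + 5B + \sqrt{9 + 2A + 18B + A^2 + 10AB + 9B^2}}{4 - 4B}$$

where A and B are given by

$$A = \frac{N_A \mu^2 \rho g}{\varepsilon_0 k T}, \qquad B = \frac{N_A \alpha \rho}{3 \varepsilon_0},$$

with ρ the molar density (mol m^-3), and where g is given by

$$g = 1 + \sum_{h=1}^{11} N_h \left(\frac{\rho}{\rho_c}\right)^{i_h} \left(\frac{T_c}{T}\right)^{j_h} + N_{12} \left(\frac{\rho}{\rho_c}\right) \left(\frac{T}{228\,\mathrm{K}} - 1\right)^{-q}$$

with ρ_c = (322 kg m^-3)/M_w, T_c = 647.096 K, and values for N_h, i_h, j_h, and q given in Table 2.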
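For readers who want to reproduce the comparison, the Fernandez et al. formulation above can be evaluated directly from Tables 1 and 2. The Python sketch below is an illustration, not the reference implementation from [Fern97]: the function name `fernandez_eps_r` is hypothetical, it takes T in kelvin and the mass density ρ in kg m^-3 (converted internally to the molar density that A and B require), and it does not compute density from pressure, which would need an equation of state such as IAPWS-95 [Wagn02]. For liquid water at 298.15 K and about 997 kg m^-3 it returns approximately 78.4.

```python
import math

# Constants from Table 1 (SI units)
EPS0 = 1.0 / (4e-7 * math.pi * 299_792_458.0**2)  # C^2 J^-1 m^-1
K_B = 1.380658e-23        # J K^-1
N_A = 6.0221367e23        # mol^-1
M_W = 0.018015268         # kg mol^-1
ALPHA = 1.636e-40         # C^2 J^-1 m^2
MU = 6.138e-30            # C m
T_C = 647.096             # K
RHO_C_MASS = 322.0        # kg m^-3, so rho_c = 322 / M_w in mol m^-3

# Table 2: (N_h, i_h, j_h) for h = 1..11, plus N_12 with exponent q
TERMS = [
    (0.978224486826, 1, 0.25),
    (-0.957771379375, 1, 1.0),
    (0.237511794148, 1, 2.5),
    (0.714692244396, 2, 1.5),
    (-0.298217036956, 3, 1.5),
    (-0.108863472196, 3, 2.5),
    (0.949327488264e-1, 4, 2.0),
    (-0.980469816509e-2, 5, 2.0),
    (0.165167634970e-4, 6, 5.0),
    (0.937359795772e-4, 7, 0.5),
    (-0.123179218720e-9, 10, 10.0),
]
N12, Q = 0.196096504426e-2, 1.2


def fernandez_eps_r(T: float, rho_mass: float) -> float:
    """Static relative permittivity from T (K) and mass density (kg m^-3)."""
    delta = rho_mass / RHO_C_MASS          # rho / rho_c (same ratio in molar units)
    tau = T_C / T
    g = 1.0 + sum(n * delta**i * tau**j for n, i, j in TERMS)
    g += N12 * delta * (T / 228.0 - 1.0) ** (-Q)
    rho = rho_mass / M_W                   # molar density, mol m^-3
    A = N_A * MU**2 * rho * g / (EPS0 * K_B * T)
    B = N_A * ALPHA * rho / (3.0 * EPS0)
    root = math.sqrt(9 + 2*A + 18*B + A**2 + 10*A*B + 9*B**2)
    return (1 + A + 5*B + root) / (4 - 4*B)


# Liquid water near 298.15 K and 0.1 MPa (density supplied by the caller)
print(round(fernandez_eps_r(298.15, 997.047), 2))
```

As ρ approaches zero, A and B vanish and the expression tends to (1 + 3)/4 = 1, the vacuum value, which is a quick sanity check for low-density steam.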

7. Tables

Table 1. Constants used in the relative permittivity formulation
Parameter | Value
Permittivity of free space, ε_0 | [4 × 10^-7 π (299792458)^2]^-1 C^2 J^-1 m^-1
Elementary charge, e | 1.60217733 × 10^-19 C
Boltzmann's constant, k | 1.380658 × 10^-23 J K^-1
Avogadro's number, N_A | 6.0221367 × 10^23 mol^-1
Molar mass of water, M_w | 0.018015268 kg mol^-1
Mean molecular polarizability of water, α | 1.636 × 10^-40 C^2 J^-1 m^2
Dipole moment of water, μ | 6.138 × 10^-30 C m

Table 2. Coefficients N_h and exponents i_h, j_h, and q of the equation for g
h | N_h | i_h | j_h
1 | 0.978224486826 | 1 | 0.25
2 | −0.957771379375 | 1 | 1
3 | 0.237511794148 | 1 | 2.5
4 | 0.714692244396 | 2 | 1.5
5 | −0.298217036956 | 3 | 1.5
6 | −0.108863472196 | 3 | 2.5
7 | 0.949327488264 × 10^-1 | 4 | 2
8 | −0.980469816509 × 10^-2 | 5 | 2
9 | 0.165167634970 × 10^-4 | 6 | 5
10 | 0.937359795772 × 10^-4 | 7 | 0.5
11 | −0.123179218720 × 10^-9 | 10 | 10
12 | 0.196096504426 × 10^-2 | q = 1.2 |

Table 3. Results and numeric comparison
Statistic | Evolved GP result | Fernandez formulation
Sum of absolute differences | 103.1 | 149.73
Mean absolute difference | 0.16 | 0.23
Standard deviation of absolute differences | 0.25 | 2.15
Sum of squared differences | 55.8 | 3026.43
Mean squared difference | 0.09 | 4.68
Standard deviation of squared differences | 0.54 | 117.4
Minimum absolute difference | 0 | 0
Maximum absolute difference | 3.55 | 54.61
Data points where absolute difference (evolved) < absolute difference (Fernandez) | 331
Percentage of total data points better than Fernandez | 51.08%
Total data points | 644
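The statistics reported in Table 3 can be computed from paired predictions and measurements as sketched below. The paper does not state whether population or sample standard deviations were used; the sketch assumes population values (`statistics.pstdev`). The function names `error_stats` and `count_better` and the three example data points are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def error_stats(predicted: Sequence[float], measured: Sequence[float]) -> Dict[str, float]:
    """Table 3-style summary of absolute and squared differences.
    Assumption: population standard deviations (the paper does not specify)."""
    abs_d = [abs(p - m) for p, m in zip(predicted, measured)]
    sq_d = [(p - m) ** 2 for p, m in zip(predicted, measured)]
    return {
        "sum_abs": sum(abs_d),
        "mean_abs": statistics.fmean(abs_d),
        "std_abs": statistics.pstdev(abs_d),
        "sum_sq": sum(sq_d),
        "mean_sq": statistics.fmean(sq_d),
        "std_sq": statistics.pstdev(sq_d),
        "min_abs": min(abs_d),
        "max_abs": max(abs_d),
    }


def count_better(pred_a: Sequence[float], pred_b: Sequence[float],
                 measured: Sequence[float]) -> int:
    """Number of points where model A's absolute error is strictly below model B's."""
    return sum(abs(a - m) < abs(b - m) for a, b, m in zip(pred_a, pred_b, measured))


# Hypothetical three-point example (not data from the paper)
measured = [78.4, 55.0, 1.5]
gp = [78.5, 54.8, 1.5]
fernandez = [78.4, 55.3, 1.6]
stats = error_stats(gp, measured)
better = count_better(gp, fernandez, measured)
print(f"{better} of {len(measured)} points better ({100 * better / len(measured):.2f}%)")
# 2 of 3 points better (66.67%)
```

Because `count_better` uses a strict comparison, ties (for example, two exact fits) are not counted as wins for either model.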

8. References
[Fern95] Fernandez, D.P., Y. Mulev, A.R.H. Goodwin, and J.M.H. Levelt Sengers. 1995. A Database for the Static Dielectric Constant of Water and Steam. Journal of Physical and Chemical Reference Data 24(1): 33-69.
[Fern97] Fernandez, D.P., A.R.H. Goodwin, E.W. Lemmon, J.M.H. Levelt Sengers, and R.C. Williams. 1997. A Formulation for the Static Permittivity of Water and Steam at Temperatures from 238 K to 873 K at Pressures up to 1200 MPa, Including Derivatives and Debye-Hückel Coefficients. Journal of Physical and Chemical Reference Data 26(4): 1125-1166.
[Quis65] Quist, A.S., and W.L. Marshall. 1965. Estimation of the Dielectric Constant of Water to 800°. Journal of Physical Chemistry 69: 3165.
[Wagn02] Wagner, W., and A. Pruss. 2002. The IAPWS Formulation 1995 for the Thermodynamic Properties of Ordinary Water Substance for General and Scientific Use. Journal of Physical and Chemical Reference Data 31(2): 387-535.
[Foge07] Fogelson, S., and W. Potter. 2007. GP-Evolved Formulation for the Relative Permittivity of Water and Steam. To appear in The Proceedings of the International Conference of Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, IEA/AIE-07.
[Ghan03] Ghanea-Hercock, R. 2003. Applied Evolutionary Algorithms in Java. New York, NY: Springer-Verlag.
[Harv06] Harvey, Allan. NIST. Personal communication.
[Holl92] Holland, J. 1992. Adaptation in Natural and Artificial Systems, 2nd Edition. Cambridge, MA: MIT Press.
[Koza92] Koza, J.R. 1992. Genetic Programming. Cambridge, MA: MIT Press.
[Mats96] Matsuoka, T., S. Fujita, and S. Mae. 1996. Effect of temperature on dielectric properties of ice in the range 5-39 GHz. Journal of Applied Physics 80(10): 5884-5890.