An adsorbed gas estimation model for shale gas reservoirs via statistical learning

Yuntian Chen, a,1 Su Jiang, b,1 Dongxiao Zhang, a,* and Chaoyang Liu c

a ERE and BIC-ESAT, College of Engineering, Peking University, No. 5, Yiheyuan Road, Beijing, China
b Department of Energy Resources Engineering, Stanford University, Stanford, California 95305, U.S.A.
c Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, U.S.A.

* Corresponding author at: College of Engineering, Peking University, No. 5, Yiheyuan Road, Beijing, China. E-mail address: dxz@pku.edu.cn (D. Zhang)
1 Authors have made equal contributions to this work.

Keywords: shale gas; statistical learning; big data; adsorbed gas; estimation model; geological parameter.

Abstract: Shale gas plays an important role in reducing pollution and adjusting the structure of world energy. Gas content estimation is particularly significant in shale gas resource evaluation. Various estimation methods exist, such as first principle methods and empirical models. However, resource evaluation presents many challenges, especially the insufficient accuracy of existing models and the high cost resulting from time-consuming adsorption experiments. In this research, a low-cost and high-accuracy model based on geological parameters is constructed through statistical learning methods to estimate adsorbed shale gas content. The new model consists of two components, which are used to estimate Langmuir pressure (P_L) and Langmuir volume (V_L) based on their quantitative relationships with geological parameters. To increase the accuracy of the model, a big data set that consists of 301 data entries was compiled and utilized. Data outliers were detected by the K-Nearest Neighbor (K-NN) algorithm, and the model performance was evaluated by the leave-one-out algorithm. The proposed model was compared with four existing models. The results show that the novel model has better estimation accuracy than the previous ones. Furthermore, because none of the variables in the new model depends on time-consuming experimental methods, the new model has low cost and is highly efficient for approximate overall estimation of shale gas reservoirs. Finally, the proposed model was employed to estimate adsorbed gas content for nine shale gas reservoirs in China, Germany, and the U.S.A.

1. Introduction

Shale gas, or natural gas trapped within shale formations, is a type of relatively clean energy resource. It consists mainly of methane and burns cleaner than other kinds of hydrocarbon fuel. As an alternative fuel, shale gas is attracting increased attention globally [1-8]. Shale gas consists of free gas, adsorbed gas, and solution gas: free gas is the shale gas in the pore space within the shale rock; adsorbed gas is a significant quantity of gas adsorbed on the surface of organics and clays in the shale formation; and solution gas is the gas dissolved in the reservoir water or oil. The volume of solution gas is governed by pressure and temperature. As the pressure drops below the bubble point, the gas dissolved in the liquid begins to be released and becomes free gas. Fig. 1 illustrates these three kinds of shale gas.

In recent years, the production of shale gas has increased significantly. For instance, in 2000, shale gas accounted for only 1.6% of gas production in the U.S.A. [9], while this percentage rose to 47% in 2015 [10]. In China, total shale gas production rose from only 200 million m³ to 1.3 billion m³ in 2014 [11] and 4.47 billion m³ in 2015 [12]. Shale gas is thus expected to play an even more critical role in the world's energy supply in the near future.

Fig. 1. Illustration of free gas, adsorbed gas, and solution gas in shale formations.

There are many challenges involved in shale gas development and evaluation processes. One of the most important problems is to provide an accurate resource estimation. For instance, the U.S. Energy Information Administration (EIA) and the Chinese Ministry of Land and Resources have quite different evaluations of China's shale gas resources, which are 31.6 trillion m³ and 25.1 trillion m³, respectively [13, 14]. The uncertainty regarding resource estimation will not only affect gas well site selection on a micro level, but will also influence the national and industrial policy-making process on a macro level. Thus, it is crucial to find a way to accurately evaluate shale gas resources around the world.

This research emphasizes adsorbed gas because it accounts for 20% to 80% of the total gas [15-17]. Here, adsorbed gas is taken to include solution gas, which is only a small portion of the total gas content [14, 18]. Regarding free gas, many researchers have already proposed accurate estimation models [18-20].

In gas content estimation, first principle methods and empirical models are common options. The first principle methods are still not thoroughly developed, because of the complexity of storage mechanisms and the lack of understanding of the influence of numerous factors, such as thermal maturity (R_o) and reservoir temperature (T). Thus, it is challenging to build an accurate theoretical model. Regarding the empirical models, the Langmuir model (Eq. (1)) is the most commonly used [21, 22]. Its primary advantage is that, once Langmuir pressure (P_L) and Langmuir volume (V_L) are determined, the Langmuir adsorption isotherm is fully determined, which makes it easy to calculate adsorbed gas content under any reservoir pressure. However, adsorption experiments are probably the only effective method to obtain Langmuir parameters. These experiments are very time-consuming, as the adsorption process within the microscale and nanoscale shale pores is slow [23], and the coring process to obtain the experiment sample is expensive. These challenges cause difficulties in determining the corresponding Langmuir parameters and high uncertainties regarding adsorbed gas content evaluation. In addition, most of these experiments are complicated and are subject to experimental errors. For example, leakage always occurs in the coring process, which affects the accuracy of the adsorption experiments.
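As a minimal sketch of how the Langmuir model of Eq. (1) turns the two Langmuir parameters into an adsorbed gas content at a given reservoir pressure, the function below evaluates V = V_L P / (P + P_L). The function name, argument names, and the numerical values are hypothetical and serve only as an illustration.

```python
def langmuir_adsorbed_gas(pressure_mpa, v_l, p_l_mpa):
    """Adsorbed gas content from the Langmuir isotherm V = V_L * P / (P + P_L).

    The result has the same unit as v_l (for example m^3/t).
    """
    if pressure_mpa < 0 or p_l_mpa <= 0 or v_l < 0:
        raise ValueError("pressure and V_L must be non-negative, P_L positive")
    return v_l * pressure_mpa / (pressure_mpa + p_l_mpa)


# Hypothetical parameters, not taken from the paper's dataset.
example_content = langmuir_adsorbed_gas(pressure_mpa=10.0, v_l=3.0, p_l_mpa=5.0)
```

At P = P_L the function returns half of V_L, which matches the definition of the Langmuir pressure.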
Because both the first principle methods and empirical models have difficulties in gas content estimation, the salient question is: are there any alternatives that are reasonably accurate, but do not rely on any site-specific adsorption experiments?

$$V = \frac{V_L P}{P + P_L} \quad (1)$$

In this work, we propose to use widely available geological parameters to estimate resource volume via statistical learning. After years of development, statistical learning has become a powerful tool to build models [24-36], but it has not previously been applied to shale gas resource evaluation. Statistical learning is a commonly used method for Big Data Analytics, and it is effective in finding a predictive function based on a big data set. Statistical learning has many successful applications in various fields, such as computer vision, sales forecasting, and bioinformatics. It is utilized not only to uncover hidden patterns and unknown correlations between variables, but also to alleviate the influence of data uncertainty resulting from the operation process. For the problem that we are considering, geological parameters, such as total organic carbon (TOC), thermal maturity (R_o, vitrinite reflectance) and reservoir temperature (T), are much easier to obtain than Langmuir parameters. Thus, one promising strategy is to use these easily obtained geological parameters to estimate the Langmuir parameters with statistical learning methods.

Attempts have been made to discern the qualitative correlation between geological parameters and Langmuir parameters. The positive correlation between V_L and TOC has been shown in the literature [37-40], and a similar positive correlation between R_o and V_L has been found [23]. The qualitative influences of TOC, R_o, and T on Langmuir parameters were revealed by Zhao et al. [41]. Regarding P_L, Kong et al. identified the effect of T, R_o and porosity on the shale gas adsorption process, and discussed the determining physical mechanisms [42]. Hao et al. also studied the influence of T on P_L [43].

Although the qualitative effect of different factors has been widely studied, the quantitative relationships remain unclear. Attempts have been made to use linear regression to estimate V_L with TOC as the only independent variable [44-46]. Although TOC is essential in adsorption, previous studies have shown that TOC is not the only influencing factor [23, 41]. The datasets in these works are also small: there are only 7-10 data points from shale gas reservoirs in these studies. As for P_L, Xia et al. attempted to build a first principle model based on isosteric heat, standard entropy, and temperature [47]. Nevertheless, the variables used in this theoretical model are difficult to measure, which restricts model applicability. Later, Zhang et al. utilized T as an independent variable to estimate P_L [46]. T is a significant factor since adsorption is an exothermic process. However, it is not comprehensive to consider T as the only independent variable. In addition, this model's dataset only has six data points.

In summary, many studies in the last decade estimated Langmuir parameters by using geological parameters, but these quantitative models usually present three unresolved problems. The first one is inadequate sample size. Owing to the high cost of adsorption experiments, the datasets of previous studies are often too small to obtain a reliable and stable model. Some of these data are from coal bed methane and are not suitable for shale gas analysis [44, 46]. The second one concerns factor selection. Numerous previous studies consider factors non-comprehensively [44-46]. Specifically, most of these studies only focused on TOC. This is partially reasonable since TOC constitutes the main hydrocarbon-generation matter and gas carrier [48], but TOC alone cannot describe Langmuir parameters sufficiently, and other factors should be considered [23]. The last problem is the lack of general applicability. Most of the previous work only has data that cover a certain reservoir or even a single well, and the validation test is always overlooked [44-47]. In fact, among all of the three problems, inadequate sample size presents the biggest obstacle, as it partially causes the other two problems.

The main objective of this study is to solve the three problems above and build an adsorbed gas content estimation model based on geological parameters via statistical learning. The model-building process is shown in Fig. 2. To solve the inadequate sample size problem, 301 adsorption experiment data sets were compiled and analyzed from 19 different reservoirs in seven countries. Our dataset is much larger than those in previous studies, which usually have fewer than 10 data sets. Once the sample size is enlarged, statistical learning may become more effective in the model-building process. The variables are determined by correlation analysis to solve the factor selection problem. Data outliers are detected by the K-NN algorithm to broaden applicability [25-27, 35, 36]. In addition, model quality is assessed by the leave-one-out algorithm [28-33]. Based on the big training dataset and the statistical learning methods, this paper proposes an efficient model that does not rely on time-consuming experiments. Finally, the adsorbed gas contents from nine reservoirs in China, Germany, and the U.S.A. were estimated as case studies.

Fig. 2. Flow chart of the model-building process. The geological-parameter-based adsorbed gas estimation model is constructed by coupling the classical Langmuir model, the P_L submodel, and the V_L submodel. The P_L and V_L submodels are built via statistical learning methods, which are explained in the dashed box.

2. Data description and preprocessing

The expansion of data size is necessary for developing an accurate estimation model, since it constitutes the foundation of the statistical learning process. Most existing models have no more than 10 data entries, which is not sufficient to construct a credible model. To overcome the data size restriction, raw data points are collected from 24 different studies, experiment reports, and databases in this study [23, 37, 39, 40, 44-46, 48-64]. These data are available in the online supplementary material of this paper. They are gathered from marine and terrestrial shales from various reservoirs around the world. Table 1 shows the seven countries and 19 reservoirs used in this research. The variety of data sources makes the model suitable for general use.

Table 1 Data source information (detailed data are shown in the online supplement)

Country       Reservoir                   Literature                            Number of data points
U.S.A.        Barnett Shale               [39, 48, 57]                          32
              Haynesville Shale           [39]                                  2
              Marcellus Shale             [57-59]                               15
              Utica Shale                 [59]                                  6
              Woodford Shale              [46]                                  3
              Eagle Ford Shale            [39, 57]                              2
Canada        Montney Shale               [57]                                  2
              Besa River Formation        [59]                                  7
              Colorado Group Formation    [59, 63]                              13
              Exshaw Formation            [59, 64]                              3
              Muskwa Formation            [59, 62]                              3
              Duvernay Formation          [62]                                  4
China         Sichuan Basin               [23, 45, 48, 49, 51-53, 55, 56]       136
              Yangtze Platform            [40]                                  21
              Ordos Basin                 [50, 61]                              18
Germany       Posidonia Shale             [39, 60]                              31
Sweden        Alum Shale                  [39]                                  20
Netherlands   Carboniferous Shale         [39]                                  9
Brazil        Paraná Basin                [44]                                  2

The raw data should be cleaned prior to use for model building. The cleaning process is explained in the online supplementary material. In this work, data preprocessing includes two straightforward steps.

The first step is data cleaning, with the criterion that each data point should have both dependent variables and independent variables. According to this criterion, a P_L data point should have R_o, TOC and T data, while a V_L data point should have TOC and T data (Section 3.1 introduces the independent variable selection method). In this step, incomplete data are deleted, and replicate data are integrated.

The second step concerns the value range of geological parameters. It is reasonable to omit the out-of-range data. Previous work indicates that the TOC is always greater than 2% in promising shale gas reservoirs [13, 65]. To be conservative, 1% was taken as the floor level of TOC in this work. The R_o of promising reservoirs should be lower than 4%, otherwise the reservoir is over-matured [66-68]. Moreover, the experimental data set exhibits the following properties: (i) there is little data in which TOC is greater than 17% or T is greater than 90 °C; (ii) P_L is usually greater than 1.5 MPa and less than 12 MPa; and (iii) V_L is the maximum adsorbed gas volume, and it has to be greater than 1 m³/t for the sake of economic exploitation. Thus, an effective data point for P_L should have a temperature below 90 °C, R_o less than 4%, TOC between 1% and 17%, and P_L higher than 1.5 MPa and less than 12 MPa. After data processing, there are 101 data points remaining, as shown in the supplementary material online. Regarding V_L, the effective data points should have a temperature below 90 °C, TOC between 1% and 17%, and V_L higher than 1 m³/t. The data points outside of this range have to be deleted.
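The two screening rules above can be sketched as simple predicates. This is only an illustration: the function names, the use of `None` for a missing measurement, and the choice of inclusive bounds for TOC are assumptions, and the sample records are hypothetical rather than taken from the compiled database.

```python
def is_effective_pl_point(toc_pct, ro_pct, temp_c, p_l_mpa):
    """Screen a P_L data point; None marks a missing measurement."""
    if None in (toc_pct, ro_pct, temp_c, p_l_mpa):
        return False  # incomplete data points are deleted
    return (temp_c < 90.0 and ro_pct < 4.0
            and 1.0 <= toc_pct <= 17.0
            and 1.5 < p_l_mpa < 12.0)


def is_effective_vl_point(toc_pct, temp_c, v_l_m3_per_t):
    """Screen a V_L data point; None marks a missing measurement."""
    if None in (toc_pct, temp_c, v_l_m3_per_t):
        return False
    return temp_c < 90.0 and 1.0 <= toc_pct <= 17.0 and v_l_m3_per_t > 1.0


# Hypothetical records: (TOC %, R_o %, T in deg C, P_L in MPa).
samples = [
    (3.2, 2.1, 60.0, 4.0),    # inside every range, kept
    (0.5, 2.1, 60.0, 4.0),    # TOC below the 1% floor
    (3.2, None, 60.0, 4.0),   # R_o missing
    (3.2, 2.1, 60.0, 13.0),   # P_L above 12 MPa
]
kept = [s for s in samples if is_effective_pl_point(*s)]
```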
For V_L, 200 data points remain after this processing, as shown in the supplementary material online. Finally, the database of this research consists of 301 data points (101 P_L data and 200 V_L data), which is much larger than the databases of previous works, which have fewer than 10 data points [44-47].

3. Model Construction

Regression models are built to identify the quantitative relationships between Langmuir and geological parameters. The model-building method consists of the following three parts: variable selection, outlier detection, and model regression.

3.1 Variable selection

In the Langmuir equation, P_L and V_L indicate the adsorption capacity of the shale formation. V_L denotes the maximum adsorbed gas volume, and P_L represents the pressure at which the adsorbed gas volume is equal to half of V_L [21]. In this work, the Langmuir parameters are estimated from geological parameters, owing to the strong relationships between them. The geological variables are selected according to the data availability and their influences on P_L and V_L. Recent studies have shown that the main influencing factors of P_L are reservoir temperature (T), total organic carbon (TOC), and vitrinite reflectance (R_o, which reflects the thermal maturity). In addition, T, TOC, R_o, and porosity have effects on V_L. The qualitative relationships between Langmuir and geological parameters are shown in Table 2.

Table 2 Qualitative relationships between Langmuir and geological parameters based on literature values [23, 37-43]

        TOC        R_o        T          Porosity
V_L     Positive   Negative   Negative   Positive
P_L     Negative   Negative   Positive

Microscopically, the abundance of pores directly affects the adsorption capacity. Organic matter offers numerous pores, leading to large surface area and adsorption capacity [69]. This indicates that organic matter is not only hydrocarbon-generation matter, but also the carrier of adsorbed gas. So, P_L decreases and V_L increases with a higher TOC.

Thermal maturity reflects the degree of thermal evolution of shale formations. Thus, within certain limits, a higher R_o represents a higher hydrocarbon-generation potential [13, 66]. However, this does not guarantee low P_L and high V_L values for the following reasons. On the one hand, R_o is the vitrinite reflectance, which increases while organic carbon decomposes and becomes more mature [65]. A higher R_o implies less organic carbon, and therefore reduces adsorbed gas potential. On the other hand, the organic carbon maturing process includes dehydrogenation and deoxygenation [70, 71], which will form pores in the microscopic structure and increase adsorbed gas potential [72, 73]. So, the influence of R_o on adsorption ability should be determined by experimental results rather than speculation. In fact, adsorption experimental data indicate that R_o is negatively correlated with both P_L and V_L.

Temperature is also an important factor for P_L and V_L. According to Le Chatelier's principle, an increase of temperature inhibits the adsorption process, which is exothermic, and this leads to lower adsorption capacity. Consequently, an increase in temperature results in an increase of P_L and a decrease of V_L.

Regarding the variable selection of the P_L submodel, T, R_o, and TOC are chosen as the independent variables. For the sake of simplicity, the number of independent variables is reduced by creating an interaction term, which makes the model three-dimensional. From the analysis above, we know that R_o has a negative correlation with P_L, while there is a positive correlation between P_L and T. Thus, according to the qualitative relationships between these independent variables, the interaction term T/R_o can be considered as a new independent variable to fit the P_L submodel.

Regarding the variable selection of the V_L submodel, R_o and porosity are eliminated based on data size, the correlation coefficients between variables, and the cross-validation results. The data size is counted for each pair of variables. For instance, the data size of T vs. TOC means the number of data points which have effective V_L, T, and TOC simultaneously. Effective data here has the same definition as in Section 2. Correlation tests between different independent variables are then performed. The original dataset is given in the online supplementary Table C. The results are shown in Table 3.
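The pairwise data sizes and correlation coefficients described above can be computed along the following lines. This is a sketch under stated assumptions: missing measurements are encoded as NaN, the correlation is taken to be the Pearson coefficient, and the function name and toy arrays are hypothetical.

```python
from itertools import combinations

import numpy as np


def pairwise_size_and_corr(columns, target):
    """Return {(a, b): (data size, |Pearson r|)} for each pair of variables.

    columns: dict mapping a variable name to a 1-D float array (NaN = missing).
    target:  1-D float array of the dependent variable (NaN = missing).
    A data point counts for a pair only if both variables and the target exist.
    """
    results = {}
    for a, b in combinations(columns, 2):
        mask = ~np.isnan(columns[a]) & ~np.isnan(columns[b]) & ~np.isnan(target)
        size = int(mask.sum())
        if size >= 2:
            r = abs(np.corrcoef(columns[a][mask], columns[b][mask])[0, 1])
        else:
            r = float("nan")
        results[(a, b)] = (size, r)
    return results


# Toy data for illustration only.
toy = {
    "T": np.array([1.0, 2.0, 3.0, 4.0, np.nan]),
    "TOC": np.array([2.0, 4.0, 6.0, 8.0, 10.0]),
    "Ro": np.array([4.0, 3.0, np.nan, 1.0, 0.0]),
}
toy_vl = np.array([1.5, 2.0, 2.5, 3.0, 3.5])
table = pairwise_size_and_corr(toy, toy_vl)
```

Counting only pairwise-complete points mirrors how the data size shrinks once a sparsely measured variable such as R_o or porosity is required.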

Table 3 Data sizes and correlation coefficients (absolute value) between different independent variables (variable pairs: T vs. TOC, T vs. R_o, T vs. Porosity, TOC vs. R_o, TOC vs. Porosity, R_o vs. Porosity)

As shown in Table 3, there is a relationship between R_o and the other variables. The absolute correlation coefficient between T and R_o is as high as 0.64, and that of TOC and R_o is 0.45, which means that they are both moderately correlated [74]. In principle, R_o increases while organic carbon decomposes, indicating a negative correlation between R_o and TOC. In addition, reservoir temperature affects the organic carbon maturing process, which is also measured by R_o. This means that the effect of TOC and T in this model may replace that of R_o. Moreover, the data size of TOC vs. R_o is only 80. The V_L dataset only consists of 79 data points if the model takes all of TOC, T, and R_o as variables. However, the data size increases 2.5-fold to 200 when R_o is removed. In conclusion, on the one hand, R_o is statistically and theoretically related to T and TOC. On the other hand, the availability of R_o restricts the data size. So, it is reasonable to use T and TOC to substitute for the effect of R_o, which simplifies the model. According to the cross-validation results (Section 4.1 presents the process in detail), the elimination of R_o actually increased the model accuracy. The average relative error decreased slightly from 25.75% for the model with R_o to 23.76% without R_o. Furthermore, the half-width of the 90% confidence interval of relative errors dropped from 5.04% with R_o to 2.33% without R_o, which indicates that the estimation accuracy of the model with R_o has a higher fluctuation. The elimination of R_o not only reduces the model dimensionality, but also expands the data size, which results in a more accurate estimation model.

Similarly, porosity is eliminated in the process. On the one hand, the size of the V_L dataset can be expanded from 66 to 200 by leaving it out. On the other hand, TOC and porosity are moderately correlated, with a correlation coefficient of . This positive correlation has been observed in former studies [75]. Wang et al. determined that the porosity of organic matter in shale is higher than that of the mineral matrix [76], which theoretically explains the moderate correlation. Only one variable between TOC and porosity needs to be considered in the model. Since there are 200 data points that have T and TOC, and the correlation coefficient is only 0.26, porosity was substituted by TOC and T in the model. Furthermore, the cross-validation results show that the average relative error of the model with porosity (27.98%) is higher than that of the model without porosity (23.76%). In addition, the half-width of the 90% confidence interval of the model with porosity is 5.78%, which is more than twice that of the model without porosity (2.33%). Thus, the V_L submodel takes T and TOC as the independent variables.

3.2 Outlier detection

Outlier detection is essential because of the discrepancy of data sources and the uncertainty of geological parameters. Since geological variables have continuity, and the relationships between Langmuir parameters and geological variables are monotonic, the K-Nearest Neighbors algorithm (K-NN) is suitable for outlier detection [25-27, 35, 36]. The main idea of the K-NN algorithm is to classify the test data according to the k nearest training samples [77]. If the property value of a test datum does not match that of the training samples, the test datum is considered to be an outlier and should be rejected. In this work, we define the weighted relative error (R_i) as the criterion of outlier detection, which means the relative difference between the dependent variable value of the i-th test datum and the average value of the k nearby training samples. In this research, k is set to 5 and the weighted relative error is defined with the 1-norm, as shown in Eq. (2). The detailed formula derivation process is shown in Appendix A. A test datum is regarded as an outlier if R_i is greater than the given threshold. The denominator of R_i is the smaller of the test datum's dependent value and the average value of the training samples, while the numerator is the weighted sum of the differences between each training sample and the test datum. There is an inverse relationship between the weight and the distance to the neighbor, which means that the statistically nearer neighbors matter more than the statistically distant ones. The sum of the weights is equal to 1 for every test datum.

$$R_i = \frac{\sum_{j=1}^{k} wr_{i,j}\,\left|V_{L,i} - V_{L,i,j}\right|}{\min\left(\frac{1}{k}\sum_{j=1}^{k} V_{L,i,j},\; V_{L,i}\right)} \quad (2)$$

where V_{L,i} is the dependent variable value of the i-th test datum; V_{L,i,j} represents the dependent variable value of the j-th neighboring training datum of the i-th test datum; and wr_{i,j} is the weight of the dependent value difference between the i-th test datum and its j-th neighboring training datum. The weight wr_{i,j} is important; it is calculated based on the distance between the test datum and the training datum.
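A minimal sketch of the weighted relative error of Eq. (2) follows. The exact weight and distance formulas are given in Appendix A and are not reproduced here, so this sketch makes two labeled assumptions: features are min-max normalized before taking the Euclidean distance, and the weights are inverse distances rescaled to sum to 1. The function name and the toy data are hypothetical.

```python
import numpy as np


def weighted_relative_errors(features, values, k=5, eps=1e-12):
    """Weighted relative error R_i for every data point (1-norm form).

    ASSUMPTIONS (not from the paper): min-max feature scaling and
    inverse-distance weights normalized to sum to 1.
    """
    features = np.asarray(features, dtype=float)
    values = np.asarray(values, dtype=float)
    span = features.max(axis=0) - features.min(axis=0)
    span[span == 0] = 1.0  # constant columns carry no distance information
    scaled = (features - features.min(axis=0)) / span

    errors = np.empty(len(values))
    for i in range(len(values)):
        dist = np.linalg.norm(scaled - scaled[i], axis=1)
        dist[i] = np.inf  # the test point is not its own neighbor
        nearest = np.argsort(dist)[:k]
        inverse = 1.0 / (dist[nearest] + eps)
        weights = inverse / inverse.sum()
        numerator = np.sum(weights * np.abs(values[i] - values[nearest]))
        denominator = min(values[i], values[nearest].mean())
        errors[i] = numerator / denominator
    return errors


# Toy data: eight points on a line, with one inconsistent value at index 3.
toy_features = np.column_stack([np.arange(8.0), np.ones(8)])
toy_values = np.array([2.0, 2.0, 2.0, 10.0, 2.0, 2.0, 2.0, 2.0])
toy_errors = weighted_relative_errors(toy_features, toy_values, k=5)
```

Note that in such a small toy set the neighbors of an outlier also receive inflated R values, because the outlier is among their own nearest neighbors.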
Although the detailed calculation process is given in Appendix A, it is essential to introduce the distance, since it is fundamental for determining the k nearest training samples and calculating the weights. First of all, this distance should not be the spatial distance, because the relative spatial location cannot precisely describe the similarity in adsorption ability between two different samples in the shale formation. Instead, a statistical distance based on geological parameters is applied to measure the distance between different samples, in the form of a Euclidean distance. Secondly, only the relative values of the variables in their distribution matter, and the absolute values are insignificant in the calculation of the statistical distance. Thus, the Euclidean distance should be adjusted, and it is necessary to normalize these variables. As a result, the weighted Euclidean distance, which is a statistical distance, is computed to determine the k nearest neighbors and assign different weights to different samples.

There are 301 data points in the original dataset before the outlier detection, consisting of 101 P_L data points and 200 V_L data points. According to Eq. (2), the weighted relative error (R) of each data point is calculated based on the K-NN algorithm. The results are shown in the online supplementary material Table B. If the R value of a data point is larger than 0.85, this data point is regarded as an outlier. Regarding the P_L submodel, the weighted Euclidean distance is computed from TOC, T, and R_o. After outlier detection, 10 data points are deleted, comprising 9.9% of the original data. Finally, 91 data points remain to fit the P_L submodel. As for V_L, the weighted Euclidean distance is calculated based on T and TOC. According to the R values, 16 data points are deleted, making up 8% of the original data. Finally, 184 data points remain to develop the V_L submodel.

3.3 Model regression

The regression submodels were developed based on the processed data. The model construction process includes the following steps. First, according to previous qualitative studies [23, 37-43], the selected variables are combined to construct functions that have clear physical meaning and succinct form. Second, mathematical adjustments are applied to the variables to better describe their trends. Then, the P_L and V_L submodels are derived from the processed data, and the coefficients are identified by the ordinary least squares (OLS) method based on a normal equation [78]. Finally, the multivariate regression model between Langmuir and geological parameters is built based on the P_L and V_L submodels. This regression model can be used for adsorbed gas estimation.

Based on the cleaned data and qualitative relations, the simplified P_L and V_L submodels were constructed. A decay characteristic of the Langmuir parameters could be observed in the data scatter plot. Thus, a logarithmic treatment is applied to the dependent variables. The adjusted P_L and V_L submodels are shown in Eq. (3) and (4), respectively. TOC* represents the dimensionless total organic carbon and is equal to TOC/4%. T* stands for T/48 and is the dimensionless adsorption temperature. R_o* means the dimensionless thermal maturity, which is equal to R_o/1.75%. In this model, 4%, 48 °C, and 1.75% are the average values of TOC, T, and R_o, respectively.

$$\ln P_L = a_p\,TOC^{*} + b_p \ln\left(\frac{T^{*}}{R_o^{*}}\right) + c_p \quad (3)$$

$$\ln V_L = a_v\,TOC^{*} + b_v\,T^{*3} + c_v \quad (4)$$

Using 91 data points as fitting data, the coefficients in the P_L submodel were determined, which are a_p = , b_p = 0.715, and c_p = . All of the variables are monotonically related to P_L in the model. There is a positive correlation between P_L and T, a negative correlation between P_L and TOC, and a negative correlation between P_L and R_o. These correlations match the qualitative analyses from previous work [23, 37-43]. Fig. 3a shows the distribution of P_L data points and its comparison with the regression model. The blue points represent the data for regression, and the red stars are outliers detected by the R value and the K-NN algorithm.

Regarding the V_L submodel, the coefficients obtained from fitting the regression model are a_v = 0.421, b_v = -0.067, and c_v = 0.563. Fig. 3b shows the distribution of V_L data points and the outliers. It compares all of the V_L data points with the regression model. The regression model fits the experimental data well by visual comparison. As Fig. 3b shows, V_L increases with decreasing T and increasing TOC. These results match the qualitative relationships mentioned in previous work [23, 37-43].

Fig. 3. Comparison of fitting data and outliers with the regression model: (a) P_L data set and P_L model; and (b) V_L data set and V_L model. The blue points describe the distribution of P_L and V_L data. The red stars represent the outliers in the model. The color grids show the regression model.

4. Model Assessment

4.1. Model validation

The visual comparison in Fig. 3 is intuitive, but not precise. In order to obtain a reliable and stable model, cross-validation is necessary for the P_L and V_L submodels. The leave-one-out method [28-32] is used for cross-validation in this research. This method involves using one observation as the validation sample and taking the remaining data as the training set. The training set is utilized to fit the validation model, and the relative error is calculated on the validation sample.

When implementing the cross-validation, a vectorizing treatment can simplify the calculation. To vectorize the problem, three matrices are defined. Matrix X includes the original data points, of which each row describes an experimental data point, and each column corresponds to an independent variable. Matrix W is the coefficient matrix of the regression model. Matrix Y includes the values of the dependent variable of the experimental data points. The P_L submodel is taken as an example to introduce the vectorizing process. The entry x_{i,n} (for i = 1, 2, 3, ..., m) is assumed to be 1. Thus, w_n represents the constant term in the model. The relationship of these matrices is shown in Eq. (5).

$$\begin{bmatrix} x_{1,1} & \cdots & x_{1,n} \\ \vdots & \ddots & \vdots \\ x_{m,1} & \cdots & x_{m,n} \end{bmatrix} \begin{bmatrix} w_1 \\ \vdots \\ w_n \end{bmatrix} = \begin{bmatrix} P_{L,1} \\ \vdots \\ P_{L,m} \end{bmatrix} \quad (5)$$

The relative error between the validation sample and the training set is the indicator of the model quality. In the cross-validation process, every data point should be treated as a validation sample once. The i-th data point is taken as an example to illustrate the leave-one-out process. The function to calculate the i-th relative error is defined in Eq. (6), where P_{L,i} is the experimental dependent variable of the i-th data point, and \hat{P}_{L,i} is the estimation of the dependent variable of the i-th data point. This relative error is different from the weighted relative error (R) in Eq. (2). The weighted relative error (R) is based on the K-NN algorithm and is only used in the outlier detection process. Eq. (7) presents the method to calculate \hat{P}_{L,i}, where X is constructed with all of the data except the i-th data point. The detailed formula derivation process is shown in Appendix B. The above process is repeated m times until all of the m samples have been considered as validation data once. Finally, the relative errors of all samples are averaged to represent the accuracy of the P_L and V_L submodels. The confidence interval is calculated, as well.
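The leave-one-out procedure described above can be sketched with NumPy as follows. This is an illustration under stated assumptions rather than the authors' implementation: the design matrix follows the form of the P_L submodel in Eq. (3) with a final column of ones for the constant term, the relative error is taken as an absolute value computed directly on the dependent variable passed in, and all function names, coefficients, and data are hypothetical.

```python
import numpy as np


def pl_design_matrix(toc_pct, temp_c, ro_pct):
    """Columns: TOC*, ln(T*/Ro*), and 1 for the constant term."""
    toc_star = np.asarray(toc_pct, dtype=float) / 4.0
    t_star = np.asarray(temp_c, dtype=float) / 48.0
    ro_star = np.asarray(ro_pct, dtype=float) / 1.75
    return np.column_stack(
        [toc_star, np.log(t_star / ro_star), np.ones_like(toc_star)]
    )


def fit_ols(X, y):
    """Solve the normal equation (X^T X) w = X^T y without forming an inverse."""
    return np.linalg.solve(X.T @ X, X.T @ y)


def leave_one_out_errors(X, y):
    """Relative error (%) of each point when it is held out of the fit."""
    m = len(y)
    errors = np.empty(m)
    for i in range(m):
        keep = np.arange(m) != i          # training set: all rows except i
        w = fit_ols(X[keep], y[keep])
        estimate = X[i] @ w               # prediction for the held-out row
        errors[i] = abs(estimate - y[i]) / abs(y[i]) * 100.0
    return errors


# Synthetic, noise-free data with made-up coefficients, for illustration only.
rng = np.random.default_rng(0)
X_demo = pl_design_matrix(
    rng.uniform(1.0, 17.0, 20), rng.uniform(30.0, 90.0, 20), rng.uniform(0.5, 4.0, 20)
)
y_demo = X_demo @ np.array([-0.1, 0.7, 1.5])
loo_demo = leave_one_out_errors(X_demo, y_demo)
```

Using `np.linalg.solve` rather than an explicit matrix inverse is a numerical convenience; it solves the same normal equation.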

$$\mathrm{Error}_i = \frac{\left|\hat{P}_{L,i} - P_{L,i}\right|}{P_{L,i}} \times 100\% \quad (6)$$

[Eq. (7): leave-one-out estimate of P_{L,i}, a least-squares expression in $(X^{T}X)^{-1}X^{T}$, with X built from the rows (P_{L,j}, x_{j,1}, ..., x_{j,n}), j = 1, ..., m, excluding j = i; the derivation is given in Appendix B.]

Regarding the P_L submodel, all of the 91 data points are separately used as the validation data in the leave-one-out process. Finally, the average relative error with the 90% confidence interval is ± 3.54%. Considering the great variety of data sources, shale locations, and shale types, this error is acceptable. The cross-validation result attests to the accuracy of the model. Furthermore, although the training data are different in every validation test, all of the fitted models are similar to the final regression model, which means that this model is robust and not sensitive to the training data. Based on the cross-validation results, Fig. 4 shows the distribution of the experimental values, the estimated values, and the relative error between the experimental and estimated data. Fig. 4a presents the scatter of the experimental values versus their corresponding estimated values. The closer the points lie to the 45° diagonal, the better the estimation. Fig. 4b is a normal Q-Q plot of the 91 average relative errors. This figure compares the probability distribution of the average relative errors with the normal distribution by plotting their quantiles against each other. The reference line represents the ideal condition in which the relative errors obey the normal distribution exactly. It is clear from Fig. 4 that the estimated values are similar to the experimental ones, and the relative errors are approximately normally distributed.

Regarding the V_L submodel, all of the 184 data points are used as the validation data. The relative errors are calculated in the same way as for the P_L submodel. Finally, the average relative error is 23.76%, with its 90% confidence interval being (21.43%, 26.09%). The confidence intervals are calculated based on a fixed-sample-size procedure [79]. Fig. 5a shows the scatter of the experimental values versus the estimated values, and Fig. 5b presents the Q-Q plot of the relative errors versus the normal distribution.

As shown in Fig. 4b and Fig. 5b, there is a deviation between the expected normal values and the average relative errors when the expected normal value increases beyond 40%. This deviation has two causes. Firstly, the acceptance region of a Q-Q plot is not parallel to the reference line. The acceptance region becomes wider when the quantile gets closer to 0 or 1 (corresponding to the bottom left and upper right of the Q-Q plot) [80, 81]. Secondly, the model has fewer data points with large positive errors than the normal distribution would predict, which results in the expected values lying above the reference line on the upper right. This actually means that the relative error in the model tends to be smaller than that in the normal distribution when the error is above 40%. In addition, the 40% relative error corresponds to the 91% quantile in the standard normal distribution. Thus, the deviation in the Q-Q plot only affects a small proportion of the data, and it reflects fewer large positive errors in the model than in the normal distribution.
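The way a normal Q-Q plot such as Fig. 4b or Fig. 5b pairs observed errors with expected normal values can be sketched as follows. This is an illustrative sketch only: the plotting position (k + 0.5)/n is one common convention and is an assumption here, the function name is hypothetical, and the sample values are synthetic.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Return (expected normal value, observed value) pairs for a normal Q-Q plot."""
    ordered = sorted(sample)
    n = len(ordered)
    # Normal distribution with the same mean and standard deviation as the sample.
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting position (k + 0.5) / n for the k-th smallest value, k = 0 .. n-1.
    return [(fitted.inv_cdf((k + 0.5) / n), obs) for k, obs in enumerate(ordered)]


# Synthetic relative errors in %, for illustration only.
relative_errors = [10.0, 30.0, 20.0, 50.0, 40.0]
for expected, observed in qq_points(relative_errors):
    print(f"expected {expected:6.2f}   observed {observed:6.2f}")
```

Points on the line observed = expected indicate agreement with the normal distribution. Observed values that fall short of the expected values in the upper tail correspond to fewer large positive errors than a normal distribution would produce.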

Fig. 4. Results of the P_L submodel's cross-validation: (a) the scatter of the experimental values versus their corresponding estimated values; and (b) the Q-Q plot of the distribution of the relative errors versus the normal distribution.

Fig. 5. Results of the V_L submodel's cross-validation: (a) the scatter of the experimental values versus their corresponding estimated values; and (b) the Q-Q plot of the distribution of the relative errors versus the normal distribution.

Model comparison

Previous work has already proposed several P_L and V_L submodels based on geological parameters, as listed in Table 4 [23, 44-46, 82]. However, inadequate training data and a lack of independent variables limit the estimation accuracy and applicability of these models. Most of those studies' training sets have fewer than 10 data points, owing to the time-consuming adsorption experiments and the expensive coring process. The previous models and the model developed in this research were compared by calculating the average relative errors. The process is as follows. First, the training data are randomly extracted from the entire dataset to fit the regression model. The proportion of the training data is 80% and 90% for the P_L and V_L submodels, respectively. Then, the other 20% (for the P_L submodel) and 10% (for the V_L

submodel) of the entire data are used as the test data for the average relative error calculation to validate the model. Since economically attractive shale gas reservoirs always have relatively high TOC and high R_o, it is also essential to compare the different models under these conditions. The extraction and validation process is performed 14 times for the P_L submodels and 11 times for the V_L submodels. The first five runs for each model use the entire dataset extraction, which means that the test data are extracted randomly from all of the 91 data points for the P_L submodels and from the 184 data points for the V_L submodels. The next three runs are for the high temperature scenario, in which the test data are randomly sampled from the data whose temperature is higher than 65 °C. There are also three comparisons belonging to the high TOC scenario, in which the test data are taken from the data whose TOC is higher than 5%. The last three comparisons of the P_L submodels are for the high R_o scenario, in which the test data are extracted from the data with R_o higher than 2%. Finally, the fitting precisions of these models are compared based on the cross-validation results.

Table 4 Different P_L and V_L submodels from previous studies
Literature      Model
[46]            ln P_L = a(1/T) + c
[82]            P_L = a·TOC^b
[82]            V_L = a·TOC^b
[23, 44-46]     V_L = a·TOC + b

Concerning the P_L submodels, 18 data points are extracted from the P_L dataset as the test data. The validation process is repeated 14 times. The calculation results are shown in Table 5. The overall average relative error of the proposed P_L submodel is 25.13%, which is much less than the 35.90% and 40.04% from the existing models. In the high temperature scenario, the comparisons between the existing and new models are independently conducted three times. Since shale gas reservoirs are generally deep, many reservoirs' temperatures are higher than 65 °C [83-87].
The average relative errors of the existing models are 47.77% and 42.56%, and that of the new model is 26.25% in the high temperature scenario, indicating that the new model possesses an advantage over the previous ones. Considering real-world applications, the high TOC and high R_o scenarios should be examined as well, because commercially attractive shale gas reservoirs always have high TOC and R_o. The advantage of the new model is obvious in these scenarios. The average relative errors decrease from 46.01% and 48.20% (the existing models) to 27.52% (the new model) under the high TOC condition. In the high R_o scenario, the errors drop from 60.52% and 71.33% (the existing models) to 33.60% (the new model). It is clear that the new model's average relative errors are much lower than those of the existing models from the literature, not only under the overall condition but also in the circumstances of high temperature, high TOC, and high R_o. Considering real-world situations, the new P_L submodel is more useful and accurate.

Table 5 Relative error comparison of the P_L submodels in different scenarios
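The repeated random extraction used for the comparison can be sketched as below. This is a hedged illustration rather than the authors' code: the two candidate models (a least-squares line and a constant-mean baseline) merely stand in for the submodels being compared, the data are synthetic, and all names are hypothetical.

```python
import random
from statistics import mean


def linear_model(train):
    """Fit y = slope * x + intercept by least squares; return a predictor."""
    xs, ys = zip(*train)
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in train)
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    return lambda x: slope * x + intercept


def constant_model(train):
    """Baseline that always predicts the training mean."""
    y_bar = mean(y for _, y in train)
    return lambda x: y_bar


def average_relative_error(model, train, test):
    """Fit on the training pairs, then average the relative error (%) on the test pairs."""
    predict = model(train)
    return mean(abs(predict(x) - y) / abs(y) * 100.0 for x, y in test)


def compare(models, data, test_fraction, repeats, seed=0):
    """Repeat a random train/test split and average each model's test error."""
    rng = random.Random(seed)
    n_test = max(1, round(len(data) * test_fraction))
    scores = {name: [] for name in models}
    for _ in range(repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        for name, model in models.items():
            scores[name].append(average_relative_error(model, train, test))
    return {name: mean(vals) for name, vals in scores.items()}


# Synthetic data: a linear trend with a small deterministic perturbation.
data = [(float(x), 2.0 * x + 1.0 + 0.3 * (-1) ** x) for x in range(1, 21)]
models = {"linear": linear_model, "constant": constant_model}
print(compare(models, data, test_fraction=0.2, repeats=5))
```

A scenario-specific comparison, such as the high temperature case, would draw the test pairs only from the samples that satisfy the scenario's condition while keeping the rest of the loop unchanged.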

Relative Error (%)
Test Number     ln P_L = a(1/T) + c     P_L = a·TOC^b     ln P_L = a·TOC* + b·ln(T*/R_o*) + c
Rows: five entire-dataset tests (Test) and their Average; three HighT tests and their Average; three HighTOC tests and their Average; three HighR_o tests and their Average.

For the V_L submodel, the new model is compared with the existing models in different scenarios as well. Table 6 shows the validation results. According to Table 6, the average relative error of the new model is 23.87% among the five entire dataset extraction tests, which is less than the 25.42% and 25.79% of the existing models. This means that the new model is more accurate in general. Furthermore, regarding the high temperature scenario, the average relative errors are 38.45% and 39.59% for the existing models, and 23.34% for the new model. This means that the introduction of the temperature term to the model reduces the relative error by more than 15 percentage points. In addition, the three tests with high TOC indicate that the new V_L submodel has a similar performance to the existing models. As such, the new P_L and V_L submodels are statistically better than the existing models, especially under real-world conditions. Therefore, the proposed P_L and V_L submodels could be applied to Langmuir parameter estimation under real-world conditions.

Table 6 Relative error comparison of the V_L submodels in different scenarios
Relative Error (%)
Test Number     V_L = a·TOC^b     V_L = a·TOC + b     ln V_L = a·TOC* + b·T*^3 + c
Rows: five entire-dataset tests (Test) and their Average; three HighT tests and their Average; three HighTOC tests and their Average.

Geological-parameter-based estimation model

Since the Langmuir model is the most commonly used empirical model to estimate adsorbed gas content, we propose a new model for estimating the adsorbed gas content on the basis of the P_L and V_L submodels presented above. By plugging Eq. (3) and (4) into the Langmuir model (Eq. (1)), the P_L and V_L terms can be substituted by geological parameters, such as reservoir temperature, TOC, and R_o. All of these geological parameters can be determined without adsorption experiments. By replacing the Langmuir parameters with the geological parameters, the classical Langmuir model can be transformed to the new geological-parameter-based estimation model in Eq. (8). This model only depends on directly measurable geological parameters. The adsorbed gas content can be easily estimated from the reservoir depth h, TOC, R_o, reservoir pressure, and temperature. The regression coefficients a_P, b_P, c_P, a_V, b_V, and c_V are from the P_L and V_L submodels. They are constants when the training database is fixed.

$$V = \frac{V_L}{1 + \dfrac{P_L}{P}} = \frac{\exp(a_V\,TOC^{*})\,\exp(b_V\,T^{*3})\,\exp(c_V)}{1 + \dfrac{1}{P}\left[\exp(a_P\,TOC^{*})\left(\dfrac{T^{*}}{R_o^{*}}\right)^{b_P}\exp(c_P)\right]} \quad (8)$$

Although the reservoir temperature and pressure in Eq. (8) can be measured directly, they can also be estimated when it is difficult to determine them. The reservoir temperature can be calculated from the reservoir depth (h), the surface temperature, and the temperature gradient (gradT), as shown in Eq. (9). In addition, the reservoir pressure can be determined from the reservoir depth and the pressure coefficient (α), as in Eq. (10). n is a unit normal vector, which is dimensionless. The reservoir isothermal surfaces are assumed to be horizontal, and thus the direction of n is vertically downward, which is the same as the direction of the reservoir depth. T_s represents the surface

temperature, °C. P_h is the hydrostatic pressure, i.e., the pressure exerted by the column of water above the formation, MPa. It can be estimated from the water density and the reservoir depth. α is the pressure coefficient, which is dimensionless. ρ_w represents the water density, equal to 1 t/m³. g is the local acceleration of gravity, equal to 9.8 N/kg.

$$T = T_s + h \cdot \mathrm{grad}T = T_s + h\,\frac{\partial T}{\partial n} \quad (9)$$

$$P = \alpha P_h = \alpha \rho_w g h = 9.8\,\alpha h \quad (10)$$

6. Case study: Adsorbed gas content estimation of nine shale gas reservoirs

To estimate the adsorbed gas content in different reservoirs, the values of the geological parameters in each reservoir should be determined accordingly. In this study, the following reservoirs in China, Germany, and the U.S.A. are examined: the Sichuan Basin, the Yangtze Platform, the Songliao Basin, the Ordos Basin, the Tarim Basin, the Northern Jiangsu Basin, the Marcellus Shale, the Barnett Shale, and the Posidonia Shale. The Marcellus Shale and the Barnett Shale are in the U.S.A., and the Posidonia Shale is in Germany. The remaining six reservoirs are in China. The TOC and R_o data were collected from previous work, as shown in Table 7. Fig. 6 presents the distribution of the TOC and R_o data for the different reservoirs. It is clear from Fig. 6a that both the mean and median of TOC of all basins are higher than 2%, which is the commercial development threshold [13, 65]. Fig. 6b shows that, except for the Northern Jiangsu Basin, the Songliao Basin, the Barnett Shale, and the Posidonia Shale, the mean and median of the other reservoirs' R_o are higher than 1.3%. The R_o values indicate that these reservoirs are relatively mature and have passed the oil window [88]. Regarding the Yangtze Platform, the Sichuan Basin, and the Marcellus Shale, their R_o values exceed 2%, which indicates that they are already in the dry gas window [13, 70, 89, 90]. The R_o distribution implies that the organic matter of these shale gas reservoirs is highly mature, and this explains why these shale reservoirs mainly produce shale gas rather than shale oil.
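The depth-based estimates of Eq. (9) and Eq. (10), followed by the classical Langmuir isotherm V = V_L·P/(P_L + P), can be sketched as below. This is an illustrative sketch under stated assumptions: the computation is done in SI units and converted to MPa explicitly, the temperature gradient and the Langmuir parameters in the example are hypothetical values (the fitted submodel coefficients are not reproduced here), and the function names are not from the paper. The defaults of 20 °C surface temperature and a pressure coefficient of 1 follow the assumptions stated in this case study.

```python
def reservoir_temperature(depth_m, gradient_c_per_m, surface_temp_c=20.0):
    """Eq. (9) along the vertical: T = T_s + h * (temperature gradient)."""
    return surface_temp_c + depth_m * gradient_c_per_m


def reservoir_pressure_mpa(depth_m, pressure_coefficient=1.0,
                           water_density_kg_m3=1000.0, gravity_n_per_kg=9.8):
    """Eq. (10): P = alpha * rho_w * g * h, computed in Pa and returned in MPa."""
    hydrostatic_pa = water_density_kg_m3 * gravity_n_per_kg * depth_m
    return pressure_coefficient * hydrostatic_pa / 1.0e6


def langmuir_adsorbed_gas(pressure_mpa, p_l_mpa, v_l):
    """Classical Langmuir isotherm: V = V_L * P / (P_L + P)."""
    return v_l * pressure_mpa / (p_l_mpa + pressure_mpa)


# Hypothetical inputs, for illustration only.
depth = 2000.0            # m
gradient = 0.03           # deg C per m (assumed value)
temperature = reservoir_temperature(depth, gradient)
pressure = reservoir_pressure_mpa(depth)
gas = langmuir_adsorbed_gas(pressure, p_l_mpa=5.0, v_l=3.0)
print(f"T = {temperature:.1f} C, P = {pressure:.2f} MPa, V = {gas:.2f} (units of V_L)")
```

In the full model, P_L and V_L would come from the two submodels rather than being supplied directly, so the only inputs are depth, TOC, R_o, and the local temperature gradient.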
Concerning the reservoir temperature, the Global Heat Flow Database provided by the International Heat Flow Commission [91] contains temperature gradient data from all around the world. The temperature gradient data within China, Germany, and the U.S.A. comprise 667, 254, and 4249 data points, respectively. The inverse distance weighting (IDW) interpolation method [92-94] is used to calculate the reservoir temperature, since temperature is a continuous variable underground. This interpolation method assumes that the value at an unsampled point is the weighted average of its neighboring samples, where each weight is inversely related to the distance between the unsampled point and that neighbor. The temperature distribution of each reservoir is calculated in Appendix C, and the average temperature is used as the reservoir temperature in the following estimation.

Table 7 TOC and R_o data sizes for different shale reservoirs [48, ]
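Inverse distance weighting as described above can be sketched as follows. This is a minimal illustration, assuming planar coordinates and a power of 2 for the weights. The exponent and distance measure used in the paper are not stated in this section, and a real application on geographic coordinates would need an appropriate distance function. The function name and sample values are hypothetical.

```python
import math


def idw_interpolate(samples, query, power=2.0):
    """Inverse distance weighted estimate at `query`.

    samples: list of ((x, y), value) pairs; query: (x, y).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for (x, y), value in samples:
        distance = math.hypot(x - query[0], y - query[1])
        if distance == 0.0:
            # The query coincides with a sample: return its value directly.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical temperature-gradient samples (deg C per km) at planar positions.
samples = [((0.0, 0.0), 20.0), ((2.0, 0.0), 30.0), ((0.0, 4.0), 26.0)]
print(idw_interpolate(samples, (1.0, 0.0)))
```

Nearby samples dominate the estimate because their weights grow quickly as the distance shrinks, and a larger power makes the interpolation more local.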

Reservoir: Sichuan Basin, Yangtze Platform, Songliao Basin, Ordos Basin, Tarim Basin, Northern Jiangsu Basin, Marcellus Shale, Barnett Shale, Posidonia Shale
Amount of TOC data
Amount of R_o data

Fig. 6. Distribution of the TOC and R_o data for the Sichuan Basin, the Yangtze Platform, the Songliao Basin, the Ordos Basin, the Tarim Basin, the Northern Jiangsu Basin, the Marcellus Shale, the Barnett Shale, and the Posidonia Shale: (a) the TOC box plot of different reservoirs; and (b) the R_o box plot of different reservoirs. The asterisks represent outliers that lie beyond 1.5 IQR (interquartile range). The blank squares correspond to the means of TOC and R_o in different reservoirs.

The geological parameters are evaluated separately, and the results are shown in Table 8. To calculate the reservoir pressure, the pressure coefficient of each reservoir is assumed to be 1. The surface temperature is assumed to be 20 °C when calculating the reservoir temperature in this study. Finally, the adsorbed gas contents are estimated by plugging the geological parameters into the geological-parameter-based estimation model (Eq. (8)). The results are shown in the last column of Table 8 and in Fig. 7. The observations of adsorbed gas content in the Sichuan Basin, the Ordos Basin, and the Marcellus Shale are shown in Fig. 7 as well. Owing to the high cost of adsorption experiments, experimental data on adsorbed gas content are lacking for most reservoirs. However, the Sichuan Basin, the Ordos Basin, and the Marcellus Shale are relatively developed, and some researchers have estimated their adsorbed gas contents. Thus, their data can serve as a reference against which to compare the estimation results in this study. According to previous studies, the adsorbed gas content in the Sichuan Basin ranges from 1.12 m³/t to 1.74 m³/t, with an average of 1.28 m³/t [114]. The estimation result in this work is 1.31 m³/t, which is consistent with previous work.
Regarding the Ordos Basin, researchers have evaluated the adsorbed gas content in the Chang 7 Member and the Chang 9 Member of the Yanchang Formation, which are m³/t, with the mean of 1.67 m³/t, and m³/t, with the average of 1.32 m³/t, respectively [115]. These results support the estimation in this research, in which the adsorbed gas content is 1.47 m³/t in the Ordos Basin. As for the Marcellus Shale, a previous study has shown that the adsorbed gas content ranges from 0.85 m³/t to 1.4 m³/t [101]. The estimation of our model is 1.24

m³/t, which agrees with the observation in the previous study. The experimental results from previous studies thus validate the estimations of this research for the Sichuan Basin, the Ordos Basin, and the Marcellus Shale. Moreover, the estimation procedure in this study is totally independent of adsorption experiments, which reduces the estimation cost and increases efficiency.

Table 8 Geological parameters and the estimated adsorbed gas contents in China shale gas reservoirs
Columns: Reservoir, Depth (m), TOC (%), R_o (%), Reservoir Temperature (°C), Reservoir Pressure (MPa), Adsorbed Gas Content (m³/t)
Rows: Sichuan Basin, Yangtze Platform, Songliao Basin, Ordos Basin, Tarim Basin, Northern Jiangsu Basin, Marcellus Shale, Barnett Shale, Posidonia Shale

Fig. 7. Comparison of adsorbed gas content estimations and the observations in nine reservoirs. The black squares indicate the estimation values in nine reservoirs. The floating chart shows the distributions of adsorbed gas content

observations in the Marcellus Shale, the Sichuan Basin, and the Ordos Basin. Regarding the Ordos Basin, the red column in the floating chart describes the observations in the Chang 9 Member, and the blue column represents the Chang 7 Member.

7. Discussion and Conclusion

In this study, statistical learning methods were utilized to construct an adsorbed gas content estimation model based on geological parameters. Statistical learning is effective in finding a predictive function, and it can elucidate hidden patterns and unknown correlations between shale gas content and other variables. In the model construction process, 301 data points were collected from different studies, experiment reports, and databases. The outliers were detected by the K-NN algorithm, and the model performance was validated by leave-one-out cross-validation. Moreover, the use of geological parameters makes it possible to estimate adsorbed gas content without conducting time-consuming adsorption experiments and an expensive coring process. Thus, the geological-parameter-based estimation model is efficient, with a relatively low cost.

Regarding variable selection for the estimation model, more variables may be considered in the model, as long as there are enough data to avoid overfitting. Considering the data availability, we only take T, TOC, R_o, and porosity as candidate variables in this paper. These four variables are also the most commonly used variables in previous studies. Among the other variables, the clay content is important, since the adsorption capacities of clay and kerogen are different. Lu et al. proposed a Bi-Langmuir model that can account for gas adsorption on both clay minerals and kerogen [116, 117]. Although our model can be extended to consider clay content without any change in the regression model, clay content data are not currently widely available. It should be mentioned that this model is not a complete substitute for adsorption experiments.
However, this model is a good choice for the overall approximate estimation of gas content, since it can reduce the estimation cost and increase estimation efficiency. This model also makes it possible to estimate the adsorbed gas content at a large scale, such as a whole reservoir. This model can also be applied to an approximate estimation for a single well at a micro scale when no site-specific measurements have been made. When the exact adsorbed gas content of a sample is required, adsorption experiments still constitute the optimal choice, provided that adequate equipment and sufficient time are available.

The geological-parameter-based estimation model is constructed by substituting the P_L and V_L terms in the classical Langmuir model with two submodels. Compared with the existing P_L and V_L submodels, the new submodels proposed in this research possess an advantage under real-world conditions. Regarding the P_L submodel, the estimation errors decrease to nearly half of those of the existing models in the high temperature, high TOC, and high R_o scenarios. The overall average relative error from cross-validation is ± 3.54% with a 90% confidence interval. As for the V_L submodel, the estimation relative error decreases markedly in the high temperature scenario, and is similar to that of the existing models in the high TOC scenario. The overall average relative error is 23.76%, with its 90% confidence interval being (21.43%, 26.09%). Because most promising reservoirs have high TOC and high R_o, the new submodels perform better under real-world conditions. Finally, a geological-parameter-based estimation model is developed to estimate adsorbed gas content in shale gas reservoirs, and it is applied to the Sichuan Basin, the Yangtze Platform, the
