Optimal Subsampling for Large Sample Logistic Regression


Abstract

For massive data, the family of subsampling algorithms is popular to downsize the data volume and reduce computational burden. Existing studies focus on approximating the ordinary least squares estimate in linear regression, where statistical leverage scores are often used to define subsampling probabilities. In this paper, we propose fast subsampling algorithms to efficiently approximate the maximum likelihood estimate in logistic regression. We first establish consistency and asymptotic normality of the estimator from a general subsampling algorithm, and then derive optimal subsampling probabilities that minimize the asymptotic mean squared error of the resultant estimator. An alternative minimization criterion is also proposed to further reduce the computational cost. The optimal subsampling probabilities depend on the full data estimate, so we develop a two-step algorithm to approximate the optimal subsampling procedure. This algorithm is computationally efficient and has a significant reduction in computing time compared to the full data approach. Consistency and asymptotic normality of the estimator from a two-step algorithm are also established. Synthetic and real data sets are used to evaluate the practical performance of the proposed method.

Keywords: A-optimality; Logistic Regression; Massive Data; Optimal Subsampling; Rare Event.

1 Introduction

With the rapid development of science and technologies, massive data have been generated at an extraordinary speed. Unprecedented volumes of data offer researchers both unprecedented opportunities and challenges. The key challenge is that directly applying statistical methods to these super-large sample data using conventional computing methods is prohibitive. We shall now present two motivating examples.

Example 1. Census. The U.S. census systematically acquires and records data of all residents of the United States. The census data provide fundamental information to study socio-economic issues. Kohavi (1996) conducted a classification analysis using residents' information such as income, age, work class, education, and the number of working hours per week. This information was used to predict whether the residents are high income residents, i.e., those with annual income more than $50K, or not. Given that the whole census data is super-large, the computation of statistical analysis is very difficult.

Example 2. Supersymmetric Particles. Physical experiments to create exotic particles that occur only at extremely high energy densities have been carried out using modern accelerators, e.g., the Large Hadron Collider (LHC). Observations of these particles and measurements of their properties may yield critical insights about the fundamental properties of the physical universe. One particular example of such exotic particles is supersymmetric particles, the search for which is a central scientific mission of the LHC (Baldi et al., 2014). Statistical analysis is crucial to distinguish collision events which produce supersymmetric particles (signal) from those producing other particles (background). Since the LHC continuously generates petabytes of data each year, the computation of statistical analysis is very challenging.

The above motivating examples are classification problems with massive data. Logistic regression models are widely used for classification in many disciplines, including business, computer science, education, and genetics, among others (Hosmer Jr et al., 2013).
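Given covariates $x_i$'s $\in \mathbb{R}^d$, logistic regression models are of the form
$$P(y_i = 1 \mid x_i) = p_i(\beta) = \frac{\exp(x_i^T\beta)}{1 + \exp(x_i^T\beta)}, \quad i = 1, 2, \ldots, n, \qquad (1)$$
where $y_i$'s $\in \{0, 1\}$ are the responses and $\beta$ is a $d \times 1$ vector of unknown regression coefficients belonging to a compact subset of $\mathbb{R}^d$. The unknown parameter $\beta$ is often estimated by the maximum likelihood estimator (MLE) through maximizing the log-likelihood function with respect to $\beta$, namely,
$$\hat\beta_{\mathrm{MLE}} = \arg\max_{\beta} \ell(\beta) = \arg\max_{\beta} \sum_{i=1}^{n} \big[ y_i \log p_i(\beta) + (1 - y_i) \log\{1 - p_i(\beta)\} \big]. \qquad (2)$$
Analytically, there is no general closed-form solution to the MLE $\hat\beta_{\mathrm{MLE}}$, and iterative procedures are often adopted to find it numerically. A commonly used iterative procedure is Newton's method. Specifically for logistic regression, Newton's method iteratively applies the following formula until $\hat\beta^{(t+1)}$ converges:
$$\hat\beta^{(t+1)} = \hat\beta^{(t)} + \Big\{ \sum_{i=1}^{n} w_i\big(\hat\beta^{(t)}\big) x_i x_i^T \Big\}^{-1} \frac{\partial \ell\big(\hat\beta^{(t)}\big)}{\partial \beta},$$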

where $w_i(\beta) = p_i(\beta)\{1 - p_i(\beta)\}$. Since it requires $O(nd^2)$ computing time in each iteration, the optimization procedure takes $O(\zeta n d^2)$ time, where $\zeta$ is the number of iterations required for the optimization procedure to converge. One common feature of the two motivating examples is their super-large sample size. For such super-large sample problems, the computing time $O(nd^2)$ for a single run may be too long to afford, let alone calculating it iteratively. Therefore, computation is a bottleneck for the application of logistic regression on massive data.

When proven statistical methods are no longer applicable due to limited computing resources, a popular method to extract useful information from data is the subsampling method (Drineas et al., 2006; Mahoney and Drineas, 2009; Drineas et al., 2011). This approach uses the estimate based on a subsample that is taken randomly from the full data to approximate the estimate from the full data. It is termed algorithmic leveraging in Ma et al. (2014, 2015) because the empirical statistical leverage scores of the input covariate matrix are often used to define the nonuniform subsampling probabilities. There are numerous variants of subsampling algorithms to solve the ordinary least squares (OLS) in linear regression for large data sets, see Drineas et al. (2006, 2011); Ma et al. (2014, 2015); Ma and Sun (2015), among others. Another strategy is to use random projections of data matrices to quickly approximate the OLS estimate, which was studied in Rokhlin and Tygert (2008), Dhillon et al. (2013), Clarkson and Woodruff (2013) and McWilliams et al. (2014). The aforementioned approaches have been investigated exclusively within the context of linear regression, and available results are mainly on algorithmic properties. For logistic regression, Owen (2007) derived interesting asymptotic results for infinitely imbalanced data sets. King and Zeng (2001) investigated the problem of rare events data. Fithian and Hastie (2014) proposed an efficient local case-control (LCC) subsampling method for imbalanced data sets, in which the method was motivated by balancing the subsample.
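The Newton iteration for the full data MLE described earlier in this section can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names `logistic_prob` and `newton_full_mle`, the zero starting value, and the stopping rule based on the step size are assumptions made here, not choices stated in the text. The comments mark the $O(nd^2)$ Hessian accumulation that dominates each iteration.

```python
import numpy as np


def logistic_prob(X, beta):
    """Return p_i(beta) = exp(x_i^T beta) / {1 + exp(x_i^T beta)} for each row of X."""
    return 1.0 / (1.0 + np.exp(-(X @ beta)))


def newton_full_mle(X, y, max_iter=50, tol=1e-10):
    """Newton's method for the full data logistic MLE (hypothetical helper)."""
    n, d = X.shape
    beta = np.zeros(d)  # assumption: start from the zero vector
    for _ in range(max_iter):
        p = logistic_prob(X, beta)
        w = p * (1.0 - p)                     # w_i(beta) = p_i(beta){1 - p_i(beta)}
        hessian = X.T @ (X * w[:, None])      # sum_i w_i x_i x_i^T, costs O(n d^2)
        gradient = X.T @ (y - p)              # derivative of the log-likelihood, O(n d)
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.linalg.norm(step) < tol:        # assumption: stop when the step is tiny
            break
    return beta
```

Each pass through the loop touches all $n$ rows, which is the cost that the subsampling approach aims to avoid.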
In this paper, we focus on approximating the full data MLE using a subsample, and our method is motivated by minimizing the asymptotic mean squared error (MSE) of the resultant subsample-estimator given the full data. We rigorously investigate the statistical properties of the general subsampling estimator and obtain its asymptotic distribution. More importantly, using this asymptotic distribution, we derive optimal subsampling methods motivated from the A-optimality criterion (OSMAC) in the theory of optimal experimental design.

In this paper, we have two major contributions for theoretical and methodological developments in subsampling for logistic regression with massive data:

1. Characterizations of optimal subsampling. Most work on subsampling algorithms (under the context of linear regression) focuses on algorithmic issues. One exception is the work by Ma et al. (2014, 2015), in which expected values and variances of estimators from algorithmic leveraging were expressed approximately. However, there was no precise theoretical investigation on when these approximations hold. In this paper, we rigorously prove that the resultant estimator from a general subsampling algorithm is consistent to the full data MLE, and establish the asymptotic normality of the resultant estimator. Furthermore, from the asymptotic distribution, we derive the optimal subsampling method that minimizes the asymptotic MSE or a weighted version of the asymptotic MSE.

2. A novel two-step subsampling algorithm. The OSMAC that minimizes the asymptotic MSEs depends on the full data MLE $\hat\beta_{\mathrm{MLE}}$, so the theoretical characterizations do not immediately translate into good algorithms. We propose a novel two-step algorithm to address this issue. The first step is to determine the importance score of each data point. In the second step, the importance scores are used to define nonuniform subsampling probabilities to be used for sampling from the full data set. We prove that the estimator from the two-step algorithm is consistent and asymptotically normal with the optimal asymptotic covariance matrix under some optimality criterion. The two-step subsampling algorithm runs in $O(nd)$ time, whereas the full data MLE typically requires $O(\zeta n d^2)$ time to run. This improvement in computing time is much more significant than that obtained from applying the leverage-based subsampling algorithm to solve the OLS in linear regression. In linear regression, compared to a full data OLS which requires $O(nd^2)$ time, the leverage-based algorithm with approximate leverage scores (Drineas et al., 2012) requires $O(nd \log n / \varepsilon^2)$ time with $\varepsilon \in (0, 1/2]$, which is $o(nd^2)$ for the case of $\log n = o(d)$.

The remainder of the paper is organized as follows. In Section 2, we conduct a theoretical analysis of a general subsampling algorithm for logistic regression. In Section 3, we develop optimal subsampling procedures to approximate the MLE in logistic regression. A two-step algorithm is developed in Section 4 to approximate these optimal subsampling procedures, and its theoretical properties are studied. The empirical performance of our algorithms is evaluated by numerical experiments on synthetic and real data sets in Section 5. Section 6 summarizes the paper. Technical proofs for the theoretical results, as well as additional numerical experiments, are given in the Supplementary Materials.

2 General Subsampling Algorithm and its Asymptotic Properties

In this section, we first present a general subsampling algorithm for approximating $\hat\beta_{\mathrm{MLE}}$, and then establish the consistency and asymptotic normality of the resultant estimator. Algorithm 1 describes the general subsampling procedure.
Now, we investigate asymptotic properties of this general subsampling algorithm, which provide guidance on how to develop algorithms with better approximation qualities. Note that in the two motivating examples, the sample sizes are super-large, but the numbers of predictors are unlikely to increase even if the sample sizes further increase. We assume that $d$ is fixed and $n \to \infty$. For ease of discussion, we assume that $x_i$'s are independent and identically distributed (i.i.d.) with the same distribution as that of $x$. The case of nonrandom $x_i$'s is presented in the Supplementary Materials. To facilitate the presentation, denote the full data matrix as $\mathcal{F}_n = (X, y)$, where $X = (x_1, x_2, \ldots, x_n)^T$ is the covariate matrix and $y = (y_1, y_2, \ldots, y_n)^T$ is the vector of responses. Throughout the paper, $\|v\|$ denotes the Euclidean norm of a vector $v$, i.e., $\|v\| = (v^T v)^{1/2}$. We need the following assumptions to establish the first asymptotic result.

Assumption 1. As $n \to \infty$, $M_X = n^{-1} \sum_{i=1}^{n} w_i(\hat\beta_{\mathrm{MLE}}) x_i x_i^T$ goes to a positive-definite matrix in probability and $n^{-1} \sum_{i=1}^{n} \|x_i\|^3 = O_P(1)$.

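Algorithm 1 General subsampling algorithm

Sampling: Assign subsampling probabilities $\pi_i$, $i = 1, 2, \ldots, n$, for all data points. Draw a random subsample of size $r$ ($\ll n$), according to the probabilities $\{\pi_i\}_{i=1}^{n}$, from the full data. Denote the covariates, responses, and subsampling probabilities in the subsample as $x_i^*$, $y_i^*$, and $\pi_i^*$, respectively, for $i = 1, 2, \ldots, r$.

Estimation: Maximize the following weighted log-likelihood function to get the estimate $\tilde\beta$ based on the subsample:
$$\ell^*(\beta) = \frac{1}{r} \sum_{i=1}^{r} \frac{1}{\pi_i^*} \big[ y_i^* \log p_i^*(\beta) + (1 - y_i^*) \log\{1 - p_i^*(\beta)\} \big],$$
where $p_i^*(\beta) = \exp(\beta^T x_i^*) / \{1 + \exp(\beta^T x_i^*)\}$. Due to the concavity of $\ell^*(\beta)$, the maximization can be implemented by Newton's method, i.e., iteratively applying the following formula until $\tilde\beta^{(t+1)}$ and $\tilde\beta^{(t)}$ are close enough,
$$\tilde\beta^{(t+1)} = \tilde\beta^{(t)} + \Big\{ \sum_{i=1}^{r} \frac{w_i^*\big(\tilde\beta^{(t)}\big) x_i^* (x_i^*)^T}{\pi_i^*} \Big\}^{-1} \sum_{i=1}^{r} \frac{\big\{ y_i^* - p_i^*\big(\tilde\beta^{(t)}\big) \big\} x_i^*}{\pi_i^*}, \qquad (3)$$
where $w_i^*(\beta) = p_i^*(\beta)\{1 - p_i^*(\beta)\}$.

Assumption 2. $n^{-2} \sum_{i=1}^{n} \pi_i^{-1} \|x_i\|^k = O_P(1)$ for $k = 2, 4$.

Assumption 1 imposes two conditions on the covariate distribution, and this assumption holds if $E(xx^T)$ is positive definite and $E\|x\|^3 < \infty$. Assumption 2 is a condition on both the subsampling probabilities and the covariate distribution. For uniform subsampling with $\pi_i = n^{-1}$, a sufficient condition for this assumption is that $E\|x\|^4 < \infty$.

The theorem below presents the consistency of the estimator from the subsampling algorithm to the full data MLE.

Theorem 1. If Assumptions 1 and 2 hold, then as $n \to \infty$ and $r \to \infty$, $\tilde\beta$ is consistent to $\hat\beta_{\mathrm{MLE}}$ in conditional probability, given $\mathcal{F}_n$ in probability. Moreover, the rate of convergence is $r^{-1/2}$. That is, with probability approaching one, for any $\epsilon > 0$, there exist finite $\Delta_\epsilon$ and $r_\epsilon$ such that
$$P\big( \|\tilde\beta - \hat\beta_{\mathrm{MLE}}\| \ge r^{-1/2} \Delta_\epsilon \,\big|\, \mathcal{F}_n \big) < \epsilon \qquad (4)$$
for all $r > r_\epsilon$.

The consistency result shows that the approximation error can be made as small as possible by a large enough subsample size $r$, as the approximation error is of order $O_{P|\mathcal{F}_n}(r^{-1/2})$. Here the probability measure in $O_{P|\mathcal{F}_n}(\cdot)$ is the conditional measure given $\mathcal{F}_n$. This result has some similarity to the finite-sample result of the worst-case error bound for algorithmic leveraging in linear regression (Drineas et al., 2011), but neither of them gives the distribution of the approximation error.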
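Algorithm 1 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the zero starting value, the stopping rule, and the choice to draw the subsample with replacement via NumPy's `Generator.choice` are assumptions made here. The weights $1/(r\pi_i^*)$ are those appearing in $\ell^*(\beta)$, and the loop applies the update in (3).

```python
import numpy as np


def weighted_logistic_mle(X_sub, y_sub, pi_sub, max_iter=100, tol=1e-10):
    """Maximize the weighted log-likelihood l*(beta) by the Newton update (3)."""
    r, d = X_sub.shape
    inv_w = 1.0 / (r * pi_sub)      # weights 1 / (r * pi_i^*) from l*(beta)
    beta = np.zeros(d)              # assumption: start from the zero vector
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X_sub @ beta)))
        w = p * (1.0 - p)
        hessian = X_sub.T @ (X_sub * (w * inv_w)[:, None])
        gradient = X_sub.T @ ((y_sub - p) * inv_w)
        step = np.linalg.solve(hessian, gradient)
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def general_subsample_estimate(X, y, pi, r, rng):
    """Sampling step followed by the Estimation step of Algorithm 1."""
    n = X.shape[0]
    # Assumption: the r draws are made with replacement according to pi.
    idx = rng.choice(n, size=r, replace=True, p=pi)
    return weighted_logistic_mle(X[idx], y[idx], pi[idx])
```

For example, passing `pi = np.full(n, 1.0 / n)` gives uniform subsampling, in which case all weights are equal and the weighted fit coincides with an ordinary logistic fit on the subsample.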

Besides consistency, we derive the asymptotic distribution of the approximation error, and prove that the approximation error, $\tilde\beta - \hat\beta_{\mathrm{MLE}}$, is asymptotically normal. To obtain this result, we need an additional assumption below, which is required by the Lindeberg-Feller central limit theorem.

Assumption 3. There exists some $\delta > 0$ such that $n^{-(2+\delta)} \sum_{i=1}^{n} \pi_i^{-1-\delta} \|x_i\|^{2+\delta} = O_P(1)$.

The aforementioned three assumptions are essentially moment conditions and are very general. For example, a sub-Gaussian distribution (Buldygin and Kozachenko, 1980) has a finite moment generating function on $\mathbb{R}$ and thus has finite moments up to any finite order. If the distribution of each component of $x$ belongs to the class of sub-Gaussian distributions and the covariance matrix of $x$ is positive-definite, then all the conditions are satisfied by the subsampling probabilities considered in this paper. The result of asymptotic normality is presented in the following theorem.

Theorem 2. If Assumptions 1, 2, and 3 hold, then as $n \to \infty$ and $r \to \infty$, conditional on $\mathcal{F}_n$ in probability,
$$V^{-1/2} (\tilde\beta - \hat\beta_{\mathrm{MLE}}) \longrightarrow N(0, I) \qquad (5)$$
in distribution, where
$$V = M_X^{-1} V_c M_X^{-1} = O_P(r^{-1}) \qquad (6)$$
and
$$V_c = \frac{1}{r n^2} \sum_{i=1}^{n} \frac{\{y_i - p_i(\hat\beta_{\mathrm{MLE}})\}^2 x_i x_i^T}{\pi_i}. \qquad (7)$$

Remark 1. Note that in Theorems 1 and 2 we are approximating the full data MLE, and the results hold for the case of oversampling ($r > n$). However, this scenario is not practical because it is more computationally intense than using the full data. Additionally, the distance between $\hat\beta_{\mathrm{MLE}}$ and $\beta_0$, the true parameter, is of order $O_P(n^{-1/2})$. Oversampling does not result in any gain in terms of estimating the true parameter. For the aforementioned reasons, the scenario of oversampling is not of interest to us and we focus on the scenario that $r$ is much smaller than $n$; typically, $n - r \to \infty$ or $r/n \to 0$.

Result (5) shows that the distribution of $\tilde\beta - \hat\beta_{\mathrm{MLE}}$ given $\mathcal{F}_n$ can be approximated by that of $u$, a normal random variable with distribution $N(0, V)$. In other words, the probability $P(r^{1/2} \|\tilde\beta - \hat\beta_{\mathrm{MLE}}\| \ge \Delta \mid \mathcal{F}_n)$ can be approximated by $P(r^{1/2} \|u\| \ge \Delta \mid \mathcal{F}_n)$ for any $\Delta$. To facilitate the discussion, we write result (5) as
$$\tilde\beta - \hat\beta_{\mathrm{MLE}} \mid \mathcal{F}_n \overset{a}{\sim} u, \qquad (8)$$
where $\overset{a}{\sim}$ means the distributions of the two terms are asymptotically the same.
This result is more statistically informative than a worst-case error bound for the approximation error $\tilde\beta - \hat\beta_{\mathrm{MLE}}$. Moreover, this result gives direct guidance on how to reduce the approximation error while an error bound does not, because a smaller bound does not necessarily mean a smaller approximation error.

Although the distribution of $\tilde\beta - \hat\beta_{\mathrm{MLE}}$ given $\mathcal{F}_n$ can be approximated by that of $u$, this does not necessarily imply that $E(\|\tilde\beta - \hat\beta_{\mathrm{MLE}}\|^2 \mid \mathcal{F}_n)$ is close to $E(\|u\|^2 \mid \mathcal{F}_n)$. $E(\|u\|^2 \mid \mathcal{F}_n)$ is an asymptotic MSE (AMSE) of $\tilde\beta$ and it is always well defined. However, rigorously speaking, $E(\|\tilde\beta - \hat\beta_{\mathrm{MLE}}\|^2 \mid \mathcal{F}_n)$, or any conditional moment of $\tilde\beta$, is undefined, because there is a nonzero probability that $\tilde\beta$ based on a subsample does not exist. The same problem exists in subsampling estimators for the OLS in linear regression. To address this issue, we define $\tilde\beta$ to be 0 when the MLE based on a subsample does not exist. Under this definition, if $\tilde\beta$ is uniformly integrable under the conditional measure given $\mathcal{F}_n$, then $r^{1/2} \{ E(\|\tilde\beta - \hat\beta_{\mathrm{MLE}}\|^2 \mid \mathcal{F}_n) - E(\|u\|^2 \mid \mathcal{F}_n) \} \to 0$ in probability.

Results in Theorems 1 and 2 are distributional results conditional on the observed data, which fulfill our primary goal of approximating the full data MLE $\hat\beta_{\mathrm{MLE}}$. Conditional inference is quite common in statistics, and the most popular method is the Bootstrap (Efron, 1979; Efron and Tibshirani, 1994). The (nonparametric) Bootstrap is the uniform subsampling approach with subsample size equal to the full data sample size. If $\pi_i = 1/n$ and $r = n$, the results in Theorems 1 and 2 reduce to the asymptotic results for the Bootstrap. However, the Bootstrap and the subsampling method in this paper have very distinct goals. The Bootstrap focuses on approximating complicated distributions and is used when explicit solutions are unavailable, while the subsampling method considered here has a primary motivation of achieving feasible computation and is used even when closed-form solutions are available.

3 Optimal Subsampling Strategies

To implement Algorithm 1, one has to specify the subsampling probability (SSP) $\pi = \{\pi_i\}_{i=1}^{n}$ for the full data. An easy choice is to use the uniform SSP $\pi^{\mathrm{UNI}} = \{\pi_i = n^{-1}\}_{i=1}^{n}$. However, an algorithm with the uniform SSP may not be optimal and a nonuniform SSP may have a better performance. In this section, we propose more efficient subsampling procedures by choosing nonuniform $\pi_i$'s to minimize the asymptotic variance-covariance matrix $V$ in (6). However, since $V$ is a matrix, the meaning of "minimize" needs to be defined. We adopt the idea of A-optimality from optimal design of experiments and use the trace of a matrix to induce a complete ordering of the variance-covariance matrices (Kiefer, 1959).
It turns out that this approach is equivalent to minimizing the asymptotic MSE of the resultant estimator. Since this optimal subsampling procedure is motivated from the A-optimality criterion, we call our method the OSMAC.

3.1 Minimum Asymptotic MSE of $\tilde\beta$

From the result in Theorem 2, the asymptotic MSE of $\tilde\beta$ is equal to the trace of $V$, namely,
$$\mathrm{AMSE}(\tilde\beta) = E(\|u\|^2 \mid \mathcal{F}_n) = \mathrm{tr}(V). \qquad (9)$$
From (6), $V$ depends on $\{\pi_i\}_{i=1}^{n}$, and clearly, $\{\pi_i = n^{-1}\}_{i=1}^{n}$ may not produce the smallest value of $\mathrm{tr}(V)$. The key idea of optimal subsampling is to choose a nonuniform SSP such that the $\mathrm{AMSE}(\tilde\beta)$ in (9) is minimized. Since minimizing the trace of the (asymptotic) variance-covariance matrix is called the A-optimality criterion (Kiefer, 1959), the resultant SSP is A-optimal in the language of optimal design. The following theorem gives the A-optimal SSP that minimizes the asymptotic MSE of $\tilde\beta$.

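Theorem 3. In Algorithm 1, if the SSP is chosen such that
$$\pi_i^{\mathrm{mMSE}} = \frac{|y_i - p_i(\hat\beta_{\mathrm{MLE}})| \, \|M_X^{-1} x_i\|}{\sum_{j=1}^{n} |y_j - p_j(\hat\beta_{\mathrm{MLE}})| \, \|M_X^{-1} x_j\|}, \quad i = 1, 2, \ldots, n, \qquad (10)$$
then the asymptotic MSE of $\tilde\beta$, $\mathrm{tr}(V)$, attains its minimum.

As observed in (10), the optimal SSP $\pi^{\mathrm{mMSE}} = \{\pi_i^{\mathrm{mMSE}}\}_{i=1}^{n}$ depends on the data through both the covariates and the responses directly. For the covariates, the optimal SSP is larger for a larger $\|M_X^{-1} x_i\|$, which is the square root of the $i$th diagonal element of the matrix $X M_X^{-2} X^T$. The effect of the responses on the optimal SSP depends on discrimination difficulties through the term $|y_i - p_i(\hat\beta_{\mathrm{MLE}})|$. Interestingly, if the full data MLE $\hat\beta_{\mathrm{MLE}}$ in $|y_i - p_i(\hat\beta_{\mathrm{MLE}})|$ is replaced by a pilot estimate, then this term is exactly the same as the probability in the local case-control (LCC) subsampling procedure for dealing with imbalanced data (Fithian and Hastie, 2014). However, Poisson sampling and an unweighted MLE were used in the LCC subsampling procedure.

To see the effect of the responses on the optimal SSP, let $S_0 = \{i : y_i = 0\}$ and $S_1 = \{i : y_i = 1\}$. The effect of $p_i(\hat\beta_{\mathrm{MLE}})$ on $\pi_i^{\mathrm{mMSE}}$ is positive for the $S_0$ set, i.e., a larger $p_i(\hat\beta_{\mathrm{MLE}})$ results in a larger $\pi_i^{\mathrm{mMSE}}$, while the effect is negative for the $S_1$ set, i.e., a larger $p_i(\hat\beta_{\mathrm{MLE}})$ results in a smaller $\pi_i^{\mathrm{mMSE}}$. The optimal subsampling approach is more likely to select data points with smaller $p_i(\hat\beta_{\mathrm{MLE}})$'s when $y_i$'s are 1 and data points with larger $p_i(\hat\beta_{\mathrm{MLE}})$'s when $y_i$'s are 0. Intuitively, it attempts to give preference to data points that are more likely to be misclassified. This can also be seen in the expression of $\mathrm{tr}(V)$. From (6) and (7),
$$\begin{aligned}
\mathrm{tr}(V) &= \mathrm{tr}(M_X^{-1} V_c M_X^{-1}) \\
&= \frac{1}{r n^2} \, \mathrm{tr}\Big[ \sum_{i=1}^{n} \frac{\{y_i - p_i(\hat\beta_{\mathrm{MLE}})\}^2 M_X^{-1} x_i x_i^T M_X^{-1}}{\pi_i} \Big] \\
&= \frac{1}{r n^2} \sum_{i=1}^{n} \frac{\{y_i - p_i(\hat\beta_{\mathrm{MLE}})\}^2 \, \mathrm{tr}(M_X^{-1} x_i x_i^T M_X^{-1})}{\pi_i} \\
&= \frac{1}{r n^2} \sum_{i \in S_0} \frac{\{p_i(\hat\beta_{\mathrm{MLE}})\}^2 \|M_X^{-1} x_i\|^2}{\pi_i} + \frac{1}{r n^2} \sum_{i \in S_1} \frac{\{1 - p_i(\hat\beta_{\mathrm{MLE}})\}^2 \|M_X^{-1} x_i\|^2}{\pi_i}.
\end{aligned}$$
From the above equation, a larger value of $p_i(\hat\beta_{\mathrm{MLE}})$ results in a larger value of the summation for the $S_0$ set, so a larger value is assigned to $\pi_i$ to reduce this summation. On the other hand, for the $S_1$ set, a larger value of $p_i(\hat\beta_{\mathrm{MLE}})$ results in a smaller value of the summation, so a smaller value is assigned to $\pi_i$.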
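One way to see why the probabilities in (10) minimize $\mathrm{tr}(V)$ is through the Cauchy-Schwarz inequality. The sketch below uses the shorthand $a_i = |y_i - p_i(\hat\beta_{\mathrm{MLE}})| \, \|M_X^{-1} x_i\|$, which is introduced here only for illustration, and relies on the constraint $\sum_{i=1}^{n} \pi_i = 1$.

```latex
\begin{align*}
\mathrm{tr}(V)
  &= \frac{1}{r n^2} \sum_{i=1}^{n} \frac{a_i^2}{\pi_i}
   = \frac{1}{r n^2} \Big( \sum_{i=1}^{n} \pi_i \Big)
     \Big( \sum_{i=1}^{n} \frac{a_i^2}{\pi_i} \Big) \\
  &\ge \frac{1}{r n^2} \Big( \sum_{i=1}^{n} \sqrt{\pi_i}\,
     \frac{a_i}{\sqrt{\pi_i}} \Big)^{2}
   = \frac{1}{r n^2} \Big( \sum_{i=1}^{n} a_i \Big)^{2}.
\end{align*}
```

Equality holds if and only if $\pi_i$ is proportional to $a_i$, and normalizing so that the probabilities sum to one gives $\pi_i = a_i / \sum_{j=1}^{n} a_j$, which is the form in (10).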
The optimal subsampling approach also echoes the result in Silvapulle (1981), which gave a necessary and sufficient condition for the existence of the MLE in logistic regression. To see this, let
$$F_0 = \Big\{ \sum_{i \in S_0} k_i x_i \,\Big|\, k_i > 0 \Big\} \quad \text{and} \quad F_1 = \Big\{ \sum_{i \in S_1} k_i x_i \,\Big|\, k_i > 0 \Big\}.$$

Here, $F_0$ and $F_1$ are convex cones generated by covariates in the $S_0$ and the $S_1$ sets, respectively. Silvapulle (1981) showed that the MLE in logistic regression is uniquely defined if and only if $F_0 \cap F_1 \neq \phi$, where $\phi$ is the empty set. From Theorem II in Dines (1926), $F_0 \cap F_1 \neq \phi$ if and only if there does not exist a $\beta$ such that
$$x_i^T \beta \le 0 \text{ for all } i \in S_0, \quad x_i^T \beta \ge 0 \text{ for all } i \in S_1, \qquad (11)$$
and at least one strict inequality holds. The statement in (11) is equivalent to the following statement in (12) below:
$$p_i(\beta) \le 0.5 \text{ for all } i \in S_0, \quad p_i(\beta) \ge 0.5 \text{ for all } i \in S_1. \qquad (12)$$
This means that if there exists a $\beta$ such that $\{p_i(\beta), i \in S_0\}$ and $\{p_i(\beta), i \in S_1\}$ can be separated, then the MLE does not exist. The optimal SSP strives to increase the overlap of these two sets in the direction of $p_i(\hat\beta_{\mathrm{MLE}})$. Thus it decreases the probability of the scenario that the MLE does not exist based on a resultant subsample.

3.2 Minimum Asymptotic MSE of $M_X \tilde\beta$

The optimal SSPs derived in the previous section require the calculation of $\|M_X^{-1} x_i\|$ for $i = 1, 2, \ldots, n$, which takes $O(nd^2)$ time. In this section, we propose a modified optimality criterion, under which calculating the optimal SSPs requires less time.

To motivate the optimality criterion, we need to define the partial ordering of positive definite matrices. For two positive definite matrices $A_1$ and $A_2$, $A_1 \ge A_2$ if and only if $A_1 - A_2$ is a nonnegative definite matrix. This definition is called the Loewner ordering. Note that $V = M_X^{-1} V_c M_X^{-1}$ in (6) depends on $\pi$ through $V_c$ in (7), and $M_X$ does not depend on $\pi$. For two given SSPs $\pi^{(1)}$ and $\pi^{(2)}$, $V(\pi^{(1)}) \le V(\pi^{(2)})$ if and only if $V_c(\pi^{(1)}) \le V_c(\pi^{(2)})$. This gives us guidance to simplify the optimality criterion. Instead of focusing on the more complicated matrix $V$, we define an alternative optimality criterion by focusing on $V_c$. Specifically, instead of minimizing $\mathrm{tr}(V)$ as in Section 3.1, we choose to minimize $\mathrm{tr}(V_c)$. The primary goal of this alternative optimality criterion is to further reduce the computing time. The following theorem gives the optimal SSP that minimizes the trace of $V_c$.

Theorem 4.
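In Algorithm 1, if the SSP is chosen such that
$$\pi_i^{\mathrm{mVc}} = \frac{|y_i - p_i(\hat\beta_{\mathrm{MLE}})| \, \|x_i\|}{\sum_{j=1}^{n} |y_j - p_j(\hat\beta_{\mathrm{MLE}})| \, \|x_j\|}, \quad i = 1, 2, \ldots, n, \qquad (13)$$
then $\mathrm{tr}(V_c)$ attains its minimum.

It turns out that the alternative optimality criterion indeed greatly reduces the computing time. From Theorem 4, the effect of the covariates on $\pi^{\mathrm{mVc}} = \{\pi_i^{\mathrm{mVc}}\}_{i=1}^{n}$ is presented by $\|x_i\|$, instead of $\|M_X^{-1} x_i\|$ as in $\pi^{\mathrm{mMSE}}$. The computational benefit is obvious: it requires $O(nd)$ time to calculate $\|x_i\|$ for $i = 1, 2, \ldots, n$, which is significantly less than the $O(nd^2)$ time required to calculate $\|M_X^{-1} x_i\|$ for $i = 1, 2, \ldots, n$.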

Besides the computational benefit, this alternative criterion also enjoys nice interpretations from the following aspects. First, the term $|y_i - p_i(\hat\beta_{\mathrm{MLE}})|$ functions the same as in the case of $\pi^{\mathrm{mMSE}}$. Hence all the nice interpretations and properties related to this term for $\pi^{\mathrm{mMSE}}$ in Section 3.1 are true for $\pi^{\mathrm{mVc}}$ in Theorem 4. Second, from (8),
$$M_X (\tilde\beta - \hat\beta_{\mathrm{MLE}}) \mid \mathcal{F}_n \overset{a}{\sim} M_X u, \quad \text{where } M_X u \sim N(0, V_c) \text{ given } \mathcal{F}_n.$$
This shows that $\mathrm{tr}(V_c) = E(\|M_X u\|^2 \mid \mathcal{F}_n)$ is the AMSE of $M_X \tilde\beta$ in approximating $M_X \hat\beta_{\mathrm{MLE}}$. Therefore, the SSP $\pi^{\mathrm{mVc}}$ is optimal in terms of minimizing the AMSE of $M_X \tilde\beta$. Third, the alternative criterion also corresponds to the commonly used linear optimality (L-optimality) criterion in optimal experimental design (cf. Chapter 10 of Atkinson et al., 2007). The L-optimality criterion minimizes the trace of the product of the asymptotic variance-covariance matrix and a constant matrix. Its aim is to improve the quality of prediction in linear regression. For our problem, note that $\mathrm{tr}(V_c) = \mathrm{tr}(M_X V M_X) = \mathrm{tr}(V M_X^2)$ and $V$ is the asymptotic variance-covariance matrix of $\tilde\beta$, so the SSP $\pi^{\mathrm{mVc}}$ is L-optimal in the language of optimal design.

4 Two-Step Algorithm

The SSPs in (10) and (13) depend on $\hat\beta_{\mathrm{MLE}}$, which is the full data MLE to be approximated, so an exact OSMAC is not applicable directly. We propose a two-step algorithm to approximate the OSMAC. In the first step, a subsample of size $r_0$ is taken to get a pilot estimate of $\hat\beta_{\mathrm{MLE}}$, which is then used to approximate the optimal SSPs for drawing the more informative second step subsample. The two-step algorithm is presented in Algorithm 2.

Algorithm 2 Two-step Algorithm

Step 1: Run Algorithm 1 with subsample size $r_0$ to obtain an estimate $\tilde\beta_0$, using either the uniform SSP $\pi^{\mathrm{UNI}} = \{n^{-1}\}_{i=1}^{n}$ or the SSP $\{\pi_i^{\mathrm{prop}}\}_{i=1}^{n}$, where $\pi_i^{\mathrm{prop}} = (2 n_0)^{-1}$ if $i \in S_0$ and $\pi_i^{\mathrm{prop}} = (2 n_1)^{-1}$ if $i \in S_1$. Here, $n_0$ and $n_1$ are the numbers of elements in sets $S_0$ and $S_1$, respectively. Replace $\hat\beta_{\mathrm{MLE}}$ with $\tilde\beta_0$ in (10) or (13) to get an approximate optimal SSP corresponding to a chosen optimality criterion.

Step 2: Subsample with replacement for a subsample of size $r$ with the approximate optimal SSP calculated in Step 1.
Combine the samples from the two steps and obtain the estimate $\breve\beta$ based on the total subsample of size $r_0 + r$ according to the Estimation step in Algorithm 1.

Remark 2. In Step 1, for the $S_0$ and $S_1$ sets, different subsampling probabilities can be specified, each of which is equal to half of the inverse of the set size. The purpose is to balance the numbers of 0's and 1's in the responses for the subsample. If the full data is very imbalanced, the probability that the MLE exists for a subsample obtained using this approach is higher than that for a subsample obtained using uniform subsampling. This procedure is called case-control sampling (Scott and Wild, 1986; Fithian and Hastie, 2014). If the proportion of 1's is close to 0.5, the uniform SSP is preferable in Step 1 due to its simplicity.

Remark 3. As shown in Theorem 1, $\tilde\beta_0$ from Step 1 approximates $\hat\beta_{\mathrm{MLE}}$ accurately as long as $r_0$ is not too small. On the other hand, the efficiency of the two-step algorithm would decrease if $r_0$ gets close to the total subsample size $r_0 + r$ and $r$ is relatively small. We will need $r_0$ to be a small term compared with $r^{1/2}$, i.e., $r_0 = o(r^{1/2})$, in order to prove the consistency and asymptotic optimality of the two-step algorithm in Section 4.1.

Algorithm 2 greatly reduces the computational cost compared to using the full data. The major computing time is spent on approximating the optimal SSPs, which does not require iterative calculations on the full data. Once the approximately optimal SSPs are available, the time to obtain $\breve\beta$ in the second step is $O(\zeta r d^2)$, where $\zeta$ is the number of iterations of the iterative procedure in the second step. If the $S_0$ and $S_1$ sets are not separated, the time to obtain $\tilde\beta_0$ in the first step is $O(n + \zeta_0 r_0 d^2)$, where $\zeta_0$ is the number of iterations of the iterative procedure in the first step. To calculate the estimated optimal SSPs, the required times are different for different optimal SSPs. For $\pi_i^{\mathrm{mVc}}$, $i = 1, \ldots, n$, the required time is $O(nd)$. For $\pi_i^{\mathrm{mMSE}}$, $i = 1, \ldots, n$, the required time is longer because they involve $M_X = n^{-1} \sum_{i=1}^{n} w_i(\hat\beta_{\mathrm{MLE}}) x_i x_i^T$. If $\hat\beta_{\mathrm{MLE}}$ is replaced by $\tilde\beta_0$ in $w_i(\hat\beta_{\mathrm{MLE}})$ and then the full data is used to calculate an estimate of $M_X$, the required time is $O(nd^2)$. Note that $M_X$ can be estimated by $\tilde M_X^{\tilde\beta_0} = (n r_0)^{-1} \sum_{i=1}^{r_0} (\pi_i^*)^{-1} w_i^*(\tilde\beta_0) x_i^* (x_i^*)^T$ based on the selected subsample, for which the calculation only requires $O(r_0 d^2)$ time. However, we still need $O(nd^2)$ time to approximate $\pi_i^{\mathrm{mMSE}}$ because they depend on $\|M_X^{-1} x_i\|$ for $i = 1, 2, \ldots, n$. Based on the aforementioned discussion, the time complexity of Algorithm 2 with $\pi^{\mathrm{mVc}}$ is $O(nd + \zeta_0 r_0 d^2 + \zeta r d^2)$, and the time complexity of Algorithm 2 with $\pi^{\mathrm{mMSE}}$ is $O(nd^2 + \zeta_0 r_0 d^2 + \zeta r d^2)$.
Considering the case of a very large $n$ such that $d$, $\zeta_0$, $\zeta$, $r_0$ and $r$ are all much smaller than $n$, these time complexities are $O(nd)$ and $O(nd^2)$, respectively.

4.1 Asymptotic properties

For the estimator obtained from Algorithm 2 based on the SSPs $\pi^{\mathrm{mVc}}$, we derive its asymptotic properties under the following assumption.

Assumption 4. The covariate distribution satisfies that $E(xx^T)$ is positive definite and $E(e^{a^T x}) < \infty$ for any $a \in \mathbb{R}^d$.

Assumption 4 imposes two conditions on the covariate distribution. The first condition ensures that the asymptotic covariance matrix is full rank. The second condition requires that covariate distributions have light tails. Clearly, the class of sub-Gaussian distributions (Buldygin and Kozachenko, 1980) satisfies this condition. The main result in Owen (2007) also requires this condition.

We establish the consistency and asymptotic normality of $\breve\beta$ based on $\pi^{\mathrm{mVc}}$. The results are presented in the following two theorems.

Theorem 5. Let $r_0 r^{-1/2} \to 0$. Under Assumption 4, if the estimate $\tilde\beta_0$ based on the first step sample exists, then, as $r \to \infty$ and $n \to \infty$, with probability approaching one, for any $\epsilon > 0$, there exist finite $\Delta_\epsilon$ and $r_\epsilon$ such that
$$P\big( \|\breve\beta - \hat\beta_{\mathrm{MLE}}\| \ge r^{-1/2} \Delta_\epsilon \,\big|\, \mathcal{F}_n \big) < \epsilon$$
for all $r > r_\epsilon$.

In Theorem 5, as long as the first step sample estimate $\tilde\beta_0$ exists, the two-step algorithm produces a consistent estimator. We do not even require that $r_0 \to \infty$. If the first step subsample size $r_0 \to \infty$, then from Theorem 1, $\tilde\beta_0$ exists with probability approaching one. Under this scenario, the resultant two-step estimator is optimal in the sense of Theorem 4. We present this result in the following theorem.

Theorem 6. Assume that $r_0 r^{-1/2} \to 0$. Under Assumption 4, as $r_0 \to \infty$, $r \to \infty$, and $n \to \infty$, conditional on $\mathcal{F}_n$ and $\tilde\beta_0$,
$$V^{-1/2} (\breve\beta - \hat\beta_{\mathrm{MLE}}) \longrightarrow N(0, I)$$
in distribution, in which $V = M_X^{-1} V_c M_X^{-1}$ with $V_c$ having the expression
$$V_c = \frac{1}{r n^2} \Big\{ \sum_{i=1}^{n} |y_i - p_i(\hat\beta_{\mathrm{MLE}})| \, \|x_i\| \Big\} \Big\{ \sum_{i=1}^{n} \frac{|y_i - p_i(\hat\beta_{\mathrm{MLE}})| \, x_i x_i^T}{\|x_i\|} \Big\}. \qquad (14)$$

Remark 4. In Theorem 6, we require that $r_0 \to \infty$ to get a consistent pilot estimate which is used to identify the more informative data points in the second step, but $r_0$ should be much smaller than $r$ so that the more informative second step subsample dominates the likelihood function.

Theorem 6 shows that the two-step algorithm is asymptotically more efficient than uniform subsampling or case-control subsampling in the sense of Theorem 4. From Theorem 2, as $r_0 \to \infty$, $\tilde\beta_0$ is also asymptotically normal, but from Theorem 4, the value of $\mathrm{tr}(V_c)$ for its asymptotic variance is larger than that for (14) with the same total subsample sizes.

4.2 Standard error formula

As pointed out by a referee, the standard error of an estimator is also important and needs to be estimated. It is crucial for statistical inferences such as hypothesis testing and confidence interval construction. The asymptotic normality in Theorems 2 and 6 can be used to construct formulas to estimate the standard error. A simple way is to replace $\hat\beta_{\mathrm{MLE}}$ with $\breve\beta$ in the asymptotic variance-covariance matrix in Theorem 2 or 6 to get the estimated version. This approach, however, requires calculations on the full data. We give a formula that involves only the selected subsample to estimate the variance-covariance matrix. We propose to estimate the variance-covariance matrix of $\breve\beta$ using
$$\breve V = \breve M_X^{-1} \breve V_c \breve M_X^{-1}, \qquad (15)$$
where
$$\breve M_X = \frac{1}{n (r_0 + r)} \sum_{i=1}^{r_0 + r} \frac{w_i^*(\breve\beta) x_i^* (x_i^*)^T}{\pi_i^*}$$
and
$$\breve V_c = \frac{1}{n^2 (r_0 + r)^2} \sum_{i=1}^{r_0 + r} \frac{\{y_i^* - p_i^*(\breve\beta)\}^2 x_i^* (x_i^*)^T}{(\pi_i^*)^2}.$$
In the above formula, $\breve M_X$ and $\breve V_c$ are motivated by the method of moments. If $\breve\beta$ is replaced by $\hat\beta_{\mathrm{MLE}}$, then $\breve M_X$ and $\breve V_c$ are unbiased estimators of $M_X$ and $V_c$, respectively. Standard errors of components of $\breve\beta$ can be estimated by the square roots of the diagonal elements of $\breve V$. We will evaluate the performance of the formula in (15) using numerical experiments in Section 5.

5 Numerical examples

We evaluate the performance of the OSMAC approach using synthetic and real data sets in this section. Some additional numerical results are given in the Supplementary Material, including additional results of the OSMAC approach on rare event data and unconditional results. As shown in Theorem 1, the approximation error can be arbitrarily small when the subsample size gets large enough, so any level of accuracy can be achieved even using uniform subsampling as long as the subsample size is sufficiently large. In order to make fair comparisons with uniform subsampling, we set the total subsample size for a two-step procedure to be the same as that for the uniform subsampling approach. In the second step of all two-step procedures, except the local case-control (LCC) procedure, we combine the two-step subsamples in estimation. This is valid for the OSMAC approach. However, for the LCC procedure, the first step subsample cannot be combined and only the second step subsample can be used. Otherwise, the resultant estimator will be biased (Fithian and Hastie, 2014).

5.1 Simulation experiments

In this section, we use numerical experiments based on simulated data sets to evaluate the OSMAC approach proposed in previous sections. Data of size $n = 10{,}000$ are generated from model (1) with the true value of $\beta$, $\beta_0$, being a $7 \times 1$ vector of 0.5. We consider the following 6 simulated data sets using different distributions of $x$ (detailed definitions of these distributions can be found in Appendix A of Gelman et al. (2014)).

1) mzNormal. $x$ follows a multivariate normal distribution with mean 0, $N(0, \Sigma)$, where $\Sigma_{ij} = 0.5^{I(i \neq j)}$ and $I(\cdot)$ is the indicator function.
For this data set, the number of 1's and the number of 0's in the responses are roughly equal. This data set is referred to as mzNormal data.

2) nzNormal. $x$ follows a multivariate normal distribution with nonzero mean, $N(1.5, \Sigma)$. About 95% of the responses are 1's, so this data set is an example of imbalanced data and it is referred to as nzNormal data.

3) ueNormal. $x$ follows a multivariate normal distribution with zero mean but its components have unequal variances. To be specific, let $x = (x_1, \ldots, x_7)^T$, in which $x_i$ follows a normal distribution with mean 0 and variance $1/i^2$ and the correlation between $x_i$ and $x_j$ is $0.5^{I(i \neq j)}$, $i, j = 1, \ldots, 7$. For this data set, the number of 1's and the number of 0's in the responses are roughly equal. This data set is referred to as ueNormal data.

4) mixNormal. $x$ is a mixture of two multivariate normal distributions with different means, i.e., $x \sim 0.5 N(1, \Sigma) + 0.5 N(-1, \Sigma)$. For this case, the distribution of $x$ is bimodal, and the number of 1's and the number of 0's in the responses are roughly equal. This data set is referred to as mixNormal data.

5) T3. $x$ follows a multivariate $t$ distribution with 3 degrees of freedom, $t_3(0, \Sigma)/10$. For this case, the distribution of $x$ has heavy tails and it does not satisfy the conditions in Sections 2 and 4. We use this case to examine how sensitive the OSMAC approach is to the required assumptions. The number of 1's and the number of 0's in the responses are roughly equal for this data set. It is referred to as T3 data.

6) EXP. Components of $x$ are independent and each has an exponential distribution with a rate parameter of 2. For this case, the distribution of $x$ is skewed and has a heavier tail on the right, and the proportion of 1's in the responses is about 0.84. This data set is referred to as EXP data.

In order to clearly show the effects of different distributions of $x$ on the SSP, we create boxplots of SSPs, shown in Figure 1 for the six data sets. It is seen that distributions of covariates have a strong influence on optimal SSPs. Comparing the figures for the mzNormal and nzNormal data sets, we see that a change in the mean makes the distributions of SSPs dramatically different. Another evident pattern is that using $V_c$ instead of $V$ to define an optimality criterion makes the SSP different, especially for the ueNormal data set, which has unequal variances for different components of the covariate. For the mzNormal and T3 data sets, the differences in the SSPs are not evident. For the EXP data set, there are more points in the two tails of the distributions.

Now we evaluate the performance of Algorithm 2 based on different choices of SSPs. We calculate MSEs of $\breve\beta$ from $S = 1000$ subsamples using $\mathrm{MSE} = S^{-1} \sum_{s=1}^{S} \|\breve\beta^{(s)} - \hat\beta_{\mathrm{MLE}}\|^2$, where $\breve\beta^{(s)}$ is the estimate from the $s$th subsample.
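As a rough illustration of this setup, the sketch below generates mzNormal-type data and computes the empirical MSE for the uniform subsampling baseline only. The function names, the Newton fitting routine, and the small values of `n`, `d`, `r`, and `S` that a reader might try are assumptions made here for illustration. Because uniform subsampling gives every point the same weight, an unweighted fit on each subsample is used.

```python
import numpy as np


def make_mznormal(n, d, rng):
    """x ~ N(0, Sigma) with Sigma_ij = 0.5^{I(i != j)}; true beta is a vector of 0.5."""
    sigma = np.full((d, d), 0.5)
    np.fill_diagonal(sigma, 1.0)
    X = rng.multivariate_normal(np.zeros(d), sigma, size=n)
    p = 1.0 / (1.0 + np.exp(-(X @ np.full(d, 0.5))))
    y = rng.binomial(1, p).astype(float)
    return X, y


def fit_logistic(X, y, max_iter=100, tol=1e-10):
    """Unweighted logistic MLE by Newton's method (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.linalg.norm(step) < tol:
            break
    return beta


def empirical_mse_uniform(X, y, r, S, rng):
    """S^{-1} sum_s ||beta^{(s)} - beta_MLE||^2 over S uniform subsamples of size r."""
    n = X.shape[0]
    beta_full = fit_logistic(X, y)          # full data MLE used as the target
    sq_errors = np.empty(S)
    for s in range(S):
        idx = rng.choice(n, size=r, replace=True)
        sq_errors[s] = np.sum((fit_logistic(X[idx], y[idx]) - beta_full) ** 2)
    return sq_errors.mean()
```

Replacing the uniform draw and unweighted fit inside the loop with a two-step procedure would give the corresponding empirical MSE for an OSMAC estimator.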
Figure 2 presents the MSEs of $\breve{\beta}$ from the two-step algorithm based on different SSPs, where the first step sample size $r_0$ is fixed at 00. For comparison, we provide the results of uniform subsampling and the LCC subsampling. We also calculate the full data MLE using 000 Bootstrap samples. For all six data sets, SSPs $\pi^{\mathrm{mMSE}}$ and $\pi^{\mathrm{mVc}}$ always result in smaller MSE than the uniform SSP, which agrees with the theoretical result that they aim to minimize the asymptotic MSEs of the resultant estimator. If components of $x$ have equal variances, the OSMAC with $\pi^{\mathrm{mMSE}}$ and $\pi^{\mathrm{mVc}}$ have similar performances; for the ueNormal data set this is not true, and the OSMAC with $\pi^{\mathrm{mMSE}}$ dominates the OSMAC with $\pi^{\mathrm{mVc}}$. The uniform SSP never yields the smallest MSE. It is worth noting that both OSMAC methods outperform the uniform subsampling method for the T and EXP data sets. This indicates that the OSMAC approach has an advantage over uniform subsampling even when the data do not satisfy the assumptions imposed in the preceding sections. The LCC subsampling can be less efficient than the OSMAC procedure if the data set is not very imbalanced. It performs well for the nzNormal data, which is imbalanced. This agrees with the goal of the method in dealing with imbalanced data. The LCC subsampling does not perform well for small $r$. The main reason is that this method cannot use the first step sample, so the effective sample size is smaller than for the other methods.

Figure 1: Boxplots of SSPs for different data sets. Logarithm is taken on SSPs for better presentation of the figures. Panels: (a) mzNormal, (b) nzNormal, (c) ueNormal, (d) mixNormal, (e) T, (f) EXP; vertical axis: log(SSP).

To investigate the effect of different sample size allocations between the two steps, we calculate MSEs for various proportions of first step samples with fixed total subsample sizes. Results are given in Figure 3 with total subsample size $r_0 + r$ = 800 and 00 for the mzNormal data set. It shows that the performance of a two-step algorithm improves at first by increasing $r_0$, but then it becomes less efficient after a certain point as $r_0$ gets larger. This is because if $r_0$ is too small, the first step estimate is not accurate; if $r_0$ is too large, then the more informative second step subsample would be small. These observations indicate that, empirically, a value around 0. is a good choice for $r_0/(r_0 + r)$ in order to have an efficient two-step algorithm. However, finding a systematic way of determining the optimal sample size allocation between the two steps needs further study. Results for the other five data sets are similar, so they are omitted to save space.

Figure 4 gives proportions of correct classifications on the responses using different methods. To avoid producing over-optimistic results, we generate two full data sets corresponding to each of the six scenarios, use one of them to obtain estimates with different methods, and then perform classification on the other full data. The classification rule is to classify the response to be 1 if $p_i(\breve{\beta})$ is larger than 0.5, and 0 otherwise. For comparison, we also use the full data MLE to classify the full data. As shown in Figure 4, all the methods, except LCC with small $r$, produce proportions close to that from using the full data MLE, showing the comparable performance of the OSMAC algorithms to that of the full data approach in classification.
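The classification rule above (predict 1 when the fitted probability exceeds 0.5, and 0 otherwise) and the resulting proportion of correct classifications can be sketched as follows. This is an illustrative sketch only: the function names and the toy coefficient vector and hold-out data are hypothetical, and the model is assumed to be the usual logistic model with probability $1/(1 + e^{-x^T\beta})$.

```python
import math
from typing import Sequence


def logistic_prob(x: Sequence[float], beta: Sequence[float]) -> float:
    """Fitted probability P(y = 1 | x) under a logistic model."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    # Branch on the sign of eta so that exp() never overflows.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def classify(x: Sequence[float], beta: Sequence[float],
             threshold: float = 0.5) -> int:
    """Return 1 if the fitted probability is larger than the threshold, else 0."""
    return 1 if logistic_prob(x, beta) > threshold else 0


def proportion_correct(X, y, beta, threshold: float = 0.5) -> float:
    """Share of observations whose predicted label equals the observed label."""
    hits = sum(classify(x, beta, threshold) == yi for x, yi in zip(X, y))
    return hits / len(y)


# Hypothetical estimate and a tiny hold-out set (first column is a constant).
beta_hat = [0.0, 1.0]
X_holdout = [[1.0, 2.0], [1.0, -2.0], [1.0, 0.5], [1.0, -0.5]]
y_holdout = [1, 0, 0, 0]
print(proportion_correct(X_holdout, y_holdout, beta_hat))
```

As in the text, the estimate should come from one data set and the labels being predicted from another, otherwise the proportion is over-optimistic.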

Figure 2: MSEs for different second step subsample size $r$ with the first step subsample size being fixed at $r_0$ = 00. Panels: (a) mzNormal, (b) nzNormal, (c) ueNormal, (d) mixNormal, (e) T, (f) EXP.

Figure 3: MSEs vs proportions of the first step subsample, $r_0/(r_0 + r)$, with fixed total subsample sizes for the mzNormal data set. Panels: (a) $r_0 + r$ = 800, (b) $r_0 + r$ = 00.

To assess the performance of the formula in (5), we use it to calculate the estimated MSE, i.e., $\mathrm{tr}(\tilde{V})$, and compare the average estimated MSE with the empirical MSE. Figure 5 presents the results for OSMAC with $\pi^{\mathrm{mMSE}}$. It is seen that the estimated MSEs are very close to the empirical MSEs, except for the case of nzNormal data, which is imbalanced. This indicates that the proposed formula works well if the data is not very imbalanced. According to our simulation experiments, it works well if the proportion of 1's in the responses is between 0.5 and . For more imbalanced data or rare events data, the formula may not be accurate because the properties of the MLE are different from those for the regular cases (Owen, 2007; King and Zeng, 2001). The performance of the formula in (5) for OSMAC with $\pi^{\mathrm{mVc}}$ is similar to that for OSMAC with $\pi^{\mathrm{mMSE}}$, so results are omitted for clear presentation of the plot.

To further evaluate the performance of the proposed method in statistical inference, we consider confidence interval construction using the asymptotic normality and the estimated variance-covariance matrix in (5). For illustration, we take the parameter of interest as $\beta_1$, the first element of $\beta$. The corresponding 95% confidence interval is constructed using $\breve{\beta}_1 \pm Z_{0.975}\,\mathrm{SE}_{\breve{\beta}_1}$, where $\mathrm{SE}_{\breve{\beta}_1} = \sqrt{\tilde{V}_{11}}$ is the standard error of $\breve{\beta}_1$, and $Z_{0.975}$ is the 97.5th percentile of the standard normal distribution. We repeat the simulation 000 times and estimate the coverage probability of the confidence interval by the proportion of repetitions in which it covers the true value of $\beta_1$. Figure 6 gives the results. The confidence interval works perfectly for the mzNormal, ueNormal and T data. For the mixNormal and EXP data sets, the empirical coverage probabilities are slightly smaller than the intended confidence level, but the results are acceptable. For the imbalanced nzNormal data, the coverage probabilities are lower than the nominal coverage probabilities. This agrees with the fact in Figure 5 that the formula in (5) does not approximate the asymptotic variance-covariance matrix well for imbalanced data.
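The interval construction and the coverage estimate described above can be sketched in a few lines. This is a hedged illustration rather than the authors' code: the function names are hypothetical, the (estimate, standard error) pairs are made-up toy values, and the standard error is assumed to have already been computed from the estimated variance-covariance matrix.

```python
from statistics import NormalDist


def wald_interval(estimate: float, std_error: float, level: float = 0.95):
    """Normal-theory interval: estimate +/- z * SE, with z the (1 + level)/2 quantile."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * std_error, estimate + z * std_error


def coverage_probability(intervals, true_value: float) -> float:
    """Proportion of intervals that contain the true parameter value."""
    covered = sum(lo <= true_value <= hi for lo, hi in intervals)
    return covered / len(intervals)


# Hypothetical (estimate, standard error) pairs from four repetitions.
results = [(0.52, 0.05), (0.41, 0.05), (0.65, 0.05), (0.48, 0.05)]
intervals = [wald_interval(est, se) for est, se in results]
print(coverage_probability(intervals, true_value=0.5))
```

With only four toy repetitions the printed proportion is not meaningful as a coverage estimate; in a real study the list would hold one interval per simulation repetition.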
To evaluate the computational efficiency of the subsampling algorithms, we record the computing time and numbers of iterations of the two-step algorithm and the uniform subsampling implemented in the R programming language (R Core Team, 2015). Computations were carried out on a desktop running Windows 10 with an Intel I7 processor and 6GB memory.

Figure 4: Proportions of correct classifications for different second step subsample size $r$ with the first step subsample size being fixed at $r_0$ = 00. The gray horizontal dashed lines are those using the true parameter. Panels: (a) mzNormal, (b) nzNormal, (c) ueNormal, (d) mixNormal, (e) T, (f) EXP.

Figure 5: Estimated and empirical MSEs for the OSMAC with $\pi^{\mathrm{mMSE}}$. The first step subsample size is fixed at $r_0$ = 00 and the second step subsample size $r$ changes. Panels: (a) mzNormal, (b) nzNormal, (c) ueNormal, (d) mixNormal, (e) T, (f) EXP.

Figure 6: Empirical coverage probabilities for different second step subsample size $r$ with the first step subsample size being fixed at $r_0$ = 00. Panels: (a) mzNormal, (b) nzNormal, (c) ueNormal, (d) mixNormal, (e) T, (f) EXP.

For fair comparison, we counted only the CPU time used by 000 repetitions of each method. Table 1 gives the results for the mzNormal data set for algorithms based on $\pi^{\mathrm{mMSE}}$, $\pi^{\mathrm{mVc}}$, and $\pi^{\mathrm{UNI}}$. The computing time for using the full data is also given in the last row of Table 1 for comparison. It is not surprising to observe that the uniform subsampling algorithm requires the least computing time because it does not require an additional step to calculate the SSP. The algorithm based on $\pi^{\mathrm{mMSE}}$ requires longer computing time than the algorithm based on $\pi^{\mathrm{mVc}}$, which agrees with the theoretical analysis given earlier. All the subsampling algorithms take significantly less computing time compared to the full data approach. Table 2 presents the average numbers of iterations in Newton's method. It shows that for the two-step algorithm, the first step may require additional iterations compared to the second step, but overall, the required numbers of iterations for all methods are close to 7, the number of iterations used by the full data. This shows that using a smaller subsample does not increase the required number of iterations much for Newton's method.

Table 1: CPU seconds for the mzNormal data set with $r_0$ = 00 and different $r$. The CPU seconds for using the full data are given in the last row.

Table 2: Average numbers of iterations used in Newton's method for the mzNormal data set with $r_0$ = 00 and different $r$. For the full data, the number of iterations is 7.

To further investigate the computational gain of the subsampling approach for massive data volume, we increase the value of $d$ to $d = 50$ and increase the values of $n$ to be $n = 10^4, 10^5, 10^6$ and $10^7$. We record the computing time for the case when $x$ is multivariate normal. Table 3 presents the result based on one iteration of calculation. It is seen that as $n$ increases, the computational efficiency of a subsampling method relative to the full data approach becomes more and more significant.
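To make the iteration counts reported for Newton's method concrete, here is a minimal, unweighted Newton iteration for the logistic regression MLE that also returns how many updates it needed. It is a sketch under stated assumptions, not the implementation timed in the paper (which was written in R): the function names, the stopping rule on the largest step component, the tolerance and the toy data are all hypothetical choices.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular information matrix")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def newton_logistic(X, y, tol=1e-8, max_iter=50):
    """Return (beta, iterations) for the logistic MLE, starting from beta = 0."""
    d = len(X[0])
    beta = [0.0] * d
    for it in range(1, max_iter + 1):
        score = [0.0] * d                      # gradient of the log-likelihood
        info = [[0.0] * d for _ in range(d)]   # negative Hessian
        for x, yi in zip(X, y):
            p = sigmoid(sum(a * b for a, b in zip(x, beta)))
            for j in range(d):
                score[j] += (yi - p) * x[j]
                for k in range(d):
                    info[j][k] += p * (1.0 - p) * x[j] * x[k]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, it
    raise RuntimeError("Newton's method did not converge")


# Hypothetical toy data: an intercept column plus one covariate, with overlap.
X = [[1.0, v] for v in (-2.0, -1.0, -1.0, 0.0, 0.0, 1.0, 1.0, 2.0)]
y = [0, 0, 1, 0, 1, 0, 1, 1]
beta_hat, n_iter = newton_logistic(X, y)
print(beta_hat, n_iter)
```

Each iteration costs one pass over the rows it is given, which is why running the same loop on a subsample is cheaper than on the full data even when the number of iterations is about the same.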

Table 3: CPU seconds with $r_0$ = 00, $r$ = 000 and different full data size $n$ when the covariates are from a $d = 50$ dimensional normal distribution.

Numerical evaluations for rare events data

To investigate the performance of the proposed method for the case of rare events, we generate rare events data using the same configurations that are used to generate the nzNormal data, except that we change the mean of $x$ to -. or -.9. With these values, .0% and 0.% of responses are 1 in the full data of size $n$. Figure 7 presents the results for these two scenarios. It is seen that both mMSE and mVc work well for these two scenarios and their performances are similar. The uniform subsampling is neither stable nor efficient. When the event rate is 0.%, corresponding to the subsample sizes of 00, 00, 500, 700, 900, and 00, there are 90, 88, 80, 7, 65, and 9 cases out of 000 repetitions of the simulation in which the MLE is not found. For the cases in which the MLE is found, the MSEs are ,.856,.689,.08, , and 9.578, respectively. These MSEs are much larger than those from the OSMAC and thus are omitted in Figure 7 for better presentation. For the OSMAC, there are 8 cases out of 000 in which the MLE is not found, only when $r_0$ = 00 and $r$ = 00.

For comparison, we also calculate the MSE of the full data approach using 000 Bootstrap samples (the gray dashed line). Note that the Bootstrap is the uniform subsampling with the subsample size being equal to the full data sample size. Interestingly, it is seen from Figure 7 that OSMAC methods can produce MSEs that are much smaller than the Bootstrap MSEs. To further investigate this interesting case, we carry out another simulation using the exact same setup. A full data set is generated in each repetition and hence the resultant MSEs are the unconditional MSEs. Results are presented in Figure 8. Although the unconditional MSEs of the OSMAC methods are larger than that of the full data approach, they are very close when $r$ gets large, especially when the rare event rate is 0.%. Here, 0.% is the average percentage of 1's in the responses of all 000 simulated full data sets. Note that the true value of $\beta$ is used in calculating both the conditional MSEs and the unconditional MSEs.
Comparing Figure 7 (b) and Figure 8 (b), conditional inference of OSMAC can indeed be more efficient than the full data approach for rare events data. These two figures also indicate that the original Bootstrap method does not work perfectly for the case of rare events data. For additional results on more extreme rare events data, please see the Supplementary Material.
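The text does not say why the MLE is sometimes not found under uniform subsampling. One simple mechanism, offered here only as an illustrative assumption, is that a uniform subsample of rare events data may contain no events at all, in which case the logistic MLE does not exist. Assuming independent draws with replacement, the chance of that is $(1 - p)^r$ for event rate $p$ and subsample size $r$. The function name and the event rate below are hypothetical and are not the paper's values.

```python
def prob_no_events(event_rate: float, subsample_size: int) -> float:
    """Probability that a uniform subsample drawn with replacement has no events."""
    if not 0.0 <= event_rate <= 1.0:
        raise ValueError("event_rate must lie in [0, 1]")
    return (1.0 - event_rate) ** subsample_size


# Hypothetical event rate of 0.1% and a few subsample sizes.
for size in (100, 500, 1000, 5000):
    print(size, round(prob_no_events(0.001, size), 4))
```

Other failure modes, such as complete separation in a subsample that does contain a few events, are not captured by this calculation, so it should be read as a lower bound on how often a uniform subsample can be unusable.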

Figure 7: MSEs for rare event data with different second step subsample size $r$ and a fixed first step subsample size $r_0$ = 00, where the covariates follow multivariate normal distributions. Panels: (a) .0% of $y_i$'s are 1, (b) 0.% of $y_i$'s are 1.

Figure 8: Unconditional MSEs for rare event data with different second step subsample size $r$ and a fixed first step subsample size $r_0$ = 00, where the covariates follow multivariate normal distributions. Panels: (a) .0% of $y_i$'s are 1, (b) 0.% of $y_i$'s are 1.

5. Census income data set

In this section, we apply the proposed methods to a census income data set (Kohavi, 1996), which was extracted from the 1994 Census database. There are 48,842 observations in total in this data set, and the response variable is whether a person's income exceeds $50K a year. There are 11,687 individuals (23.93%) in the data whose income exceeds $50K a year. The inferential task is to estimate the effect on income of the following covariates: $x_1$, age; $x_2$, final weight (Fnlwgt); $x_3$, highest level of education in numerical form; $x_4$, capital loss (LosCap); $x_5$, hours worked per week. The variable final weight ($x_2$) is the number of people the observation represents. The values were assigned by the Population Division at the Census Bureau, and they are related to the socio-economic characteristics, i.e., people with similar socio-economic characteristics have similar weights. Capital loss ($x_4$) is the loss in income due to bad investments; it is the difference between lower selling prices of investments and higher purchasing prices of investments made by the individual. The parameter corresponding to $x_i$ is denoted as $\beta_i$ for $i = 1, \ldots, 5$. An intercept parameter, say $\beta_0$, is also included in the model. Another interest is to determine whether a person's income exceeds $50K a year using the covariates.

We obtained the data from the Machine Learning Repository (Lichman, 2013), where it is partitioned into a training set of $n$ = 32,561 observations and a validation set of 16,281 observations. Thus we apply the proposed method on the training set and use the validation set to evaluate the performance of classification. For this data set, the full data estimates using all the observations in the training set are: $\hat{\beta}_0$ = 8.67 (0.6), $\hat{\beta}_1$ = 0.67 (0.06), $\hat{\beta}_2$ = (0.05), $\hat{\beta}_3$ = (0.07), $\hat{\beta}_4$ = 0. (0.0) and $\hat{\beta}_5$ = 0.55 (0.06), where the numbers in the parentheses are the associated standard errors. Table 4 gives the average of parameter estimates along with the empirical and estimated standard errors from different methods based on 000 subsamples of $r_0 + r$ = 00 with $r_0$ = 00 and $r$ = 000. It is seen that all subsampling methods produce estimates close to those from the full data approach.
In general, OSMAC with $\pi^{\mathrm{mMSE}}$ and OSMAC with $\pi^{\mathrm{mVc}}$ produce the smallest standard errors. The estimated standard errors are very close to the empirical standard errors, showing that the proposed asymptotic variance-covariance formula in (5) works well for the real data. The standard errors for the subsample estimates are larger than those for the full data estimates. However, they are quite good in view of the relatively small subsample size. All methods show that the effect of each variable on income is positive. However, the effect of final weight is not significant at significance level 0.05 according to any subsample-based method, while this variable is significant at the same significance level according to the full data analysis. The reason is that the subsample inference is not as powerful as the full data approach due to its relatively smaller sample size. Actually, for statistical inference in large samples, no matter how small the true parameter is, as long as it is a nonzero constant, the corresponding variable can always be detected as significant with a large enough sample size. This is also true for conditional inference based on a subsample if the subsample size is large enough. It is interesting that capital loss has a significantly positive effect on income; this is because people with low income seldom have investments.

Figure 9 (a) shows the MSEs that were calculated from $S$ = 000 subsamples of size $r_0 + r$ with a fixed $r_0$ = 00. In this figure, all MSEs are small and go to 0 as the subsample size gets large, showing the estimation consistency of the subsampling methods. The OSMAC with $\pi^{\mathrm{mMSE}}$ always has the smallest MSE. Figure 9 (b) gives the proportions of correct classifications on the responses in the validation set for different second step subsample sizes with a fixed $r_0$ = 00 when the classification threshold is 0.5. For comparison, we also obtained the results of classification using the full data estimate, which is the gray horizontal dashed line. Indeed, using all the $n$ = 32,561 observations in the training set yields better results than using subsamples of much smaller sizes, but the difference is really small. One point worth mentioning is that although the OSMAC with $\pi^{\mathrm{mMSE}}$ always yields a smaller MSE compared to the OSMAC with $\pi^{\mathrm{mVc}}$, its performance in classification is inferior to the OSMAC with $\pi^{\mathrm{mVc}}$. This is because $\pi^{\mathrm{mMSE}}$ aims to minimize the asymptotic MSE and may not minimize the misclassification rate, although the two goals are highly related.

Table 4: Average estimates for the Adult income data set based on 000 subsamples. The numbers in the parentheses are the associated empirical and average estimated standard errors, respectively. In the table, $\beta_1$ is for age, $\beta_2$ is for final weight, $\beta_3$ is for highest level of education in numerical form, $\beta_4$ is for capital loss, and $\beta_5$ is for hours worked per week.
Columns, in order: uniform; mMSE; mVc.
Intercept: (0.69, 0.609); (0.0, 0.8); (0.5, 0.50)
$\beta_1$: 0.68 (0.079, 0.078); 0.60 (0.068, 0.07); 0.60 (0.068, 0.067)
$\beta_2$: 0.06 (0.076, 0.077); (0.067, 0.068); 0.06 (0.06, 0.06)
$\beta_3$: 0.88 (0.090, 0.090); 0.88 (0.079, 0.075); (0.07, 0.07)
$\beta_4$: 0. (0.070, 0.07); 0. (0.058, 0.059); 0. (0.060, 0.057)
$\beta_5$: (0.085, 0.087); 0.56 (0.068, 0.070); 0.56 (0.07, 0.070)

Figure 9: MSEs and proportions of correct classifications for the adult income data set with $r_0$ = 00 and different second step subsample size $r$. Panels: (a) MSEs vs $r$, (b) proportions of correct classifications vs $r$. The gray horizontal dashed line in panel (b) is the result using the full data MLE.
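The remark that a variable can be significant in the full data analysis but not in a subsample analysis comes down to the standard error shrinking as the sample grows. A minimal sketch of the usual two-sided Wald test shows the effect; the function name and the numbers are hypothetical illustrations and are not the estimates reported for the census data.

```python
from statistics import NormalDist


def wald_test(estimate: float, std_error: float):
    """Return the z statistic and two-sided p-value for H0: coefficient = 0."""
    z = estimate / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Same hypothetical coefficient, two hypothetical standard errors:
# a small one (large sample) and a larger one (small subsample).
for label, se in (("full data", 0.025), ("subsample", 0.07)):
    z, p = wald_test(0.06, se)
    print(label, round(z, 2), round(p, 4), "significant" if p < 0.05 else "not significant")
```

The coefficient is identical in both rows; only the standard error differs, which is the sense in which subsample inference is less powerful than full data inference.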


More information

EDEXCEL NATIONAL CERTIFICATE UNIT 28 FURTHER MATHEMATICS FOR TECHNICIANS OUTCOME 2- ALGEBRAIC TECHNIQUES TUTORIAL 1 - PROGRESSIONS

EDEXCEL NATIONAL CERTIFICATE UNIT 28 FURTHER MATHEMATICS FOR TECHNICIANS OUTCOME 2- ALGEBRAIC TECHNIQUES TUTORIAL 1 - PROGRESSIONS EDEXCEL NATIONAL CERTIFICATE UNIT 8 FURTHER MATHEMATICS FOR TECHNICIANS OUTCOME - ALGEBRAIC TECHNIQUES TUTORIAL - PROGRESSIONS CONTENTS Be able to apply algebaic techiques Aithmetic pogessio (AP): fist

More information

Minimization of the quadratic test function

Minimization of the quadratic test function Miimizatio of the quadatic test fuctio A quadatic fom is a scala quadatic fuctio of a vecto with the fom f ( ) A b c with b R A R whee A is assumed to be SPD ad c is a scala costat Note: A symmetic mati

More information

LESSON 15: COMPOUND INTEREST

LESSON 15: COMPOUND INTEREST High School: Expoeial Fuctios LESSON 15: COMPOUND INTEREST 1. You have see this fomula fo compoud ieest. Paamete P is the picipal amou (the moey you stat with). Paamete is the ieest ate pe yea expessed

More information

This web appendix outlines sketch of proofs in Sections 3 5 of the paper. In this appendix we will use the following notations: c i. j=1.

This web appendix outlines sketch of proofs in Sections 3 5 of the paper. In this appendix we will use the following notations: c i. j=1. Web Appedix: Supplemetay Mateials fo Two-fold Nested Desigs: Thei Aalysis ad oectio with Nopaametic ANOVA by Shu-Mi Liao ad Michael G. Akitas This web appedix outlies sketch of poofs i Sectios 3 5 of the

More information

Auchmuty High School Mathematics Department Sequences & Series Notes Teacher Version

Auchmuty High School Mathematics Department Sequences & Series Notes Teacher Version equeces ad eies Auchmuty High chool Mathematics Depatmet equeces & eies Notes Teache Vesio A sequece takes the fom,,7,0,, while 7 0 is a seies. Thee ae two types of sequece/seies aithmetic ad geometic.

More information

DANIEL YAQUBI, MADJID MIRZAVAZIRI AND YASIN SAEEDNEZHAD

DANIEL YAQUBI, MADJID MIRZAVAZIRI AND YASIN SAEEDNEZHAD MIXED -STIRLING NUMERS OF THE SEOND KIND DANIEL YAQUI, MADJID MIRZAVAZIRI AND YASIN SAEEDNEZHAD Abstact The Stilig umbe of the secod id { } couts the umbe of ways to patitio a set of labeled balls ito

More information

Generalized Fibonacci-Lucas Sequence

Generalized Fibonacci-Lucas Sequence Tuish Joual of Aalysis ad Numbe Theoy, 4, Vol, No 6, -7 Available olie at http://pubssciepubcom/tjat//6/ Sciece ad Educatio Publishig DOI:6/tjat--6- Geealized Fiboacci-Lucas Sequece Bijeda Sigh, Ompaash

More information

12.6 Sequential LMMSE Estimation

12.6 Sequential LMMSE Estimation 12.6 Sequetial LMMSE Estimatio Same kid if settig as fo Sequetial LS Fied umbe of paametes (but hee they ae modeled as adom) Iceasig umbe of data samples Data Model: [ H[ θ + w[ (+1) 1 p 1 [ [[0] [] ukow

More information

r, this equation is graphed in figure 1.

r, this equation is graphed in figure 1. Washigto Uivesity i St Louis Spig 8 Depatmet of Ecoomics Pof James Moley Ecoomics 4 Homewok # 3 Suggested Solutio Note: This is a suggested solutio i the sese that it outlies oe of the may possible aswes

More information

Modelling rheological cone-plate test conditions

Modelling rheological cone-plate test conditions ANNUAL TRANSACTIONS OF THE NORDIC RHEOLOGY SOCIETY, VOL. 16, 28 Modellig heological coe-plate test coditios Reida Bafod Schülle 1 ad Calos Salas-Bigas 2 1 Depatmet of Chemisty, Biotechology ad Food Sciece,

More information

Lecture 3 : Concentration and Correlation

Lecture 3 : Concentration and Correlation Lectue 3 : Cocetatio ad Coelatio 1. Talagad s iequality 2. Covegece i distibutio 3. Coelatio iequalities 1. Talagad s iequality Cetifiable fuctios Let g : R N be a fuctio. The a fuctio f : 1 2 Ω Ω L Ω

More information

The Application of a Maximum Likelihood Approach to an Accelerated Life Testing with an Underlying Three- Parameter Weibull Model

The Application of a Maximum Likelihood Approach to an Accelerated Life Testing with an Underlying Three- Parameter Weibull Model Iteatioal Joual of Pefomability Egieeig Vol. 4, No. 3, July 28, pp. 233-24. RAMS Cosultats Pited i Idia The Applicatio of a Maximum Likelihood Appoach to a Acceleated Life Testig with a Udelyig Thee- Paamete

More information

Finite q-identities related to well-known theorems of Euler and Gauss. Johann Cigler

Finite q-identities related to well-known theorems of Euler and Gauss. Johann Cigler Fiite -idetities elated to well-ow theoems of Eule ad Gauss Joha Cigle Faultät fü Mathemati Uivesität Wie A-9 Wie, Nodbegstaße 5 email: oha.cigle@uivie.ac.at Abstact We give geealizatios of a fiite vesio

More information

Lecture 2: Stress. 1. Forces Surface Forces and Body Forces

Lecture 2: Stress. 1. Forces Surface Forces and Body Forces Lectue : Stess Geophysicists study pheomea such as seismicity, plate tectoics, ad the slow flow of ocks ad mieals called ceep. Oe way they study these pheomea is by ivestigatig the defomatio ad flow of

More information

arxiv:math/ v3 [math.oc] 5 Apr 2008

arxiv:math/ v3 [math.oc] 5 Apr 2008 Least-Squaes Pices of Games Yukio Hiashita axiv:math/0703079v3 [math.oc] 5 Ap 2008 Abstact What ae the pices of adom vaiables? I this pape, we defie the least-squaes pices of coi-flippig games, which ae

More information

Sums of Involving the Harmonic Numbers and the Binomial Coefficients

Sums of Involving the Harmonic Numbers and the Binomial Coefficients Ameica Joual of Computatioal Mathematics 5 5 96-5 Published Olie Jue 5 i SciRes. http://www.scip.og/oual/acm http://dx.doi.og/.46/acm.5.58 Sums of Ivolvig the amoic Numbes ad the Biomial Coefficiets Wuyugaowa

More information

( ) ( ) ( ) ( ) Solved Examples. JEE Main/Boards = The total number of terms in the expansion are 8.

( ) ( ) ( ) ( ) Solved Examples. JEE Main/Boards = The total number of terms in the expansion are 8. Mathematics. Solved Eamples JEE Mai/Boads Eample : Fid the coefficiet of y i c y y Sol: By usig fomula of fidig geeal tem we ca easily get coefficiet of y. I the biomial epasio, ( ) th tem is c T ( y )

More information

Advanced Physical Geodesy

Advanced Physical Geodesy Supplemetal Notes Review of g Tems i Moitz s Aalytic Cotiuatio Method. Advaced hysical Geodesy GS887 Chistophe Jekeli Geodetic Sciece The Ohio State Uivesity 5 South Oval Mall Columbus, OH 4 7 The followig

More information

RELIABILITY ASSESSMENT OF SYSTEMS WITH PERIODIC MAINTENANCE UNDER RARE FAILURES OF ITS ELEMENTS

RELIABILITY ASSESSMENT OF SYSTEMS WITH PERIODIC MAINTENANCE UNDER RARE FAILURES OF ITS ELEMENTS Y Geis ELIABILITY ASSESSMENT OF SYSTEMS WITH PEIODIC MAINTENANCE UNDE AE FAILUES OF ITS ELEMENTS T&A # (6) (Vol) 2, Mach ELIABILITY ASSESSMENT OF SYSTEMS WITH PEIODIC MAINTENANCE UNDE AE FAILUES OF ITS

More information

SOME ARITHMETIC PROPERTIES OF OVERPARTITION K -TUPLES

SOME ARITHMETIC PROPERTIES OF OVERPARTITION K -TUPLES #A17 INTEGERS 9 2009), 181-190 SOME ARITHMETIC PROPERTIES OF OVERPARTITION K -TUPLES Deick M. Keiste Depatmet of Mathematics, Pe State Uivesity, Uivesity Pak, PA 16802 dmk5075@psu.edu James A. Selles Depatmet

More information

Technical Report: Bessel Filter Analysis

Technical Report: Bessel Filter Analysis Sasa Mahmoodi 1 Techical Repot: Bessel Filte Aalysis 1 School of Electoics ad Compute Sciece, Buildig 1, Southampto Uivesity, Southampto, S17 1BJ, UK, Email: sm3@ecs.soto.ac.uk I this techical epot, we

More information

AS Mathematics. MFP1 Further Pure 1 Mark scheme June Version: 1.0 Final

AS Mathematics. MFP1 Further Pure 1 Mark scheme June Version: 1.0 Final AS Mathematics MFP Futhe Pue Mak scheme 0 Jue 07 Vesio:.0 Fial Mak schemes ae pepaed by the Lead Assessmet Wite ad cosideed, togethe with the elevat questios, by a pael of subject teaches. This mak scheme

More information

ICS141: Discrete Mathematics for Computer Science I

ICS141: Discrete Mathematics for Computer Science I Uivesity of Hawaii ICS141: Discete Mathematics fo Compute Sciece I Dept. Ifomatio & Compute Sci., Uivesity of Hawaii Ja Stelovsy based o slides by D. Bae ad D. Still Oigials by D. M. P. Fa ad D. J.L. Goss

More information

OPTIMAL ESTIMATORS FOR THE FINITE POPULATION PARAMETERS IN A SINGLE STAGE SAMPLING. Detailed Outline

OPTIMAL ESTIMATORS FOR THE FINITE POPULATION PARAMETERS IN A SINGLE STAGE SAMPLING. Detailed Outline OPTIMAL ESTIMATORS FOR THE FIITE POPULATIO PARAMETERS I A SIGLE STAGE SAMPLIG Detailed Outlie ITRODUCTIO Focu o implet poblem: We ae lookig fo a etimato fo the paamete of a fiite populatio i a igle adom

More information

Fitting the Generalized Logistic Distribution. by LQ-Moments

Fitting the Generalized Logistic Distribution. by LQ-Moments Applied Mathematical Scieces, Vol. 5, 0, o. 54, 66-676 Fittig the Geealized Logistic Distibutio by LQ-Momets Ai Shabi Depatmet of Mathematic, Uivesiti Teologi Malaysia ai@utm.my Abdul Aziz Jemai Scieces

More information

Models of network routing and congestion control

Models of network routing and congestion control Models of etok outig ad cogestio cotol Fak Kelly, Cambidge statslabcamacuk/~fak/tlks/amhesthtml Uivesity of Massachusetts mhest, Mach 26, 28 Ed-to-ed cogestio cotol sedes eceives Sedes lea though feedback

More information

On a Problem of Littlewood

On a Problem of Littlewood Ž. JOURAL OF MATHEMATICAL AALYSIS AD APPLICATIOS 199, 403 408 1996 ARTICLE O. 0149 O a Poblem of Littlewood Host Alze Mosbache Stasse 10, 51545 Waldbol, Gemay Submitted by J. L. Bee Received May 19, 1995

More information

A two-sided Iterative Method for Solving

A two-sided Iterative Method for Solving NTERNATONAL JOURNAL OF MATHEMATCS AND COMPUTERS N SMULATON Volume 9 0 A two-sided teative Method fo Solvig * A Noliea Matix Equatio X= AX A Saa'a A Zaea Abstact A efficiet ad umeical algoithm is suggested

More information

Recursion. Algorithm : Design & Analysis [3]

Recursion. Algorithm : Design & Analysis [3] Recusio Algoithm : Desig & Aalysis [] I the last class Asymptotic gowth ate he Sets Ο, Ω ad Θ Complexity Class A Example: Maximum Susequece Sum Impovemet of Algoithm Compaiso of Asymptotic Behavio Aothe

More information

9.7 Pascal s Formula and the Binomial Theorem

9.7 Pascal s Formula and the Binomial Theorem 592 Chapte 9 Coutig ad Pobability Example 971 Values of 97 Pascal s Fomula ad the Biomial Theoem I m vey well acquaited, too, with mattes mathematical, I udestad equatios both the simple ad quadatical

More information

The Discrete Fourier Transform

The Discrete Fourier Transform (7) The Discete Fouie Tasfom The Discete Fouie Tasfom hat is Discete Fouie Tasfom (DFT)? (ote: It s ot DTFT discete-time Fouie tasfom) A liea tasfomatio (mati) Samples of the Fouie tasfom (DTFT) of a apeiodic

More information

INVERSE CAUCHY PROBLEMS FOR NONLINEAR FRACTIONAL PARABOLIC EQUATIONS IN HILBERT SPACE

INVERSE CAUCHY PROBLEMS FOR NONLINEAR FRACTIONAL PARABOLIC EQUATIONS IN HILBERT SPACE IJAS 6 (3 Febuay www.apapess.com/volumes/vol6issue3/ijas_6_3_.pdf INVESE CAUCH POBLEMS FO NONLINEA FACTIONAL PAABOLIC EQUATIONS IN HILBET SPACE Mahmoud M. El-Boai Faculty of Sciece Aleadia Uivesit Aleadia

More information

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d

Linear regression. Daniel Hsu (COMS 4771) (y i x T i β)2 2πσ. 2 2σ 2. 1 n. (x T i β y i ) 2. 1 ˆβ arg min. β R n d Liear regressio Daiel Hsu (COMS 477) Maximum likelihood estimatio Oe of the simplest liear regressio models is the followig: (X, Y ),..., (X, Y ), (X, Y ) are iid radom pairs takig values i R d R, ad Y

More information

Greatest term (numerically) in the expansion of (1 + x) Method 1 Let T

Greatest term (numerically) in the expansion of (1 + x) Method 1 Let T BINOMIAL THEOREM_SYNOPSIS Geatest tem (umeically) i the epasio of ( + ) Method Let T ( The th tem) be the geatest tem. Fid T, T, T fom the give epasio. Put T T T ad. Th will give a iequality fom whee value

More information

Minimal order perfect functional observers for singular linear systems

Minimal order perfect functional observers for singular linear systems Miimal ode efect fuctioal obseves fo sigula liea systems Tadeusz aczoek Istitute of Cotol Idustial lectoics Wasaw Uivesity of Techology, -66 Waszawa, oszykowa 75, POLAND Abstact. A ew method fo desigig

More information

Math 7409 Homework 2 Fall from which we can calculate the cycle index of the action of S 5 on pairs of vertices as

Math 7409 Homework 2 Fall from which we can calculate the cycle index of the action of S 5 on pairs of vertices as Math 7409 Hoewok 2 Fall 2010 1. Eueate the equivalece classes of siple gaphs o 5 vetices by usig the patte ivetoy as a guide. The cycle idex of S 5 actig o 5 vetices is 1 x 5 120 1 10 x 3 1 x 2 15 x 1

More information

ON EUCLID S AND EULER S PROOF THAT THE NUMBER OF PRIMES IS INFINITE AND SOME APPLICATIONS

ON EUCLID S AND EULER S PROOF THAT THE NUMBER OF PRIMES IS INFINITE AND SOME APPLICATIONS Joual of Pue ad Alied Mathematics: Advaces ad Alicatios Volume 0 Numbe 03 Pages 5-58 ON EUCLID S AND EULER S PROOF THAT THE NUMBER OF PRIMES IS INFINITE AND SOME APPLICATIONS ALI H HAKAMI Deatmet of Mathematics

More information

Department of Mathematics, IST Probability and Statistics Unit

Department of Mathematics, IST Probability and Statistics Unit Depatmet of Mathematics, IST Pobability ad Statistics Uit Reliability ad Quality Cotol d. Test ( Recuso) st. Semeste / Duatio: hm // 9:AM, Room V. Please justify you aswes. This test has two pages ad fou

More information

SHIFTED HARMONIC SUMS OF ORDER TWO

SHIFTED HARMONIC SUMS OF ORDER TWO Commu Koea Math Soc 9 0, No, pp 39 55 http://dxdoiog/03/ckms0939 SHIFTED HARMONIC SUMS OF ORDER TWO Athoy Sofo Abstact We develop a set of idetities fo Eule type sums I paticula we ivestigate poducts of

More information

The number of r element subsets of a set with n r elements

The number of r element subsets of a set with n r elements Popositio: is The umbe of elemet subsets of a set with elemets Poof: Each such subset aises whe we pick a fist elemet followed by a secod elemet up to a th elemet The umbe of such choices is P But this

More information

Probabilistic Analysis of Dual-Pivot Quicksort Count

Probabilistic Analysis of Dual-Pivot Quicksort Count J. W. Goethe-Uivesität Fakfut am Mai Fachbeeich 2 Istitut fü Mathematik Pobabilistic Aalysis of Dual-Pivot Quicksot Cout Masteabeit vo Jasmi Staub Matikelumme: 495465 E-Mail: jstaub@math.ui-fakfut.de Beteue:

More information

At the end of this topic, students should be able to understand the meaning of finite and infinite sequences and series, and use the notation u

At the end of this topic, students should be able to understand the meaning of finite and infinite sequences and series, and use the notation u Natioal Jio College Mathematics Depatmet 00 Natioal Jio College 00 H Mathematics (Seio High ) Seqeces ad Seies (Lecte Notes) Topic : Seqeces ad Seies Objectives: At the ed of this topic, stdets shold be

More information

Mapping Radius of Regular Function and Center of Convex Region. Duan Wenxi

Mapping Radius of Regular Function and Center of Convex Region. Duan Wenxi d Iteatioal Cofeece o Electical Compute Egieeig ad Electoics (ICECEE 5 Mappig adius of egula Fuctio ad Cete of Covex egio Dua Wexi School of Applied Mathematics Beijig Nomal Uivesity Zhuhai Chia 363463@qqcom

More information

Disjoint Sets { 9} { 1} { 11} Disjoint Sets (cont) Operations. Disjoint Sets (cont) Disjoint Sets (cont) n elements

Disjoint Sets { 9} { 1} { 11} Disjoint Sets (cont) Operations. Disjoint Sets (cont) Disjoint Sets (cont) n elements Disjoit Sets elemets { x, x, } X =, K Opeatios x Patitioed ito k sets (disjoit sets S, S,, K Fid-Set(x - etu set cotaiig x Uio(x,y - make a ew set by combiig the sets cotaiig x ad y (destoyig them S k

More information

Generalized Near Rough Probability. in Topological Spaces

Generalized Near Rough Probability. in Topological Spaces It J Cotemp Math Scieces, Vol 6, 20, o 23, 099-0 Geealized Nea Rough Pobability i Topological Spaces M E Abd El-Mosef a, A M ozae a ad R A Abu-Gdaii b a Depatmet of Mathematics, Faculty of Sciece Tata

More information

Einstein Classes, Unit No. 102, 103, Vardhman Ring Road Plaza, Vikas Puri Extn., Outer Ring Road New Delhi , Ph. : ,

Einstein Classes, Unit No. 102, 103, Vardhman Ring Road Plaza, Vikas Puri Extn., Outer Ring Road New Delhi , Ph. : , MB BINOMIAL THEOREM Biomial Epessio : A algebaic epessio which cotais two dissimila tems is called biomial epessio Fo eample :,,, etc / ( ) Statemet of Biomial theoem : If, R ad N, the : ( + ) = a b +

More information

ON CERTAIN CLASS OF ANALYTIC FUNCTIONS

ON CERTAIN CLASS OF ANALYTIC FUNCTIONS ON CERTAIN CLASS OF ANALYTIC FUNCTIONS Nailah Abdul Rahma Al Diha Mathematics Depatmet Gils College of Educatio PO Box 60 Riyadh 567 Saudi Aabia Received Febuay 005 accepted Septembe 005 Commuicated by

More information

ECEN 5014, Spring 2013 Special Topics: Active Microwave Circuits and MMICs Zoya Popovic, University of Colorado, Boulder

ECEN 5014, Spring 2013 Special Topics: Active Microwave Circuits and MMICs Zoya Popovic, University of Colorado, Boulder ECEN 5014, Spig 013 Special Topics: Active Micowave Cicuits ad MMICs Zoya Popovic, Uivesity of Coloado, Boulde LECTURE 7 THERMAL NOISE L7.1. INTRODUCTION Electical oise is a adom voltage o cuet which is

More information

by Vitali D. Milman and Gideon Schechtman Abstract - A dierent proof is given to the result announced in [MS2]: For each

by Vitali D. Milman and Gideon Schechtman Abstract - A dierent proof is given to the result announced in [MS2]: For each AN \ISOMORPHIC" VERSION OF DVORETZKY'S THEOREM, II by Vitali D. Milma ad Gideo Schechtma Abstact - A dieet poof is give to the esult aouced i [MS2]: Fo each

More information

New Sharp Lower Bounds for the First Zagreb Index

New Sharp Lower Bounds for the First Zagreb Index SCIENTIFIC PUBLICATIONS OF THE STATE UNIVERSITY OF NOVI PAZAR SER. A:APPL. MATH. INFORM. AND MECH. vol. 8, 1 (016), 11-19. New Shap Lowe Bouds fo the Fist Zageb Idex T. Masou, M. A. Rostami, E. Suesh,

More information

A Generalization of the Deutsch-Jozsa Algorithm to Multi-Valued Quantum Logic

A Generalization of the Deutsch-Jozsa Algorithm to Multi-Valued Quantum Logic A Geealizatio of the Deutsch-Jozsa Algoithm to Multi-Valued Quatum Logic Yale Fa The Catli Gabel School 885 SW Baes Road Potlad, OR 975-6599, USA yalefa@gmail.com Abstact We geealize the biay Deutsch-Jozsa

More information

Discussion 02 Solutions

Discussion 02 Solutions STAT 400 Discussio 0 Solutios Spig 08. ~.5 ~.6 At the begiig of a cetai study of a goup of pesos, 5% wee classified as heavy smoes, 30% as light smoes, ad 55% as osmoes. I the fiveyea study, it was detemied

More information

3.1 Random variables

3.1 Random variables 3 Chapte III Random Vaiables 3 Random vaiables A sample space S may be difficult to descibe if the elements of S ae not numbes discuss how we can use a ule by which an element s of S may be associated

More information